错误提示如下:
ERROR: direct joins between distributed and local tables are not supported
HINT: Use CTE's or subqueries to select from local tables and use them in joins
创建测试表
drop table if exists products;
drop table if exists customers;
drop table if exists orders;
create table customers (customers_id bigint,product_name varchar(100));
alter table customers add primary key (customers_id);
create table orders (order_id bigint,customers_id bigint,order_detail varchar(100));
alter table orders add primary key(order_id,customers_id);
create table products (product_id bigint,customers_id bigint,product_name varchar(100));
alter table products add primary key (product_id);
select create_distributed_table('customers','customers_id');
select create_distributed_table('orders','customers_id',colocate_with=>'customers');
lightdb@test=# select * from canopy_tables ;
table_name | canopy_table_type | distribution_column | colocation_id | table_size | shard_count | table_owner | access_method
------------+-------------------+---------------------+---------------+------------+-------------+-------------+---------------
customers | distributed | customers_id | 7 | 32 kB | 4 | lightdb | heap
orders | distributed | customers_id | 7 | 32 kB | 4 | lightdb | heap
(2 rows)
如下是不会报错
lightdb@test=# select p.*,o.* from products p ,orders o where o.customers_id = p.customers_id;
product_id | customers_id | product_name | order_id | customers_id | order_detail
------------+--------------+--------------+----------+--------------+--------------
(0 rows)
但是执行下面SQL便报如下错误
lightdb@test=# select c.* , o.* from customers c , orders o where o.customers_id = c.customers_id
lightdb@test-# and exists (select 1 from products p where p.customers_id = c.customers_id);
ERROR: direct joins between distributed and local tables are not supported
HINT: Use CTE's or subqueries to select from local tables and use them in joins
关于这个错误的解释:
这个错误提示是因为 Canopy不支持在分布式表和本地表之间进行直接连接操作。
在 Canopy中,分布式表是由多个节点上的数据子集组成的,而本地表只存在于单个节点上。由于分布式表和本地表之间的数据分布不同,Canopy不能有效地执行这种连接操作。
相反,您可以使用公共表表达式 (CTE) 或子查询来从本地表中选择数据,然后将其与分布式表连接。例如:
WITH local_data AS (
SELECT *
FROM my_local_table
)
SELECT *
FROM my_distributed_table
JOIN local_data ON my_distributed_table.key = local_data.key
上面的示例中,我们首先使用 CTE 选择本地表中的数据,然后将其与分布式表进行连接。这种方法可以避免直接连接分布式表和本地表,从而避免了错误提示。
总之,在 Canopy中连接分布式表和本地表时,您应该使用 CTE 或子查询来选择本地表数据,并将其与分布式表进行连接,而不是直接连接它们。这可以确保 Canopy在分布式集群中正确执行查询,并避免出现错误提示。
所以将原SQL改成
lightdb@test=# with products_local as (select * from products)
lightdb@test-# select c.* , o.*
lightdb@test-# from customers c , orders o
lightdb@test-# where o.customers_id = c.customers_id
lightdb@test-# and exists (select 1 from products_local p where p.customers_id = c.customers_id);
customers_id | product_name | order_id | customers_id | order_detail
--------------+--------------+----------+--------------+--------------
(0 rows)
或者考虑将products转换成分布式参考表
SELECT create_reference_table('products');
lightdb@test=# select c.* , o.*
lightdb@test-# from customers c , orders o
lightdb@test-# where o.customers_id = c.customers_id
lightdb@test-# and exists (select 1 from products p where p.customers_id = c.customers_id);
customers_id | product_name | order_id | customers_id | order_detail
--------------+--------------+----------+--------------+--------------
(0 rows)
错误遍不会存在,正常情况下极少数会使用本地表
最后修改时间:2023-03-02 17:13:07
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。