暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL/Citus 常见报错问题分析(一)ERROR: direct joins between distributed and local tables are not supported

原创 姚崇 2023-03-02
304

错误提示如下:

INSERT INTO JYDB.tmp_MF_StkHoldingAna SELECT DISTINCT A.InnerCode, A.ReportDate FROM JYDB.MF_InvestIndustry A WHERE A.JSID > 732301709313 AND A.JSID <= 732301709313 AND NOT EXISTS(SELECT 1 FROM JYDB.tmp_MF_StkHoldingAna B WHERE B.InnerCode = A.InnerCode AND B.ReportDate = A.ReportDate); ERROR: direct joins between distributed and local tables are not supported HINT: Use CTE's or subqueries to select from local tables and use them in joins

其中tmp_MF_StkHoldingAna位cn上的本地表,将其转换为reference表,便可正常执行

lightdb@jydb=# select * from citus_tables ; table_name | canopy_table_type | distribution_column | colocation_id | table_size | shard_count | table_owner | access_method ---------------------------------+-------------------+---------------------+---------------+------------+-------------+-------------+--------------- jydb.mf_investindustry | distributed | innercode | 4 | 388 MB | 80 | jydb | heap jydb.tmp_mf_stkholdingana | reference | <none> | 8 | 72 kB | 1 | jydb | heap lightdb@jydb=# begin; BEGIN lightdb@jydb=*# INSERT INTO JYDB.tmp_MF_StkHoldingAna SELECT DISTINCT A.InnerCode, A.ReportDate lightdb@jydb-*# FROM JYDB.MF_InvestIndustry A lightdb@jydb-*# WHERE A.JSID > 732301709313 lightdb@jydb-*# AND A.JSID <= 732301709313 lightdb@jydb-*# AND NOT EXISTS(SELECT 1 FROM JYDB.tmp_MF_StkHoldingAna B lightdb@jydb(*# WHERE B.InnerCode = A.InnerCode AND B.ReportDate = A.ReportDate); INSERT 0 0

复现测试

创建测试表

drop table if exists products; drop table if exists customers; drop table if exists orders; create table customers (customers_id bigint,product_name varchar(100)); alter table customers add primary key (customers_id); create table orders (order_id bigint,customers_id bigint,order_detail varchar(100)); alter table orders add primary key(order_id,customers_id); create table products (product_id bigint,customers_id bigint,product_name varchar(100)); alter table products add primary key (product_id); select create_distributed_table('customers','customers_id'); select create_distributed_table('orders','customers_id',colocate_with=>'customers'); lightdb@test=# select * from citus_tables ; table_name | citus_tables_type | distribution_column | colocation_id | table_size | shard_count | table_owner | access_method ------------+-------------------+---------------------+---------------+------------+-------------+-------------+--------------- customers | distributed | customers_id | 7 | 32 kB | 4 | lightdb | heap orders | distributed | customers_id | 7 | 32 kB | 4 | lightdb | heap (2 rows)

如下是不会报错

lightdb@test=# select p.*,o.* from products p ,orders o where o.customers_id = p.customers_id; product_id | customers_id | product_name | order_id | customers_id | order_detail ------------+--------------+--------------+----------+--------------+-------------- (0 rows)

但是执行下面SQL便报如下错误

lightdb@test=# select c.* , o.* from customers c , orders o where o.customers_id = c.customers_id lightdb@test-# and exists (select 1 from products p where p.customers_id = c.customers_id); ERROR: direct joins between distributed and local tables are not supported HINT: Use CTE's or subqueries to select from local tables and use them in joins

关于这个错误的解释:

这个错误提示是因为 citus不支持在分布式表和本地表之间进行直接连接操作。
在 citus中,分布式表是由多个节点上的数据子集组成的,而本地表只存在于单个节点上。由于分布式表和本地表之间的数据分布不同,citus不能有效地执行这种连接操作。
相反,您可以使用公共表表达式 (CTE) 或子查询来从本地表中选择数据,然后将其与分布式表连接。例如:

WITH local_data AS ( SELECT * FROM my_local_table ) SELECT * FROM my_distributed_table

JOIN local_data ON my_distributed_table.key = local_data.key
上面的示例中,我们首先使用 CTE 选择本地表中的数据,然后将其与分布式表进行连接。这种方法可以避免直接连接分布式表和本地表,从而避免了错误提示。
总之,在 citus中连接分布式表和本地表时,您应该使用 CTE 或子查询来选择本地表数据,并将其与分布式表进行连接,而不是直接连接它们。这可以确保 citus在分布式集群中正确执行查询,并避免出现错误提示。
所以将原SQL改成

lightdb@test=# with products_local as (select * from products) lightdb@test-# select c.* , o.* lightdb@test-# from customers c , orders o lightdb@test-# where o.customers_id = c.customers_id lightdb@test-# and exists (select 1 from products_local p where p.customers_id = c.customers_id); customers_id | product_name | order_id | customers_id | order_detail --------------+--------------+----------+--------------+-------------- (0 rows)

或者考虑将products转换成分布式参考表

SELECT create_reference_table('products'); lightdb@test=# select c.* , o.* lightdb@test-# from customers c , orders o lightdb@test-# where o.customers_id = c.customers_id lightdb@test-# and exists (select 1 from products p where p.customers_id = c.customers_id); customers_id | product_name | order_id | customers_id | order_detail --------------+--------------+----------+--------------+-------------- (0 rows)

错误遍不会存在,正常情况下极少数会使用本地表

最后修改时间:2023-03-02 13:43:47
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论