暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Citus 报错(三)ERROR: complex joins are only supported when all distributed tables are co-located and joined on

原创 姚崇 2023-02-24
662

错误如下:

ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns

创建测试表如下:

create table customers (customers_id bigint,product_name varchar(100)); alter table customers add primary key (customers_id); create table orders (order_id bigint,customers_id bigint,order_detail varchar(100)); alter table orders add primary key(order_id,customers_id); create table products (product_id bigint,customers_id bigint,product_name varchar(100)); alter table products add primary key (product_id); select create_distributed_table('customers','customers_id'); select create_distributed_table('orders','order_id'); insert into customers select n,md5(random()::text) from generate_series(1,10000) n; insert into orders select n*3,n,md5(random()::text) from generate_series(1,200000) n; insert into products select 10+n*2,n,md5(random()::text) from generate_series(1,10000) n;

上面测试表的整体逻辑就是让join列,分布在不同的work节点中
然后执行下面SQL

with products_local as (select * from products) select c.* , o.* from customers c , orders o where o.customers_id = c.customers_id and exists (select 1 from products_local p where p.customers_id = c.customers_id); ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns

即便是改成参考表,也不能解决上述错误,所以上述的问题还是数据分布上的问题,两个表之间没有采用共同的分片键

lightdb@test=# SELECT create_reference_table('products'); NOTICE: Copying data from local table... NOTICE: copying the data has completed DETAIL: The local data in the table is no longer visible, but is still on disk. HINT: To remove the local data, run: SELECT truncate_local_data_after_distributing_table($$public.products$$) create_reference_table ------------------------ (1 row) lightdb@test=# with products_local as (select * from products) select c.* , o.* from customers c , orders o where o.customers_id = c.customers_id and exists (select 1 from products_local p where p.customers_id = c.customers_id); ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns lightdb@test=#

原因

当你执行一个包含复杂联接操作的查询时,如果你得到这个错误消息,它通常意味着你在Citus中执行的查询涉及多个表,这些表没有在同一个节点上分布,或者它们被连接的列不是它们的分布列。

在Citus中,查询中的每个表都必须与其他表在相同的节点上进行分布。如果你正在连接的表不在同一个节点上分布,或者你正在连接的列不是它们的分布列,那么你将无法执行这个复杂联接操作。

为了解决这个问题,你需要重新设计你的表分布策略,以确保连接操作所涉及的所有表都在同一个节点上分布,并且连接的列是它们的分布列。另外,你也可以考虑使用其他分布式数据库工具,例如Apache Spark或Apache Cassandra,它们可能更适合你的需求。

真实环境中遇到要用CTE等价改写的SQL

例子1

DELETE FROM JYDB.MF_CalmarRatio A USING JYDB.MF_CalmarRatio F JOIN JYDB.tmp_MF_CalmarRatio B ON F.InnerCode = B.InnerCode AND F.EndDate = B.EndDate AND F.IndexCode = B.IndexCode LEFT JOIN JYDB.MF_CalmarRatio_TMP C ON F.InnerCode = C.InnerCode AND F.EndDate = C.EndDate AND F.IndexCode = C.IndexCode WHERE A.id = F.id AND C.InnerCode IS NULL;

改为CTE

WITH cte AS ( SELECT F.id FROM JYDB.MF_CalmarRatio F JOIN JYDB.tmp_MF_CalmarRatio B ON F.InnerCode = B.InnerCode AND F.EndDate = B.EndDate AND F.IndexCode = B.IndexCode LEFT JOIN JYDB.MF_CalmarRatio_TMP C ON F.InnerCode = C.InnerCode AND F.EndDate = C.EndDate AND F.IndexCode = C.IndexCode WHERE C.InnerCode IS NULL ) DELETE FROM JYDB.MF_CalmarRatio A WHERE A.id IN ( SELECT id FROM cte);

例子2

INSERT INTO JYDB.tmp_MF_StkHoldingAna SELECT DISTINCT A.InnerCode, A.ReportDate FROM (SELECT InnerCode, TradingDay FROM JYDB.QT_DailyQuote WHERE JSID > 730650253645 AND JSID <= 730650253645) B JOIN MF_KeyStockPortfolio A ON A.StockInnerCode = B.InnerCode WHERE B.TradingDay <= A.ReportDate AND A.ReportDate < CASE WHEN (SELECT C.TradingDay FROM JYDB.QT_DailyQuote C WHERE C.InnerCode = B.InnerCode AND C.TradingDay > B.TradingDay ORDER BY C.TradingDay LIMIT 1 OFFSET 0) IS NULL THEN CAST('9999-12-31' AS DATETIME) ELSE (SELECT C.TradingDay FROM JYDB.QT_DailyQuote C WHERE C.InnerCode = B.InnerCode AND C.TradingDay > B.TradingDay ORDER BY C.TradingDay LIMIT 1 OFFSET 0) END AND NOT EXISTS( SELECT 1 FROM JYDB.tmp_MF_StkHoldingAna B WHERE B.InnerCode = A.InnerCode AND B.ReportDate = A.ReportDate); WITH cte_QT_DailyQuote AS ( SELECT InnerCode, TradingDay FROM JYDB.QT_DailyQuote WHERE JSID > 730650253645 AND JSID <= 730650253645 ), cte_MF_KeyStockPortfolio AS ( SELECT StockInnerCode AS InnerCode, ReportDate FROM MF_KeyStockPortfolio ), cte_Joined AS ( SELECT DISTINCT A.InnerCode, A.ReportDate FROM cte_MF_KeyStockPortfolio A JOIN cte_QT_DailyQuote B ON A.InnerCode = B.InnerCode WHERE B.TradingDay <= A.ReportDate AND A.ReportDate < COALESCE(( SELECT C.TradingDay FROM cte_QT_DailyQuote C WHERE C.InnerCode = B.InnerCode AND C.TradingDay > B.TradingDay ORDER BY C.TradingDay LIMIT 1 OFFSET 0), CAST('9999-12-31' AS DATETIME)) AND NOT EXISTS(SELECT 1 FROM JYDB.tmp_MF_StkHoldingAna D WHERE D.InnerCode = A.InnerCode AND D.ReportDate = A.ReportDate) ) INSERT INTO JYDB.tmp_MF_StkHoldingAna SELECT * FROM cte_Joined;

在这个重写版本中,我们使用了3个CTE:

  • cte_QT_DailyQuote: 从JYDB.QT_DailyQuote表中检索符合特定条件的InnerCode和TradingDay,这个CTE对应之前SQL中的子查询B。
  • cte_MF_KeyStockPortfolio: 从MF_KeyStockPortfolio表中检索StockInnerCode和ReportDate,并将StockInnerCode作为InnerCode返回,这个CTE对应之前SQL中的表MF_KeyStockPortfolio。
  • cte_Joined: 通过将上面两个CTE连接起来来计算查询结果,以及通过子查询找到下一个交易日的TradingDay来计算CASE语句中的值。这个CTE对应之前SQL中的SELECT语句。
    最后,在CTE的末尾,我们将查询结果插入到JYDB.tmp_MF_StkHoldingAna表中。
最后修改时间:2023-03-02 17:14:02
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论