暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

测试hash分区会不会因为键值列数据倾斜引起分区数据倾斜

原创 黄宸宁 2014-06-23
1611
今天在小组群中,有同事提到hash分区不会因为键值列数据倾斜,导致分区数据倾斜的事情,根据以往对hash分区的理解,hash分区是通过对键值列做hash运算后,按hash值来进行分区,这样的话,同样的键值通过hash运算后就应该会得到相同的hash值,如果键值列存在数据倾斜的话,照理说分区数据也会出现倾斜,为此通过以下实验来证明这点。
一、创建测试数据
HCN@HCN>
HCN@HCN>CREATE TABLE test_hash
2 (object_id NUMBER(20),object_name VARCHAR2(50))
3 PARTITION BY HASH(object_id)
4 (
5 PARTITION p1 tablespace HCN,
6 PARTITION p2 tablespace HCNTBS,
PARTITION p3 tablespace HCNTBS,
7 8 PARTITION p4 tablespace HCN,
9 PARTITION p5 tablespace HCN,
PARTITION p6 tablespace HCNTBS);
10
Table created.
HCN@HCN>
HCN@HCN>insert into test_hash select object_id,object_name from dba_objects;
49288 rows created.
HCN@HCN>
HCN@HCN>
HCN@HCN>commit;
Commit complete.
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p1);
COUNT(*)
----------
6146
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p2);
COUNT(*)
----------
6192
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p3);
COUNT(*)
----------
12533
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p4);
COUNT(*)
----------
12235
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p5);
COUNT(*)
----------
6123
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p6);
COUNT(*)
----------
6059
HCN@HCN>
HCN@HCN>

二、插入数据,键值列值保持一致
HCN@HCN>
HCN@HCN>
HCN@HCN>declare
2 var_sql varchar2(10000);
3 begin
4 for i in 1 .. 10000 loop
5 var_sql:= 'insert into test_hash values (1,''test'||i||''')';
6 execute immediate var_sql;
7 end loop;
8 commit;
9 end;
10 /
PL/SQL procedure successfully completed.
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>declare
2 var_sql varchar2(10000);
3 begin
4 for i in 1 .. 10000 loop
5 var_sql:= 'insert into test_hash values (1,''test'||i||''')';
6 execute immediate var_sql;
7 end loop;
8 commit;
9 end;
10 /
PL/SQL procedure successfully completed.
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>declare
2 var_sql varchar2(10000);
3 begin
4 for i in 1 .. 10000 loop
5 var_sql:= 'insert into test_hash values (1,''test'||i||''')';
6 execute immediate var_sql;
7 end loop;
8 commit;
9 end;
10 /
PL/SQL procedure successfully completed.
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>declare
2 var_sql varchar2(10000);
3 begin
4 for i in 1 .. 10000 loop
5 var_sql:= 'insert into test_hash values (1,''test'||i||''')';
6 execute immediate var_sql;
7 end loop;
8 commit;
9 end;
10 /
PL/SQL procedure successfully completed.
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p1);
COUNT(*)
----------
6146
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p2);
COUNT(*)
----------
6192
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p3);
COUNT(*)
----------
12533
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p4);
COUNT(*)
----------
52235

HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p5);
COUNT(*)
----------
6123
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p6);
COUNT(*)
----------
6059
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash where object_id=1;
COUNT(*)
----------
40000
HCN@HCN>

 
可以看到插入的40000行object_id=1的数据都全部插入到了p4这个分区
三、是否指定一个当前分区中存在的键值,数据就会插入到该分区?
HCN@HCN>
HCN@HCN>select * from test_hash partition(p1) where rownum<20;
OBJECT_ID OBJECT_NAME
---------- --------------------------------------------------
6 C_TS#
46 I_COL2
54 I_CCOL1
34 I_UNDO1
30 I_COBJ#
55 I_CCOL2
11 I_USER#
74 IDL_CHAR$
81 TRIGGER$
88 OBJPRIV$
97 COM$
OBJECT_ID OBJECT_NAME
---------- --------------------------------------------------
108 I_COM1
111 I_ARGUMENT1
133 I_PROFNAME
146 I_AUDIT
174 I_ATTRCOL1
179 I_OID1
184 I_TYPE1
215 SYS_LOB0000000212C00049$$
19 rows selected.
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p1) where object_id=108;
COUNT(*)
----------
1
HCN@HCN>
HCN@HCN>
HCN@HCN>
HCN@HCN>declare
2 var_sql varchar2(10000);
3 begin
4 for i in 1 .. 10000 loop
5 var_sql:= 'insert into test_hash values (108,''test'||i||''')';
6 execute immediate var_sql;
7 end loop;
8 commit;
9 end;
10 /
PL/SQL procedure successfully completed.
HCN@HCN>
HCN@HCN>
HCN@HCN>select count(*) from test_hash partition(p1) where object_id=108;
COUNT(*)
----------
10001
HCN@HCN>
HCN@HCN>

可以看到p1分区中存在object_id=108的数据,测试插入object_id=108的10000行数据,插入后的数据都全部在p1分区中。
因此可以证明当hash分区的键值列出现数据倾斜的时候,会导致分区表中的数据也出现数据倾斜。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论