
OB 4.3 版本有不少新功能,最吸引人的就是全面支持列存,本文拿到新版本后就快速体验一下。
本文会导入 TPCH 100G 数据,测试 4.3 的两种大批量数据旁路导入功能,观察行存和列存下的数据容量和压缩比、表的行存和列存在线转换功能以及列存对 TPCH的 OLAP 查询性能的提升效果。
1. 数据库环境说明
软件版本是企业版 4.3 ,信息如下。
SELECT ZONE,svr_ip,build_version FROM oceanbase.__all_server;+-------+-----------+-------------------------------------------------------------------------------------------+| ZONE | svr_ip | build_version |+-------+-----------+-------------------------------------------------------------------------------------------+| zone1 | 10.0.0.65 | 4.3.0.1_101000062024032200-b59432e535c48e8b8828190c803b6c7736413ff9(Mar 22 2024 00:52:21) |+-------+-----------+-------------------------------------------------------------------------------------------+1 row in set (0.003 sec)
租户资源分配信息,测试租户是 ORACLE 租户,资源规格 12C12G 。
select t1.name resource_pool_name, t2.`name` unit_config_name,t2.max_cpu, t2.min_cpu,round(t2.memory_size/1024/1024/1024,2) mem_size_gb,round(t2.log_disk_size/1024/1024/1024,2) log_disk_size_gb ,t3.zone, concat(t3.svr_ip,':',t3.`svr_port`) observer,t4.tenant_id, t4.tenant_namefrom __all_resource_pool t1join __all_unit_config t2 on (t1.unit_config_id=t2.unit_config_id)join __all_unit t3 on (t1.`resource_pool_id` = t3.`resource_pool_id`)left join __all_tenant t4 on (t1.tenant_id=t4.tenant_id)order by t1.`resource_pool_id`, t2.`unit_config_id`, t3.unit_id;+--------------------+-----------------------------------+---------+---------+-------------+------------------+-------+----------------+-----------+-------------+| resource_pool_name | unit_config_name | max_cpu | min_cpu | mem_size_gb | log_disk_size_gb | zone | observer | tenant_id | tenant_name |+--------------------+-----------------------------------+---------+---------+-------------+------------------+-------+----------------+-----------+-------------+| sys_pool | config_sys_zone1_U4C4G_pmy | 4 | 4 | 4.00 | 12.00 | zone1 | 10.0.0.65:2882 | 1 | sys || oboracle_pool | config_oboracle_zone1_U12C12G_eiz | 12 | 12 | 12.00 | 36.00 | zone1 | 10.0.0.65:2882 | 1012 | oboracle |+--------------------+-----------------------------------+---------+---------+-------------+------------------+-------+----------------+-----------+-------------+2 rows in set (0.012 sec)
2. TPC-H 数据文件
TPC-H 官方提供了程序生成测试数据,这里我生成时规格参数 s 指定为 100,实际数据量大概 107GB 左右。大表分为 9 个文件,最大的表是 lineitem 的数据。数据文件格式是 CSV 。
[root@server065 s100]# ls -lrth lineitem.tbl.*-rwxrwxrwx. 1 admin admin 8.2G Apr 12 11:18 lineitem.tbl.1-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.3-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.8-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.5-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.6-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.4-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.9-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.2-rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.7[root@server065 s100]# wc -l lineitem.tbl.*66653245 lineitem.tbl.166678802 lineitem.tbl.266668212 lineitem.tbl.366678870 lineitem.tbl.466665175 lineitem.tbl.566675199 lineitem.tbl.666669037 lineitem.tbl.766671890 lineitem.tbl.866677472 lineitem.tbl.9600037902 total
3. TPCH 表结构。
为了发挥OB 分布式能力,大表都是分区表,先使用默认的行存格式。
CREATE TABLE "REGION" ("R_REGIONKEY" NUMBER NOT NULL,"R_NAME" CHAR(25 BYTE) DEFAULT NULL,"R_COMMENT" VARCHAR2(152 BYTE) DEFAULT NULL,CONSTRAINT "REGION_OBPK_1712656377925321" PRIMARY KEY ("R_REGIONKEY");CREATE TABLE "NATION" ("N_NATIONKEY" NUMBER NOT NULL,"N_NAME" CHAR(25 BYTE) DEFAULT NULL,"N_REGIONKEY" NUMBER DEFAULT NULL,"N_COMMENT" VARCHAR2(152 BYTE) DEFAULT NULL,CONSTRAINT "NATION_OBPK_1712656377735819" PRIMARY KEY ("N_NATIONKEY");CREATE TABLE "PART" ("P_PARTKEY" NUMBER NOT NULL,"P_NAME" VARCHAR2(55 BYTE) DEFAULT NULL,"P_MFGR" CHAR(25 BYTE) DEFAULT NULL,"P_BRAND" CHAR(10 BYTE) DEFAULT NULL,"P_TYPE" VARCHAR2(25 BYTE) DEFAULT NULL,"P_SIZE" NUMBER DEFAULT NULL,"P_CONTAINER" CHAR(10 BYTE) DEFAULT NULL,"P_RETAILPRICE" NUMBER DEFAULT NULL,"P_COMMENT" VARCHAR2(23 BYTE) DEFAULT NULL,CONSTRAINT "PART_OBPK_1712656377147466" PRIMARY KEY ("P_PARTKEY"))PARTITION BY HASH ("P_PARTKEY") PARTITIONS 9;CREATE TABLE "PARTSUPP" ("PS_PARTKEY" NUMBER NOT NULL,"PS_SUPPKEY" NUMBER NOT NULL,"PS_AVAILQTY" NUMBER DEFAULT NULL,"PS_SUPPLYCOST" NUMBER DEFAULT NULL,"PS_COMMENT" VARCHAR2(199 BYTE) DEFAULT NULL,CONSTRAINT "PARTSUPP_OBPK_1712656376960786" PRIMARY KEY ("PS_PARTKEY", "PS_SUPPKEY"))PARTITION BY HASH ("PS_PARTKEY") PARTITIONS 9;CREATE TABLE "SUPPLIER" ("S_SUPPKEY" NUMBER NOT NULL,"S_NAME" CHAR(25 BYTE) DEFAULT NULL,"S_ADDRESS" VARCHAR2(40 BYTE) DEFAULT NULL,"S_NATIONKEY" NUMBER DEFAULT NULL,"S_PHONE" CHAR(15 BYTE) DEFAULT NULL,"S_ACCTBAL" NUMBER DEFAULT NULL,"S_COMMENT" VARCHAR2(101 BYTE) DEFAULT NULL,CONSTRAINT "SUPPLIER_OBPK_1712656377544582" PRIMARY KEY ("S_SUPPKEY"))PARTITION BY HASH ("S_SUPPKEY") PARTITIONS 9;CREATE TABLE "CUSTOMER" ("C_CUSTKEY" NUMBER NOT NULL,"C_NAME" VARCHAR2(25 BYTE) DEFAULT NULL,"C_ADDRESS" VARCHAR2(40 BYTE) DEFAULT NULL,"C_NATIONKEY" NUMBER DEFAULT NULL,"C_PHONE" CHAR(15 BYTE) DEFAULT NULL,"C_ACCTBAL" NUMBER DEFAULT NULL,"C_MKTSEGMENT" CHAR(10 BYTE) DEFAULT NULL,"C_COMMENT" VARCHAR2(117 BYTE) DEFAULT NULL,CONSTRAINT "CUSTOMER_OBPK_1712656377359086" PRIMARY KEY ("C_CUSTKEY"))PARTITION BY HASH ("C_CUSTKEY") PARTITIONS 9;CREATE TABLE "ORDERS" ("O_ORDERKEY" NUMBER NOT NULL ENABLE,"O_CUSTKEY" NUMBER NOT NULL ENABLE,"O_ORDERSTATUS" CHAR(1) DEFAULT NULL,"O_TOTALPRICE" NUMBER DEFAULT NULL,"O_ORDERDATE" DATE NOT NULL ENABLE,"O_ORDERPRIORITY" CHAR(15) DEFAULT NULL,"O_CLERK" CHAR(15) DEFAULT NULL,"O_SHIPPRIORITY" NUMBER DEFAULT NULL,"O_COMMENT" VARCHAR2(79) DEFAULT NULL,PRIMARY KEY ("O_ORDERKEY")) COMPRESS FOR QUERY LOW REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 TABLEGROUP = 'TPCH_TG_SF_TPC_USER_LINEITEM_ORDER_GROUP'partition by hash(o_orderkey) PARTITIONS 16;CREATE TABLE "LINEITEM" ("L_ORDERKEY" NUMBER NOT NULL,"L_PARTKEY" NUMBER NOT NULL,"L_SUPPKEY" NUMBER NOT NULL,"L_LINENUMBER" NUMBER NOT NULL,"L_QUANTITY" NUMBER NOT NULL,"L_EXTENDEDPRICE" NUMBER NOT NULL,"L_DISCOUNT" NUMBER NOT NULL,"L_TAX" NUMBER NOT NULL,"L_RETURNFLAG" CHAR(1 BYTE) DEFAULT NULL,"L_LINESTATUS" CHAR(1 BYTE) DEFAULT NULL,"L_SHIPDATE" DATE NOT NULL,"L_COMMITDATE" DATE DEFAULT NULL,"L_RECEIPTDATE" DATE DEFAULT NULL,"L_SHIPINSTRUCT" CHAR(25 BYTE) DEFAULT NULL,"L_SHIPMODE" CHAR(10 BYTE) DEFAULT NULL,"L_COMMENT" VARCHAR2(44 BYTE) DEFAULT NULL,CONSTRAINT "LINEITEM_OBPK_1712656376567797" PRIMARY KEY ("L_ORDERKEY", "L_LINENUMBER"))PARTITION BY HASH ("L_ORDERKEY") PARTITIONS 9;
4. OB 4.3 数据导入方法。
OB 很早就支持 LOAD DATA
命令加载 CSV 文件,要求文件在 OB 集群节点上,从 4.3 开始支持客户端导入。
服务端加载文件命令如下。服务端加载文件有个优势就是文件名使用通配符,可以导入同时多个文件。当然前提是租户内存资源足够大。
load data /*+ parallel(12) append */ infile '/data2/tpch/s100/lineitem.tbl.*' into table lineitem fields terminated by '|';
APPEND
Hint 是旁路导入提示,直接在数据文件中分配空间并写入数据。APPEND
Hint 默认等同于使用的 direct(false, 0)
,同时可以实现 在线收集统计信息(GATHER_OPTIMIZER_STATISTICS
Hint)的功能。
旁路导入跟普通的写入原理差异很大,表现形式就是普通的数据写入会消耗租户内存的MemTable ,旁路写入不用 MemTable 内存。但是旁路写入会使用 KVCache 内存。所以需要确保 KVCache 最大内存够用;否则,也可能会碰到内存不足报错。这一块 OB 4.3 后面还会优化这块逻辑,保证不报错。
(SYS@localhost:sql.sock) [TPCH]> load data /*+ parallel(12) append */ infile '/data2/tpch/s100/customer.tbl.*' into table customer fields terminated by '|';ORA-00600: internal error code, arguments: -4013, No memory or reach tenant memory limit
客户端加载文件命令需要用 LOAD DATA LOCAL INFILE
。客户端不支持文件名用通配符,需要一个个写文件。
load data /*+ parallel(12) append */ local infile '/data/1/tpch/s1/lineitem.tbl.1' into table lineitem fields terminated by '|';load data /*+ parallel(12) append */ local infile '/data/1/tpch/s1/lineitem.tbl.2' into table lineitem fields terminated by '|';
5. 数据容量大小和压缩比。
统计数据大小有多个途径。一是看视图 oceanbase.cdb_ob_tablet_replicas 的 required_size
和 data_size
(前者包含后者)。这个字段的值在租户合并(MAJOR FREEZE
)后更新,并且体现的是一份数据的容量大小。
另外一个是看视图 oceanbase.GV$OB_SSTABLES 的 size
。这个视图里是 LSM Tree 不同层次的 SSTable 数据。这个大小会随着转储和合并变化。在数据加载的过程中,同一份数据可能出现在多个版本中。所以数据的总体大小可能会超出单份数据的总大小。当数据导入结束后,两次全量合并后并过了快照保留时间,这部分大小就稳定下来了。
SELECT t1.tenant_id, t1.database_name, t1.table_name, t1.tablet_id, t1.partition_name, t1.svr_ip, round(t2.data_size/1024/1024/1024,1) data_size_gb, round(t2.required_size/1024/1024/1024,1) required_size_gb, t1.ls_id, s.table_type, round(s.SIZE/1024/1024/1024,2) size_gbFROM oceanbase.CDB_OB_TABLE_LOCATIONS t1JOIN oceanbase.cdb_ob_tablet_replicas t2 ON (t1.tablet_id=t2.tablet_id AND t1.ls_id=t2.ls_id)JOIN oceanbase.GV$OB_SSTABLES s ON (t1.tenant_id=s.tenant_id AND t1.ls_id=s.ls_id AND t1.svr_ip=s.svr_ip AND t1.tablet_id=s.tablet_id)WHERE t1.tenant_id=1012 AND t1.database_name='TPCH' AND t1.ROLE='LEADER' AND s.table_type <> 'MEMTABLE' AND table_name IN ('lineitem');

所以,统计总体的 TPC-H 的容量大小 SQL 有两个方法。当导入结束后两次合并后,这两个方法结果基本一致。
SELECT t1.tenant_id, t1.database_name, t1.table_name, round(sum(t2.required_size)/1024/1024/1024) required_size_gb, count(*) cntFROM oceanbase.CDB_OB_TABLE_LOCATIONS t1JOIN oceanbase.cdb_ob_tablet_replicas t2 ON (t1.tablet_id=t2.tablet_id AND t1.ls_id=t2.ls_id)WHERE t1.tenant_id=1012 AND t1.database_name='TPCH' AND t1.ROLE='LEADER'GROUP BY t1.tenant_id, t1.database_name, t1.table_name;SELECT t1.tenant_id, t1.database_name, t1.table_name, round(sum(s.size)/1024/1024/1024) data_size_gb, count(*) cntFROM oceanbase.CDB_OB_TABLE_LOCATIONS t1JOIN oceanbase.cdb_ob_tablet_replicas t2 ON (t1.tablet_id=t2.tablet_id AND t1.ls_id=t2.ls_id)JOIN oceanbase.GV$OB_SSTABLES s ON (t1.tenant_id=s.tenant_id AND t1.ls_id=s.ls_id AND t1.svr_ip=s.svr_ip AND t1.tablet_id=s.tablet_id)WHERE t1.tenant_id=1012 AND t1.database_name='TPCH' AND t1.ROLE='LEADER' AND s.table_type <> 'MEMTABLE'GROUP BY t1.tenant_id, t1.database_name, t1.table_name;

简单计算一下数据压缩比。前面 lineitem 数据文件大小约为 74.7G ,在 OB 中表 lineitem 最终占用容量 38G 左右,压缩比约 51%。这个压缩比不算高,其原因是 TPC-H 的字符串列安全是随机字符串,真实的业务文本数据压缩比一定比这个高。
6. TPC-H 查询性能。
这里测试服务器资源都是虚拟机,租户也只有 12C12G,CPU 是超卖的,不等同于物理的CPU。此外集群和租户也没有刻意去做参数调优所以这个测试结果只是特定资源下的结果,不一定表示 OB 的全部真实能力。
在执行之前特别将 KVCache 清空。可以通过视图 oceanbase.gv$ob_memory 观察内存。
ALTER SYSTEM flush kvcache;SELECT tenant_id, svr_ip, ctx_name, mod_name, count, round(HOLD/1024/1024/1024,2) hold_gb, round(used/1024/1024/1024,2) used_gbFROM oceanbase.gv$ob_memoryWHERE tenant_id=1012ORDER BY HOLD DESC LIMIT 5;

测试 SQL 脚本就是 TPC-H 工具针对 100G 规模数据生成的 SQL。详情如下,按顺序对应 Q1 ~ Q22 。研究这些 SQL 的执行计划有助于深入理解 OB 4.3 的 SQL 引擎在 OLAP 方面的能力。这个以后再专门分享。
select /*+ TPCH_Q1 PARALLEL(16) */l_returnflag,l_linestatus,sum(l_quantity) as sum_qty,sum(l_extendedprice) as sum_base_price,sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,avg(l_quantity) as avg_qty,avg(l_extendedprice) as avg_price,avg(l_discount) as avg_disc,count(*) as count_orderfromlineitemwherel_shipdate <= date '1998-12-01' - interval '90' daygroup byl_returnflag,l_linestatusorder byl_returnflag,l_linestatus;SELECT * FROM (SELECT /*+ TPCH_Q2 PARALLEL(16) */s_acctbal,s_name,n_name,p_partkey,p_mfgr,s_address,s_phone,s_commentfrompart,supplier,partsupp,nation,regionwhere p_partkey = ps_partkey ands_suppkey = ps_suppkey andp_size = 30 andp_type like '%STEEL' ands_nationkey = n_nationkey andn_regionkey = r_regionkey andr_name = 'ASIA' andps_supplycost = (SELECT min(ps_supplycost)frompartsupp,supplier,nation,regionwherep_partkey = ps_partkey ands_suppkey = ps_suppkey ands_nationkey = n_nationkey andn_regionkey = r_regionkey and r_name = 'ASIA' )order bys_acctbal desc,n_name,s_name,p_partkey)WHERE rownum <= 100;SELECT * FROM (SELECT /*+ TPCH_Q3 PARALLEL(16) */ *from (SELECTl_orderkey,o_orderdate,o_shippriority,sum(l_extendedprice * (1 - l_discount)) as revenuefromcustomer,orders,lineitemwherec_mktsegment = 'BUILDING' andc_custkey = o_custkey andl_orderkey = o_orderkey ando_orderdate < '1995-03-15' andl_shipdate > '1995-03-15'group byl_orderkey,o_orderdate,o_shippriorityorder byrevenue desc,o_orderdate))WHERE ROWNUM <= 10;SELECT /*+ TPCH_Q4 PARALLEL(16) no_unnest */o_orderpriority,count(*) as order_countfromorderswhereo_orderdate >= DATE'1993-07-01' ando_orderdate < DATE'1993-07-01' + interval '3' month andexists (SELECT *from lineitemwherel_orderkey = o_orderkey andl_commitdate < l_receiptdate )group byo_orderpriorityorder byo_orderpriority;select /*+ TPCH_Q5 PARALLEL(16) */n_name,sum(l_extendedprice * (1 - l_discount)) as revenuefromcustomer,orders,lineitem,supplier,nation,regionwherec_custkey = o_custkeyand l_orderkey = o_orderkeyand l_suppkey = s_suppkeyand c_nationkey = s_nationkeyand c_nationkey = n_nationkeyand n_regionkey = r_regionkeyand r_name = 'EUROPE'and o_orderdate >= date '1995-01-01'and o_orderdate < date '1995-01-01' + interval '1' yeargroup byn_nameorder byrevenue desc;select /*+ TPCH_Q6 PARALLEL(16) */sum(l_extendedprice * l_discount) as revenuefromlineitemwherel_shipdate >= date '1994-01-01'and l_shipdate < date '1994-01-01' + interval '1' yearand l_discount between .06 - 0.01 and .06 + 0.01and l_quantity < 24;select /*+ TPCH_Q7 stream(0) PARALLEL(16) */supp_nation,cust_nation,l_year,sum(volume) as revenuefrom(selectn1.n_name as supp_nation,n2.n_name as cust_nation,extract(year from l_shipdate) as l_year,l_extendedprice * (1 - l_discount) as volumefromsupplier,lineitem,orders,customer,nation n1,nation n2wheres_suppkey = l_suppkeyand o_orderkey = l_orderkeyand c_custkey = o_custkeyand s_nationkey = n1.n_nationkeyand c_nationkey = n2.n_nationkeyand ((n1.n_name = 'RUSSIA' and n2.n_name = 'IRAQ')or (n1.n_name = 'IRAQ' and n2.n_name = 'RUSSIA'))and l_shipdate between date '1995-01-01' and date '1996-12-31') shippinggroup bysupp_nation,cust_nation,l_yearorder bysupp_nation,cust_nation,l_year;select /*+ TPCH_Q8 stream(0) PARALLEL(16) */o_year,sum(casewhen nation = 'IRAQ' then volumeelse 0end) sum(volume) as mkt_sharefrom(selectextract(year from o_orderdate) as o_year,l_extendedprice * (1 - l_discount) as volume,n2.n_name as nationfrompart,supplier,lineitem,orders,customer,nation n1,nation n2,regionwherep_partkey = l_partkeyand s_suppkey = l_suppkeyand l_orderkey = o_orderkeyand o_custkey = c_custkeyand c_nationkey = n1.n_nationkeyand n1.n_regionkey = r_regionkeyand r_name = 'MIDDLE EAST'and s_nationkey = n2.n_nationkeyand o_orderdate between date '1995-01-01' and date '1996-12-31'and p_type = 'MEDIUM PLATED TIN') all_nationsgroup byo_yearorder byo_year;select /*+ TPCH_Q9 PARALLEL(16) */nation,o_year,sum(amount)as sum_profit from(selectn_name as nation,extract(year from o_orderdate) as o_year,l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantityas amountfrompart,supplier,lineitem,partsupp,orders,nationwheres_suppkey = l_suppkeyand ps_suppkey = l_suppkeyand ps_partkey = l_partkeyand p_partkey = l_partkeyand o_orderkey = l_orderkeyand s_nationkey = n_nationkeyand p_name like '%indian%' )profitgroup bynation,o_yearorder bynation,o_yeardesc;select */*+ TPCH_Q10 PARALLEL(16) */ from (select /*+PARALLEL(16)*/ c_custkey,c_name,sum(l_extendedprice * (1 - l_discount)) as revenue,c_acctbal,n_name,c_address,c_phone,c_commentfromcustomer,orders,lineitem,nationwherec_custkey = o_custkeyand l_orderkey = o_orderkeyand o_orderdate >= date '1994-07-01'and o_orderdate < date '1994-07-01' + interval '3' monthand l_returnflag = 'R'and c_nationkey = n_nationkeygroup byc_custkey,c_name,c_acctbal,c_phone,n_name,c_address,c_commentorder byrevenue desc)where rownum <= 20;select /*+ TPCH_Q11 PARALLEL(16) */ps_partkey,sum(ps_supplycost * ps_availqty) as valuefrompartsupp,supplier,nationwhereps_suppkey = s_suppkeyand s_nationkey = n_nationkeyand n_name = 'GERMANY'group byps_partkey havingsum(ps_supplycost * ps_availqty) > (selectsum(ps_supplycost * ps_availqty) * 0.0000100000frompartsupp,supplier,nationwhereps_suppkey = s_suppkeyand s_nationkey = n_nationkeyand n_name = 'GERMANY')order byvalue desc;select /*+ TPCH_Q12 PARALLEL(16) */l_shipmode,sum(casewhen o_orderpriority = '1-URGENT'or o_orderpriority = '2-HIGH'then 1else 0end) as high_line_count,sum(casewhen o_orderpriority <> '1-URGENT'and o_orderpriority <> '2-HIGH'then 1else 0end) as low_line_countfromorders,lineitemwhereo_orderkey = l_orderkeyand l_shipmode in ('MAIL', 'SHIP')and l_commitdate < l_receiptdateand l_shipdate < l_commitdateand l_receiptdate >= date '1994-01-01'and l_receiptdate < date '1994-01-01' + interval '1' yeargroup byl_shipmodeorder byl_shipmode;select /*+ TPCH_Q14 PARALLEL(16) */100.00 * sum(casewhen p_type like 'PROMO%'then l_extendedprice * (1 - l_discount)else 0end) sum(l_extendedprice * (1 - l_discount)) as promo_revenuefromlineitem,partwherel_partkey = p_partkeyand l_shipdate >= date '1995-09-01'and l_shipdate < date '1995-09-01' + interval '1' month;create view revenue0 (supplier_no, total_revenue) asselectl_suppkey,sum(l_extendedprice * (1 - l_discount))fromlineitemwherel_shipdate >= date '1996-01-01'and l_shipdate < date '1996-01-01' + interval '3' monthgroup byl_suppkey;select /*+ TPCH_Q15 PARALLEL(16) */s_suppkey,s_name,s_address,s_phone,total_revenuefromsupplier,revenue0wheres_suppkey = supplier_noand total_revenue = (selectmax(total_revenue)fromrevenue0)order bys_suppkey;drop view revenue0;select /*+ TPCH_Q16 PARALLEL(16) */p_brand,p_type,p_size,count(distinct ps_suppkey) as supplier_cntfrompartsupp,partwherep_partkey = ps_partkeyand p_brand <> 'Brand#45'and p_type not like 'MEDIUM POLISHED%'and p_size in (49, 14, 23, 45, 19, 3, 36, 9)and ps_suppkey not in (selects_suppkeyfromsupplierwheres_comment like '%Customer%Complaints%')group byp_brand,p_type,p_sizeorder bysupplier_cnt desc,p_brand,p_type,p_size;select /*+ TPCH_Q17 PARALLEL(16) */sum(l_extendedprice) 7.0 as avg_yearlyfromlineitem,partwherep_partkey = l_partkeyand p_brand = 'Brand#23'and p_container = 'MED BOX'and l_quantity < (select0.2 * avg(l_quantity)fromlineitemwherel_partkey = p_partkey);select /*+ TPCH_Q18 stream(0) PARALLEL(16)*/*from(selectc_name,c_custkey,o_orderkey,o_orderdate,o_totalprice,sum(l_quantity)fromcustomer,orders,lineitemwhereo_orderkeyin(selectl_orderkeyfromlineitemgroup byl_orderkeyhavingsum(l_quantity) > 313 )and c_custkey = o_custkeyand o_orderkey = l_orderkeygroup byc_name,c_custkey,o_orderkey,o_orderdate,o_totalpriceorder byo_totalprice desc,o_orderdate);select /*+ TPCH_Q19 PARALLEL(16) */sum(l_extendedprice* (1 - l_discount)) as revenuefromlineitem,partwhere(p_partkey = l_partkeyand p_brand = 'Brand#12'and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')and l_quantity >= 1 and l_quantity <= 1 + 10and p_size between 1 and 5and l_shipmode in ('AIR', 'AIR REG')and l_shipinstruct = 'DELIVER IN PERSON')or(p_partkey = l_partkeyand p_brand = 'Brand#23'and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')and l_quantity >= 10 and l_quantity <= 10 + 10and p_size between 1 and 10and l_shipmode in ('AIR', 'AIR REG')and l_shipinstruct = 'DELIVER IN PERSON')or(p_partkey = l_partkeyand p_brand = 'Brand#34'and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')and l_quantity >= 20 and l_quantity <= 20 + 10and p_size between 1 and 15and l_shipmode in ('AIR', 'AIR REG')and l_shipinstruct = 'DELIVER IN PERSON');select /*+ TPCH_Q20 PARALLEL(16) */s_name,s_addressfromsupplier,nationwheres_suppkey in (selectps_suppkeyfrompartsuppwhereps_partkey in (selectp_partkeyfrompartwherep_name like 'forest%')and ps_availqty > (select0.5 * sum(l_quantity)fromlineitemwherel_partkey = ps_partkeyand l_suppkey = ps_suppkeyand l_shipdate >= date '1994-01-01'and l_shipdate < date '1994-01-01' + interval '1' year))and s_nationkey = n_nationkeyand n_name = 'CANADA'order bys_name;select * from (select/*+ TPCH_Q21 PARALLEL(16) */s_name,count(*) as numwaitfromsupplier,lineitem l1,orders,nationwheres_suppkey = l1.l_suppkeyand o_orderkey = l1.l_orderkeyand o_orderstatus = 'F'and l1.l_receiptdate > l1.l_commitdateand exists (select*fromlineitem l2wherel2.l_orderkey = l1.l_orderkeyand l2.l_suppkey <> l1.l_suppkey)and not exists (select*fromlineitem l3wherel3.l_orderkey = l1.l_orderkeyand l3.l_suppkey <> l1.l_suppkeyand l3.l_receiptdate > l3.l_commitdate)and s_nationkey = n_nationkeyand n_name = 'CHINA'group bys_nameorder bynumwait desc,s_name)where rownum <= 100;
上面运行结果后面跟列存下的结果一起对比呈现。
7. 在线将表修改为列存模式。
OB 4.2 以及以前的版本,在宏块层面数据存储呈现行存特点,在微块层面存储呈列存特点。OB 4.3 后正式推出基于 LSM-Tree 架构的列存存储引擎。具体原理以后再研究,这里先看看用法和效果。
将表变更为列存表。具体 SQL 如下:
下面先将 TPC-H 中的分区表(也是大表)变更为单存的列存表。
alter table part add column group (each column);alter table partsupp add column group (each column);alter table supplier add column group (each column);alter table customer add column group (each column);alter table lineitem add column group (each column);alter table orders add column group (each column);
将大表从行存变更为列存,OB 数据文件剩余空间初步估计需要预留当前表的空间的两倍以上空间(有待核实);否则,可能会碰到空间不足报错。在集群节点空间统计上不要依赖视图 gv$ob_servers 的 data_disk_in_use ,初步看它代表的是单份数据的容量并且可能只是在合并结束后才更新。
(SYS@localhost:sql.sock) [TPCH]> alter table lineitem add column group (each column);ORA-00600: internal error code, arguments: -4184, Server out of disk space
上面 lineitem 表是最大的表,我变更过程中多次遭遇空间失败(不断的扩容数据文件)。在观察表的 SSTable 大小时,有时候能看到 3 个全量版本的大小(有频繁手动触发 major freeze 命令)。存储空间应该不是 3 份,多个相邻版本的 SSTable 空间有一定复用(这里我并没有更新实际表数据)。估计跟集群的快照保留时间 undo_retention 是 1800s 有关。

测试环境磁盘 IO 能力跟不上导致上面这个 DDL 执行时间非常的长。

最终 DDL 成功后的 TPC-H 表 lineitem 的特点如下。

估计在视图 GV$OB_SSTABLE
里,表的每个分区的每个列单独一个 SSTable 文件。
8. INSERT INTO SELECT 语句旁路导入。
由于上面在线变更实在太慢,还可以使用 INSERT INTO SELECT FROM
语句旁路导入数据到列存表。然后再更改表名。
使用限制
只支持 PDML(Parallel Data Manipulation Language,并行数据操纵语言),非 PDML 不能用旁路导入。
不能两个语句同时写一个表,因为导入的过程中会先加表锁。
不支持在触发器(Trigger)使用。
不能在多行事务(包含多个操作的事务)中运行。
判断是否用了旁路导入一个可以看解析执行计划(explain extended
)最后的 Note ,二个看租户性能监控 [Memstore]
监控。
(SYS@localhost:sql.sock) [TPCH]> explain extended INSERT *+ append enable_parallel_dml parallel(8) */ INTO orders2 SELECT * FROM orders;+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Query Plan |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| ================================================================================== || |ID|OPERATOR |NAME |EST.ROWS |EST.TIME(us)| || ---------------------------------------------------------------------------------- || |0 |PX COORDINATOR | |150000000|1436592693 | || |1 |+-EXCHANGE OUT DISTR |:EX10001 |150000000|1149741363 | || |2 | +-INSERT | |150000000|1069069322 | || |3 | +-EXCHANGE IN DISTR | |150000000|119166652 | || |4 | +-EXCHANGE OUT DISTR (PKEY HASH)|:EX10000 |150000000|83310236 | || |5 | +-SUBPLAN SCAN |ANONYMOUS_VIEW1|150000000|2638194 | || |6 | +-PX BLOCK ITERATOR | |150000000|2588564 | || |7 | +-TABLE FULL SCAN |ORDERS |150000000|2588564 | || ================================================================================== || Outputs & filters: || ------------------------------------- |<......>Used Hint: || ------------------------------------- || *+ || || USE_PLAN_CACHE( NONE ) || PARALLEL(8) || ENABLE_PARALLEL_DML || APPEND || APPEND || */<......>| Plan Type: || DISTRIBUTED || Note: || Degree of Parallelism is 8 because of hint || Direct-mode is enabled in insert into select |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+100 rows in set (0.047 sec)(SYS@localhost:sql.sock) [TPCH]> INSERT /*+ append enable_parallel_dml parallel(8) */ INTO orders2 SELECT * FROM orders;Query OK, 150000000 rows affected (16 min 52.900 sec)Records: 150000000 Duplicates: 0 Warnings: 0
还是 INSERT INTO SELECT FROM
这样转换到列存表数据快。
TPC-H 所有表空间两种统计结果如下。

这里面的 cnt
是表的分区数量。

这里面的 cnt
是表的 sstable 数量。看统计的空间大概是前者的两倍。不过再多合并几次等一段时间,这个结果就跟前面查询差不多了。

上图中 orders3 是老的行存表,orders 是列存表。可以看到容量差异不大,sstable 数量是10 倍的差异,orders 表正好有 9 列。sstable 里 9 个 normal_cg
加 1 个 co_major
就是 10 个 sstable。
9. 行存和列存对比结论
首先看 TPC-H 表的容量对比。

初步看表容量不大的时候,从行存到列存空间变化不大。表越大,空间缩减效果越明显。最大的表 lineitem 容量从 38G 降到 21G 算是很明显了。这也是行存的一个优势。
再看看 TPC-H 的 22 条查询语句的性能对比。

可以看到 22 个查询的执行时间都明显减少,总时间减少 78% 。列存下 TPC-H 的复杂查询性能提升非常明显!
22 个查询中 Q2 查询提升最明显,看看它在列存结构下的执行计划。
===========================================================================|ID|OPERATOR |NAME |EST.ROWS |EST.TIME(us)|---------------------------------------------------------------------------|0 |PX COORDINATOR MERGE SORT | |5 |9829316 ||1 |└─EXCHANGE OUT DISTR |:EX10001|5 |9829305 ||2 | └─SORT | |5 |9829304 ||3 | └─HASH GROUP BY | |5 |9829304 ||4 | └─EXCHANGE IN DISTR | |68 |9829303 ||5 | └─EXCHANGE OUT DISTR (HASH) |:EX10000|68 |9829293 ||6 | └─HASH GROUP BY | |68 |9829270 ||7 | └─PX BLOCK ITERATOR | |578901265|1492064 ||8 | └─COLUMN TABLE FULL SCAN|LINEITEM|578901265|1492064 |===========================================================================Outputs & filters:-------------------------------------0 - output([INTERNAL_FUNCTION(LINEITEM.L_RETURNFLAG, LINEITEM.L_LINESTATUS, T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)),T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY)),T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT)),T_FUN_COUNT_SUM(T_FUN_COUNT(*)))]), filter(nil), rowset=256sort_keys([LINEITEM.L_RETURNFLAG, ASC], [LINEITEM.L_LINESTATUS, ASC])1 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [INTERNAL_FUNCTION(LINEITEM.L_RETURNFLAG, LINEITEM.L_LINESTATUS, T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)),T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE* cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY)),T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT)),T_FUN_COUNT_SUM(T_FUN_COUNT(*)))]), filter(nil), rowset=256dop=162 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE))],[T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY))],[T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT))]), filter(nil), rowset=256sort_keys([LINEITEM.L_RETURNFLAG, ASC], [LINEITEM.L_LINESTATUS, ASC])3 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE))],[T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY))],[T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT))]), filter(nil), rowset=256group([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS]), agg_func([T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE))],[T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY))],[T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT))])4 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE* cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,-85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],[T_FUN_COUNT(LINEITEM.L_DISCOUNT)]), filter(nil), rowset=2565 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE* cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,-85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],[T_FUN_COUNT(LINEITEM.L_DISCOUNT)]), filter(nil), rowset=256(#keys=2, [LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS]), dop=166 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE* cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,-85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],[T_FUN_COUNT(LINEITEM.L_DISCOUNT)]), filter(nil), rowset=256group([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS]), agg_func([T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE* cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,-85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],[T_FUN_COUNT(LINEITEM.L_DISCOUNT)])7 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [LINEITEM.L_QUANTITY], [LINEITEM.L_EXTENDEDPRICE], [LINEITEM.L_DISCOUNT], [LINEITEM.L_TAX]), filter(nil), rowset=2568 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [LINEITEM.L_QUANTITY], [LINEITEM.L_EXTENDEDPRICE], [LINEITEM.L_DISCOUNT], [LINEITEM.L_TAX]), filter([LINEITEM.L_SHIPDATE<= '1998-12-01 00:00:00' - INTERVAL '+90 00:00:00' DAY (2) TO SECOND (0)]), rowset=256access([LINEITEM.L_SHIPDATE], [LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [LINEITEM.L_QUANTITY], [LINEITEM.L_EXTENDEDPRICE], [LINEITEM.L_DISCOUNT],[LINEITEM.L_TAX]), partitions(p[0-15])is_index_back=false, is_global_index=false, filter_before_indexback[false],range_key([LINEITEM.L_ORDERKEY], [LINEITEM.L_LINENUMBER]), range(MIN,MIN ; MAX,MAX)always true
行存结构下执行计划如下

可以看到二者的区别就是针对大表 lineitem 的扫描方式 从 TABLE FULL SCAN
变为 COLUMN TABLE FULL SCAN
。这个表是最大的表,且 Q2 只查询这个表,也难怪性能提升最明显。
10. 结论
列存存储模型对 OLAP 场景复杂查询的性能帮助非常大,OB 4.3 版本完全支持列存以及行列混合 极大的提升了 OB 在 OLAP 场景的竞争力。后面还可以做 TPC-DS 场景的对比验证。加上原本 OB 在 OLTP 场景的出色能力,一套数据源(表)同时满足 OLTP 和 OLAP 场景的需求,对不同规模的企业用户都有很大吸引力。
不过列存模型也是刚推出,从行存在线转列存的时候,在内存和空间方面还需要注意,预留充分的资源。特别是大表在线转换的时候,尽量避免去触发合并,以及缩短快照保留时间(undo_retention
),以避免数据文件空间爆掉。OB 在列存的变更这块应该还有很大的优化空间。
更多阅读:




