暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

OB 4.3 列存表使用体验

1148

OB 4.3 版本有不少新功能,最吸引人的就是全面支持列存,本文拿到新版本后就快速体验一下。

本文会导入 TPCH 100G 数据,测试 4.3 的两种大批量数据旁路导入功能,观察行存和列存下的数据容量和压缩比、表的行存和列存在线转换功能以及列存对 TPCH的 OLAP 查询性能的提升效果。


1. 数据库环境说明

软件版本是企业版 4.3 ,信息如下。

    SELECT ZONE,svr_ip,build_version FROM oceanbase.__all_server;
    +-------+-----------+-------------------------------------------------------------------------------------------+
    | ZONE | svr_ip | build_version |
    +-------+-----------+-------------------------------------------------------------------------------------------+
    | zone1 | 10.0.0.65 | 4.3.0.1_101000062024032200-b59432e535c48e8b8828190c803b6c7736413ff9(Mar 22 2024 00:52:21) |
    +-------+-----------+-------------------------------------------------------------------------------------------+
    1 row in set (0.003 sec)

    租户资源分配信息,测试租户是 ORACLE 租户,资源规格 12C12G 。

      select t1.name resource_pool_name, t2.`name` unit_config_name, 
      t2.max_cpu, t2.min_cpu,
      round(t2.memory_size/1024/1024/1024,2) mem_size_gb,
      round(t2.log_disk_size/1024/1024/1024,2) log_disk_size_gb ,
      t3.zone, concat(t3.svr_ip,':',t3.`svr_port`) observer,
      t4.tenant_id, t4.tenant_name
      from __all_resource_pool t1
      join __all_unit_config t2 on (t1.unit_config_id=t2.unit_config_id)
      join __all_unit t3 on (t1.`resource_pool_id` = t3.`resource_pool_id`)
      left join __all_tenant t4 on (t1.tenant_id=t4.tenant_id)
      order by t1.`resource_pool_id`, t2.`unit_config_id`, t3.unit_id;


      +--------------------+-----------------------------------+---------+---------+-------------+------------------+-------+----------------+-----------+-------------+
      | resource_pool_name | unit_config_name | max_cpu | min_cpu | mem_size_gb | log_disk_size_gb | zone | observer | tenant_id | tenant_name |
      +--------------------+-----------------------------------+---------+---------+-------------+------------------+-------+----------------+-----------+-------------+
      | sys_pool | config_sys_zone1_U4C4G_pmy | 4 | 4 | 4.00 | 12.00 | zone1 | 10.0.0.65:2882 | 1 | sys |
      | oboracle_pool | config_oboracle_zone1_U12C12G_eiz | 12 | 12 | 12.00 | 36.00 | zone1 | 10.0.0.65:2882 | 1012 | oboracle |
      +--------------------+-----------------------------------+---------+---------+-------------+------------------+-------+----------------+-----------+-------------+
      2 rows in set (0.012 sec)

      2. TPC-H 数据文件

      TPC-H 官方提供了程序生成测试数据,这里我生成时规格参数 s 指定为 100,实际数据量大概 107GB 左右。大表分为 9 个文件,最大的表是 lineitem 的数据。数据文件格式是 CSV 。

        [root@server065 s100]# ls -lrth lineitem.tbl.*
        -rwxrwxrwx. 1 admin admin 8.2G Apr 12 11:18 lineitem.tbl.1
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.3
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.8
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.5
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.6
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.4
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.9
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.2
        -rwxrwxrwx. 1 admin admin 8.3G Apr 12 11:18 lineitem.tbl.7
        [root@server065 s100]# wc -l lineitem.tbl.*
        66653245 lineitem.tbl.1
        66678802 lineitem.tbl.2
        66668212 lineitem.tbl.3
        66678870 lineitem.tbl.4
        66665175 lineitem.tbl.5
        66675199 lineitem.tbl.6
        66669037 lineitem.tbl.7
        66671890 lineitem.tbl.8
        66677472 lineitem.tbl.9
        600037902 total

        3. TPCH 表结构。

        为了发挥OB 分布式能力,大表都是分区表,先使用默认的行存格式。

          CREATE TABLE "REGION" (
          "R_REGIONKEY" NUMBER NOT NULL,
          "R_NAME" CHAR(25 BYTE) DEFAULT NULL,
          "R_COMMENT" VARCHAR2(152 BYTE) DEFAULT NULL,
          CONSTRAINT "REGION_OBPK_1712656377925321" PRIMARY KEY ("R_REGIONKEY")
          ;


          CREATE TABLE "NATION" (
          "N_NATIONKEY" NUMBER NOT NULL,
          "N_NAME" CHAR(25 BYTE) DEFAULT NULL,
          "N_REGIONKEY" NUMBER DEFAULT NULL,
          "N_COMMENT" VARCHAR2(152 BYTE) DEFAULT NULL,
          CONSTRAINT "NATION_OBPK_1712656377735819" PRIMARY KEY ("N_NATIONKEY")
          ;


          CREATE TABLE "PART" (
          "P_PARTKEY" NUMBER NOT NULL,
          "P_NAME" VARCHAR2(55 BYTE) DEFAULT NULL,
          "P_MFGR" CHAR(25 BYTE) DEFAULT NULL,
          "P_BRAND" CHAR(10 BYTE) DEFAULT NULL,
          "P_TYPE" VARCHAR2(25 BYTE) DEFAULT NULL,
          "P_SIZE" NUMBER DEFAULT NULL,
          "P_CONTAINER" CHAR(10 BYTE) DEFAULT NULL,
          "P_RETAILPRICE" NUMBER DEFAULT NULL,
          "P_COMMENT" VARCHAR2(23 BYTE) DEFAULT NULL,
          CONSTRAINT "PART_OBPK_1712656377147466" PRIMARY KEY ("P_PARTKEY")
          )
          PARTITION BY HASH ("P_PARTKEY") PARTITIONS 9
          ;


          CREATE TABLE "PARTSUPP" (
          "PS_PARTKEY" NUMBER NOT NULL,
          "PS_SUPPKEY" NUMBER NOT NULL,
          "PS_AVAILQTY" NUMBER DEFAULT NULL,
          "PS_SUPPLYCOST" NUMBER DEFAULT NULL,
          "PS_COMMENT" VARCHAR2(199 BYTE) DEFAULT NULL,
          CONSTRAINT "PARTSUPP_OBPK_1712656376960786" PRIMARY KEY ("PS_PARTKEY", "PS_SUPPKEY")
          )
          PARTITION BY HASH ("PS_PARTKEY") PARTITIONS 9
          ;


          CREATE TABLE "SUPPLIER" (
          "S_SUPPKEY" NUMBER NOT NULL,
          "S_NAME" CHAR(25 BYTE) DEFAULT NULL,
          "S_ADDRESS" VARCHAR2(40 BYTE) DEFAULT NULL,
          "S_NATIONKEY" NUMBER DEFAULT NULL,
          "S_PHONE" CHAR(15 BYTE) DEFAULT NULL,
          "S_ACCTBAL" NUMBER DEFAULT NULL,
          "S_COMMENT" VARCHAR2(101 BYTE) DEFAULT NULL,
          CONSTRAINT "SUPPLIER_OBPK_1712656377544582" PRIMARY KEY ("S_SUPPKEY")
          )
          PARTITION BY HASH ("S_SUPPKEY") PARTITIONS 9
          ;


          CREATE TABLE "CUSTOMER" (
          "C_CUSTKEY" NUMBER NOT NULL,
          "C_NAME" VARCHAR2(25 BYTE) DEFAULT NULL,
          "C_ADDRESS" VARCHAR2(40 BYTE) DEFAULT NULL,
          "C_NATIONKEY" NUMBER DEFAULT NULL,
          "C_PHONE" CHAR(15 BYTE) DEFAULT NULL,
          "C_ACCTBAL" NUMBER DEFAULT NULL,
          "C_MKTSEGMENT" CHAR(10 BYTE) DEFAULT NULL,
          "C_COMMENT" VARCHAR2(117 BYTE) DEFAULT NULL,
          CONSTRAINT "CUSTOMER_OBPK_1712656377359086" PRIMARY KEY ("C_CUSTKEY")
          )
          PARTITION BY HASH ("C_CUSTKEY") PARTITIONS 9
          ;


          CREATE TABLE "ORDERS" (
          "O_ORDERKEY" NUMBER NOT NULL ENABLE,
          "O_CUSTKEY" NUMBER NOT NULL ENABLE,
          "O_ORDERSTATUS" CHAR(1) DEFAULT NULL,
          "O_TOTALPRICE" NUMBER DEFAULT NULL,
          "O_ORDERDATE" DATE NOT NULL ENABLE,
          "O_ORDERPRIORITY" CHAR(15) DEFAULT NULL,
          "O_CLERK" CHAR(15) DEFAULT NULL,
          "O_SHIPPRIORITY" NUMBER DEFAULT NULL,
          "O_COMMENT" VARCHAR2(79) DEFAULT NULL,
          PRIMARY KEY ("O_ORDERKEY")
          ) COMPRESS FOR QUERY LOW REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 TABLEGROUP = 'TPCH_TG_SF_TPC_USER_LINEITEM_ORDER_GROUP'
          partition by hash(o_orderkey) PARTITIONS 16
          ;


          CREATE TABLE "LINEITEM" (
          "L_ORDERKEY" NUMBER NOT NULL,
          "L_PARTKEY" NUMBER NOT NULL,
          "L_SUPPKEY" NUMBER NOT NULL,
          "L_LINENUMBER" NUMBER NOT NULL,
          "L_QUANTITY" NUMBER NOT NULL,
          "L_EXTENDEDPRICE" NUMBER NOT NULL,
          "L_DISCOUNT" NUMBER NOT NULL,
          "L_TAX" NUMBER NOT NULL,
          "L_RETURNFLAG" CHAR(1 BYTE) DEFAULT NULL,
          "L_LINESTATUS" CHAR(1 BYTE) DEFAULT NULL,
          "L_SHIPDATE" DATE NOT NULL,
          "L_COMMITDATE" DATE DEFAULT NULL,
          "L_RECEIPTDATE" DATE DEFAULT NULL,
          "L_SHIPINSTRUCT" CHAR(25 BYTE) DEFAULT NULL,
          "L_SHIPMODE" CHAR(10 BYTE) DEFAULT NULL,
          "L_COMMENT" VARCHAR2(44 BYTE) DEFAULT NULL,
          CONSTRAINT "LINEITEM_OBPK_1712656376567797" PRIMARY KEY ("L_ORDERKEY", "L_LINENUMBER")
          )
          PARTITION BY HASH ("L_ORDERKEY") PARTITIONS 9
          ;

          4. OB 4.3 数据导入方法。

          OB 很早就支持 LOAD DATA
          命令加载 CSV 文件,要求文件在 OB 集群节点上,从 4.3 开始支持客户端导入。

          服务端加载文件命令如下。服务端加载文件有个优势就是文件名使用通配符,可以导入同时多个文件。当然前提是租户内存资源足够大。

            load data /*+ parallel(12) append */ infile '/data2/tpch/s100/lineitem.tbl.*' into table lineitem fields terminated by '|';

            APPEND
             Hint 是旁路导入提示,直接在数据文件中分配空间并写入数据。APPEND
             Hint 默认等同于使用的 direct(false, 0)
            ,同时可以实现 在线收集统计信息(GATHER_OPTIMIZER_STATISTICS
             Hint)的功能。

            旁路导入跟普通的写入原理差异很大,表现形式就是普通的数据写入会消耗租户内存的MemTable ,旁路写入不用 MemTable 内存。但是旁路写入会使用 KVCache 内存。所以需要确保 KVCache 最大内存够用;否则,也可能会碰到内存不足报错。这一块 OB 4.3 后面还会优化这块逻辑,保证不报错。

              (SYS@localhost:sql.sock) [TPCH]> load data /*+ parallel(12) append */ infile '/data2/tpch/s100/customer.tbl.*' into table customer fields terminated by '|';
              ORA-00600: internal error code, arguments: -4013, No memory or reach tenant memory limit

              客户端加载文件命令需要用 LOAD DATA LOCAL INFILE
              。客户端不支持文件名用通配符,需要一个个写文件。

                load data /*+ parallel(12) append */ local infile '/data/1/tpch/s1/lineitem.tbl.1' into table lineitem fields terminated by '|';
                load data /*+ parallel(12) append */ local infile '/data/1/tpch/s1/lineitem.tbl.2' into table lineitem fields terminated by '|';

                5. 数据容量大小和压缩比。

                统计数据大小有多个途径。一是看视图 oceanbase.cdb_ob_tablet_replicasrequired_size
                data_size
                (前者包含后者)。这个字段的值在租户合并(MAJOR FREEZE
                )后更新,并且体现的是一份数据的容量大小。

                另外一个是看视图 oceanbase.GV$OB_SSTABLES 的 size
                。这个视图里是 LSM Tree 不同层次的 SSTable 数据。这个大小会随着转储和合并变化。在数据加载的过程中,同一份数据可能出现在多个版本中。所以数据的总体大小可能会超出单份数据的总大小。当数据导入结束后,两次全量合并后并过了快照保留时间,这部分大小就稳定下来了。

                  SELECT t1.tenant_id, t1.database_name, t1.table_name, t1.tablet_id, t1.partition_name, t1.svr_ip, round(t2.data_size/1024/1024/1024,1) data_size_gb, round(t2.required_size/1024/1024/1024,1) required_size_gb, t1.ls_id, s.table_type, round(s.SIZE/1024/1024/1024,2) size_gb
                  FROM oceanbase.CDB_OB_TABLE_LOCATIONS t1
                  JOIN oceanbase.cdb_ob_tablet_replicas t2 ON (t1.tablet_id=t2.tablet_id AND t1.ls_id=t2.ls_id)
                  JOIN oceanbase.GV$OB_SSTABLES s ON (t1.tenant_id=s.tenant_id AND t1.ls_id=s.ls_id AND t1.svr_ip=s.svr_ip AND t1.tablet_id=s.tablet_id)
                  WHERE t1.tenant_id=1012 AND t1.database_name='TPCH' AND t1.ROLE='LEADER' AND s.table_type <> 'MEMTABLE' AND table_name IN ('lineitem')
                  ;

                  所以,统计总体的 TPC-的容量大小 SQL 有两个方法。当导入结束后两次合并后,这两个方法结果基本一致。

                    SELECT t1.tenant_id, t1.database_name, t1.table_name, round(sum(t2.required_size)/1024/1024/1024) required_size_gb, count(*) cnt
                    FROM oceanbase.CDB_OB_TABLE_LOCATIONS t1
                    JOIN oceanbase.cdb_ob_tablet_replicas t2 ON (t1.tablet_id=t2.tablet_id AND t1.ls_id=t2.ls_id)
                    WHERE t1.tenant_id=1012 AND t1.database_name='TPCH' AND t1.ROLE='LEADER'
                    GROUP BY t1.tenant_id, t1.database_name, t1.table_name
                    ;


                    SELECT t1.tenant_id, t1.database_name, t1.table_name, round(sum(s.size)/1024/1024/1024) data_size_gb, count(*) cnt
                    FROM oceanbase.CDB_OB_TABLE_LOCATIONS t1
                    JOIN oceanbase.cdb_ob_tablet_replicas t2 ON (t1.tablet_id=t2.tablet_id AND t1.ls_id=t2.ls_id)
                    JOIN oceanbase.GV$OB_SSTABLES s ON (t1.tenant_id=s.tenant_id AND t1.ls_id=s.ls_id AND t1.svr_ip=s.svr_ip AND t1.tablet_id=s.tablet_id)
                    WHERE t1.tenant_id=1012 AND t1.database_name='TPCH' AND t1.ROLE='LEADER' AND s.table_type <> 'MEMTABLE'
                    GROUP BY t1.tenant_id, t1.database_name, t1.table_name
                    ;

                    简单计算一下数据压缩比。前面 lineitem 数据文件大小约为 74.7G ,在 OB 中表 lineitem 最终占用容量 38G 左右,压缩比约 51%。这个压缩比不算高,其原因是 TPC-H 的字符串列安全是随机字符串,真实的业务文本数据压缩比一定比这个高。


                    6. TPC-H 查询性能。

                    这里测试服务器资源都是虚拟机,租户也只有 12C12G,CPU 是超卖的,不等同于物理的CPU。此外集群和租户也没有刻意去做参数调优所以这个测试结果只是特定资源下的结果,不一定表示 OB 的全部真实能力。

                    在执行之前特别将 KVCache 清空。可以通过视图 oceanbase.gv$ob_memory 观察内存。

                      ALTER SYSTEM flush kvcache;
                      SELECT tenant_id, svr_ip, ctx_name, mod_name, count, round(HOLD/1024/1024/1024,2) hold_gb, round(used/1024/1024/1024,2) used_gb
                      FROM oceanbase.gv$ob_memory
                      WHERE tenant_id=1012
                      ORDER BY HOLD DESC LIMIT 5;

                      测试 SQL 脚本就是 TPC-H 工具针对 100G 规模数据生成的 SQL。详情如下,按顺序对应 Q1 ~ Q22 。研究这些 SQL 的执行计划有助于深入理解 OB 4.3 的 SQL 引擎在 OLAP 方面的能力。这个以后再专门分享。

                        select /*+   TPCH_Q1 PARALLEL(16) */
                        l_returnflag,
                        l_linestatus,
                        sum(l_quantity) as sum_qty,
                        sum(l_extendedprice) as sum_base_price,
                        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
                        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
                        avg(l_quantity) as avg_qty,
                        avg(l_extendedprice) as avg_price,
                        avg(l_discount) as avg_disc,
                        count(*) as count_order
                        from
                        lineitem
                        where
                        l_shipdate <= date '1998-12-01' - interval '90' day
                        group by
                        l_returnflag,
                        l_linestatus
                        order by
                        l_returnflag,
                        l_linestatus;


                        SELECT * FROM (SELECT /*+ TPCH_Q2 PARALLEL(16) */
                        s_acctbal,
                        s_name,
                        n_name,
                        p_partkey,
                        p_mfgr,
                        s_address,
                        s_phone,
                        s_comment
                        from
                        part,
                        supplier,
                        partsupp,
                        nation,
                        region
                        where p_partkey = ps_partkey and
                        s_suppkey = ps_suppkey and
                        p_size = 30 and
                        p_type like '%STEEL' and
                        s_nationkey = n_nationkey and
                        n_regionkey = r_regionkey and
                        r_name = 'ASIA' and
                        ps_supplycost = (
                        SELECT min(ps_supplycost)
                        from
                        partsupp,
                        supplier,
                        nation,
                        region
                        where
                        p_partkey = ps_partkey and
                        s_suppkey = ps_suppkey and
                        s_nationkey = n_nationkey and
                        n_regionkey = r_regionkey and r_name = 'ASIA' )
                        order by
                        s_acctbal desc,
                        n_name,
                        s_name,
                        p_partkey)
                        WHERE rownum <= 100;


                        SELECT * FROM (
                        SELECT /*+ TPCH_Q3 PARALLEL(16) */ *
                        from (
                        SELECT
                        l_orderkey,
                        o_orderdate,
                        o_shippriority,
                        sum(l_extendedprice * (1 - l_discount)) as revenue
                        from
                        customer,
                        orders,
                        lineitem
                        where
                        c_mktsegment = 'BUILDING' and
                        c_custkey = o_custkey and
                        l_orderkey = o_orderkey and
                        o_orderdate < '1995-03-15' and
                        l_shipdate > '1995-03-15'
                        group by
                        l_orderkey,
                        o_orderdate,
                        o_shippriority
                        order by
                        revenue desc,
                        o_orderdate)
                        )
                        WHERE ROWNUM <= 10;


                        SELECT /*+ TPCH_Q4 PARALLEL(16) no_unnest */
                        o_orderpriority,
                        count(*) as order_count
                        from
                        orders
                        where
                        o_orderdate >= DATE'1993-07-01' and
                        o_orderdate < DATE'1993-07-01' + interval '3' month and
                        exists (
                        SELECT *
                        from lineitem
                        where
                        l_orderkey = o_orderkey and
                        l_commitdate < l_receiptdate )
                        group by
                        o_orderpriority
                        order by
                        o_orderpriority;


                        select /*+ TPCH_Q5 PARALLEL(16) */
                        n_name,
                        sum(l_extendedprice * (1 - l_discount)) as revenue
                        from
                        customer,
                        orders,
                        lineitem,
                        supplier,
                        nation,
                        region
                        where
                        c_custkey = o_custkey
                        and l_orderkey = o_orderkey
                        and l_suppkey = s_suppkey
                        and c_nationkey = s_nationkey
                        and c_nationkey = n_nationkey
                        and n_regionkey = r_regionkey
                        and r_name = 'EUROPE'
                        and o_orderdate >= date '1995-01-01'
                        and o_orderdate < date '1995-01-01' + interval '1' year
                        group by
                        n_name
                        order by
                        revenue desc;

                        select /*+ TPCH_Q6 PARALLEL(16) */
                        sum(l_extendedprice * l_discount) as revenue
                        from
                        lineitem
                        where
                        l_shipdate >= date '1994-01-01'
                        and l_shipdate < date '1994-01-01' + interval '1' year
                        and l_discount between .06 - 0.01 and .06 + 0.01
                        and l_quantity < 24;

                        select /*+ TPCH_Q7 stream(0) PARALLEL(16) */supp_nation,
                        cust_nation,
                        l_year,
                        sum(volume) as revenue
                        from
                        (
                        select
                        n1.n_name as supp_nation,
                        n2.n_name as cust_nation,
                        extract(year from l_shipdate) as l_year,
                        l_extendedprice * (1 - l_discount) as volume
                        from
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2
                        where
                        s_suppkey = l_suppkey
                        and o_orderkey = l_orderkey
                        and c_custkey = o_custkey
                        and s_nationkey = n1.n_nationkey
                        and c_nationkey = n2.n_nationkey
                        and (
                        (n1.n_name = 'RUSSIA' and n2.n_name = 'IRAQ')
                        or (n1.n_name = 'IRAQ' and n2.n_name = 'RUSSIA')
                        )
                        and l_shipdate between date '1995-01-01' and date '1996-12-31'
                        ) shipping
                        group by
                        supp_nation,
                        cust_nation,
                        l_year
                        order by
                        supp_nation,
                        cust_nation,
                        l_year;

                        select /*+ TPCH_Q8 stream(0) PARALLEL(16) */
                        o_year,
                        sum(case
                        when nation = 'IRAQ' then volume
                        else 0
                        end) sum(volume) as mkt_share
                        from
                        (
                        select
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) as volume,
                        n2.n_name as nation
                        from
                        part,
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2,
                        region
                        where
                        p_partkey = l_partkey
                        and s_suppkey = l_suppkey
                        and l_orderkey = o_orderkey
                        and o_custkey = c_custkey
                        and c_nationkey = n1.n_nationkey
                        and n1.n_regionkey = r_regionkey
                        and r_name = 'MIDDLE EAST'
                        and s_nationkey = n2.n_nationkey
                        and o_orderdate between date '1995-01-01' and date '1996-12-31'
                        and p_type = 'MEDIUM PLATED TIN'
                        ) all_nations
                        group by
                        o_year
                        order by
                        o_year;

                        select /*+ TPCH_Q9 PARALLEL(16) */
                        nation,
                        o_year,
                        sum(amount)
                        as sum_profit from
                        (
                        select
                        n_name as nation,
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity
                        as amount
                        from
                        part,
                        supplier,
                        lineitem,
                        partsupp,
                        orders,
                        nation
                        where
                        s_suppkey = l_suppkey
                        and ps_suppkey = l_suppkey
                        and ps_partkey = l_partkey
                        and p_partkey = l_partkey
                        and o_orderkey = l_orderkey
                        and s_nationkey = n_nationkey
                        and p_name like '%indian%' )
                        profit
                        group by
                        nation,
                        o_year
                        order by
                        nation,
                        o_year
                        desc;


                        select */*+ TPCH_Q10 PARALLEL(16) */ from (
                        select /*+
                        PARALLEL(16)
                        */ c_custkey,
                        c_name,
                        sum(l_extendedprice * (1 - l_discount)) as revenue,
                        c_acctbal,
                        n_name,
                        c_address,
                        c_phone,
                        c_comment
                        from
                        customer,
                        orders,
                        lineitem,
                        nation
                        where
                        c_custkey = o_custkey
                        and l_orderkey = o_orderkey
                        and o_orderdate >= date '1994-07-01'
                        and o_orderdate < date '1994-07-01' + interval '3' month
                        and l_returnflag = 'R'
                        and c_nationkey = n_nationkey
                        group by
                        c_custkey,
                        c_name,
                        c_acctbal,
                        c_phone,
                        n_name,
                        c_address,
                        c_comment
                        order by
                        revenue desc
                        )
                        where rownum <= 20;


                        select /*+ TPCH_Q11 PARALLEL(16) */
                        ps_partkey,
                        sum(ps_supplycost * ps_availqty) as value
                        from
                        partsupp,
                        supplier,
                        nation
                        where
                        ps_suppkey = s_suppkey
                        and s_nationkey = n_nationkey
                        and n_name = 'GERMANY'
                        group by
                        ps_partkey having
                        sum(ps_supplycost * ps_availqty) > (
                        select
                        sum(ps_supplycost * ps_availqty) * 0.0000100000
                        from
                        partsupp,
                        supplier,
                        nation
                        where
                        ps_suppkey = s_suppkey
                        and s_nationkey = n_nationkey
                        and n_name = 'GERMANY'
                        )
                        order by
                        value desc;

                        select /*+ TPCH_Q12 PARALLEL(16) */
                        l_shipmode,
                        sum(case
                        when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                        else 0
                        end) as high_line_count,
                        sum(case
                        when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                        else 0
                        end) as low_line_count
                        from
                        orders,
                        lineitem
                        where
                        o_orderkey = l_orderkey
                        and l_shipmode in ('MAIL', 'SHIP')
                        and l_commitdate < l_receiptdate
                        and l_shipdate < l_commitdate
                        and l_receiptdate >= date '1994-01-01'
                        and l_receiptdate < date '1994-01-01' + interval '1' year
                        group by
                        l_shipmode
                        order by
                        l_shipmode;

                        select /*+ TPCH_Q14 PARALLEL(16) */
                        100.00 * sum(case
                        when p_type like 'PROMO%'
                        then l_extendedprice * (1 - l_discount)
                        else 0
                        end) sum(l_extendedprice * (1 - l_discount)) as promo_revenue
                        from
                        lineitem,
                        part
                        where
                        l_partkey = p_partkey
                        and l_shipdate >= date '1995-09-01'
                        and l_shipdate < date '1995-09-01' + interval '1' month;

                        create view revenue0 (supplier_no, total_revenue) as
                        select
                        l_suppkey,
                        sum(l_extendedprice * (1 - l_discount))
                        from
                        lineitem
                        where
                        l_shipdate >= date '1996-01-01'
                        and l_shipdate < date '1996-01-01' + interval '3' month
                        group by
                        l_suppkey;




                        select /*+ TPCH_Q15 PARALLEL(16) */
                        s_suppkey,
                        s_name,
                        s_address,
                        s_phone,
                        total_revenue
                        from
                        supplier,
                        revenue0
                        where
                        s_suppkey = supplier_no
                        and total_revenue = (
                        select
                        max(total_revenue)
                        from
                        revenue0
                        )
                        order by
                        s_suppkey;


                        drop view revenue0;

                        select /*+ TPCH_Q16 PARALLEL(16) */
                        p_brand,
                        p_type,
                        p_size,
                        count(distinct ps_suppkey) as supplier_cnt
                        from
                        partsupp,
                        part
                        where
                        p_partkey = ps_partkey
                        and p_brand <> 'Brand#45'
                        and p_type not like 'MEDIUM POLISHED%'
                        and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
                        and ps_suppkey not in (
                        select
                        s_suppkey
                        from
                        supplier
                        where
                        s_comment like '%Customer%Complaints%'
                        )
                        group by
                        p_brand,
                        p_type,
                        p_size
                        order by
                        supplier_cnt desc,
                        p_brand,
                        p_type,
                        p_size;

                        select /*+ TPCH_Q17 PARALLEL(16) */
                        sum(l_extendedprice) 7.0 as avg_yearly
                        from
                        lineitem,
                        part
                        where
                        p_partkey = l_partkey
                        and p_brand = 'Brand#23'
                        and p_container = 'MED BOX'
                        and l_quantity < (
                        select
                        0.2 * avg(l_quantity)
                        from
                        lineitem
                        where
                        l_partkey = p_partkey
                        );


                        select /*+ TPCH_Q18 stream(0) PARALLEL(16)*/
                        *
                        from(
                        select
                        c_name,
                        c_custkey,
                        o_orderkey,
                        o_orderdate,
                        o_totalprice,
                        sum(l_quantity)
                        from
                        customer,
                        orders,
                        lineitem
                        where
                        o_orderkey
                        in
                        (
                        select
                        l_orderkey
                        from
                        lineitem
                        group by
                        l_orderkey
                        having
                        sum(l_quantity) > 313 )
                        and c_custkey = o_custkey
                        and o_orderkey = l_orderkey
                        group by
                        c_name,
                        c_custkey,
                        o_orderkey,
                        o_orderdate,
                        o_totalprice
                        order by
                        o_totalprice desc,
                        o_orderdate)
                        ;



                        select /*+ TPCH_Q19 PARALLEL(16) */
                        sum(l_extendedprice* (1 - l_discount)) as revenue
                        from
                        lineitem,
                        part
                        where
                        (
                        p_partkey = l_partkey
                        and p_brand = 'Brand#12'
                        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                        and l_quantity >= 1 and l_quantity <= 1 + 10
                        and p_size between 1 and 5
                        and l_shipmode in ('AIR', 'AIR REG')
                        and l_shipinstruct = 'DELIVER IN PERSON'
                        )
                        or
                        (
                        p_partkey = l_partkey
                        and p_brand = 'Brand#23'
                        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                        and l_quantity >= 10 and l_quantity <= 10 + 10
                        and p_size between 1 and 10
                        and l_shipmode in ('AIR', 'AIR REG')
                        and l_shipinstruct = 'DELIVER IN PERSON'
                        )
                        or
                        (
                        p_partkey = l_partkey
                        and p_brand = 'Brand#34'
                        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                        and l_quantity >= 20 and l_quantity <= 20 + 10
                        and p_size between 1 and 15
                        and l_shipmode in ('AIR', 'AIR REG')
                        and l_shipinstruct = 'DELIVER IN PERSON'
                        );

                        select /*+ TPCH_Q20 PARALLEL(16) */
                        s_name,
                        s_address
                        from
                        supplier,
                        nation
                        where
                        s_suppkey in (
                        select
                        ps_suppkey
                        from
                        partsupp
                        where
                        ps_partkey in (
                        select
                        p_partkey
                        from
                        part
                        where
                        p_name like 'forest%'
                        )
                        and ps_availqty > (
                        select
                        0.5 * sum(l_quantity)
                        from
                        lineitem
                        where
                        l_partkey = ps_partkey
                        and l_suppkey = ps_suppkey
                        and l_shipdate >= date '1994-01-01'
                        and l_shipdate < date '1994-01-01' + interval '1' year
                        )
                        )
                        and s_nationkey = n_nationkey
                        and n_name = 'CANADA'
                        order by
                        s_name;


                        select * from (
                        select
                        /*+ TPCH_Q21  PARALLEL(16) */
                        s_name,
                        count(*) as numwait
                        from
                        supplier,
                        lineitem l1,
                        orders,
                        nation
                        where
                        s_suppkey = l1.l_suppkey
                        and o_orderkey = l1.l_orderkey
                        and o_orderstatus = 'F'
                        and l1.l_receiptdate > l1.l_commitdate
                        and exists (
                        select
                        *
                        from
                        lineitem l2
                        where
                        l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey
                        )
                        and not exists (
                        select
                        *
                        from
                        lineitem l3
                        where
                        l3.l_orderkey = l1.l_orderkey
                        and l3.l_suppkey <> l1.l_suppkey
                        and l3.l_receiptdate > l3.l_commitdate
                        )
                        and s_nationkey = n_nationkey
                        and n_name = 'CHINA'
                        group by
                        s_name
                        order by
                        numwait desc,
                        s_name
                        )
                        where rownum <= 100;         


                        上面运行结果后面跟列存下的结果一起对比呈现。


                        7. 在线将表修改为列存模式。

                        OB 4.2 以及以前的版本,在宏块层面数据存储呈现行存特点,在微块层面存储呈列存特点。OB 4.3 后正式推出基于 LSM-Tree 架构的列存存储引擎。具体原理以后再研究,这里先看看用法和效果。

                        将表变更为列存表。具体 SQL 如下:

                        • ADD COLUMN GROUP(all columns, each column)
                          :将表变更为行存列存冗余的表。

                        • ADD COLUMN GROUP(each column)
                          :将表变更为列存表。

                        下面先将 TPC-H 中的分区表(也是大表)变更为单存的列存表。

                          alter table part add column group (each column);
                          alter table partsupp add column group (each column);
                          alter table supplier add column group (each column);
                          alter table customer add column group (each column);
                          alter table lineitem add column group (each column);
                          alter table orders add column group (each column);


                          将大表从行存变更为列存,OB 数据文件剩余空间初步估计需要预留当前表的空间的两倍以上空间(有待核实);否则,可能会碰到空间不足报错。在集群节点空间统计上不要依赖视图 gv$ob_servers 的 data_disk_in_use  ,初步看它代表的是单份数据的容量并且可能只是在合并结束后才更新。

                            (SYS@localhost:sql.sock) [TPCH]> alter table lineitem add column group (each column);
                            ORA-00600: internal error code, arguments: -4184, Server out of disk space

                            上面 lineitem 表是最大的表,我变更过程中多次遭遇空间失败(不断的扩容数据文件)。在观察表的 SSTable 大小时,有时候能看到 3 个全量版本的大小(有频繁手动触发 major freeze 命令)。存储空间应该不是 3 份,多个相邻版本的 SSTable 空间有一定复用(这里我并没有更新实际表数据)。估计跟集群的快照保留时间 undo_retention 是 1800s 有关。

                            测试环境磁盘 IO 能力跟不上导致上面这个 DDL 执行时间非常的长。

                            最终 DDL 成功后的 TPC-H 表 lineitem 的特点如下。

                            估计在视图 GV$OB_SSTABLE
                            里,表的每个分区的每个列单独一个 SSTable 文件。

                            8. INSERT INTO SELECT 语句旁路导入。

                            由于上面在线变更实在太慢,还可以使用 INSERT INTO SELECT FROM
                            语句旁路导入数据到列存表。然后再更改表名。

                            使用限制

                            • 只支持 PDML(Parallel Data Manipulation Language,并行数据操纵语言),非 PDML 不能用旁路导入。

                            • 不能两个语句同时写一个表,因为导入的过程中会先加表锁。

                            • 不支持在触发器(Trigger)使用。

                            • 不能在多行事务(包含多个操作的事务)中运行。


                            判断是否用了旁路导入一个可以看解析执行计划(explain extended
                            )最后的 Note ,二个看租户性能监控 [Memstore]
                            监控。 

                              (SYS@localhost:sql.sock) [TPCH]> explain extended INSERT *+ append enable_parallel_dml parallel(8) */ INTO orders2 SELECT * FROM orders;                                                                          
                              +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                              | Query Plan |
                              +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                              | ================================================================================== |
                              | |ID|OPERATOR |NAME |EST.ROWS |EST.TIME(us)| |
                              | ---------------------------------------------------------------------------------- |
                              | |0 |PX COORDINATOR | |150000000|1436592693 | |
                              | |1 |+-EXCHANGE OUT DISTR |:EX10001 |150000000|1149741363 | |
                              | |2 | +-INSERT | |150000000|1069069322 | |
                              | |3 | +-EXCHANGE IN DISTR | |150000000|119166652 | |
                              | |4 | +-EXCHANGE OUT DISTR (PKEY HASH)|:EX10000 |150000000|83310236 | |
                              | |5 | +-SUBPLAN SCAN |ANONYMOUS_VIEW1|150000000|2638194 | |
                              | |6 | +-PX BLOCK ITERATOR | |150000000|2588564 | |
                              | |7 | +-TABLE FULL SCAN |ORDERS |150000000|2588564 | |
                              | ================================================================================== |
                              | Outputs & filters: |
                              | ------------------------------------- |


                              <......>
                              Used Hint: |
                              | ------------------------------------- |
                              | *+ |
                              | |
                              | USE_PLAN_CACHE( NONE ) |
                              | PARALLEL(8) |
                              | ENABLE_PARALLEL_DML |
                              | APPEND |
                              | APPEND |
                              | */


                              <......>


                              | Plan Type: |
                              | DISTRIBUTED |
                              | Note: |
                              | Degree of Parallelism is 8 because of hint |
                              | Direct-mode is enabled in insert into select |
                              +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                              100 rows in set (0.047 sec)


                              (SYS@localhost:sql.sock) [TPCH]> INSERT /*+ append enable_parallel_dml parallel(8) */ INTO orders2 SELECT * FROM orders;
                              Query OK, 150000000 rows affected (16 min 52.900 sec)
                              Records: 150000000 Duplicates: 0 Warnings: 0

                              还是 INSERT INTO SELECT FROM
                                这样转换到列存表数据快。

                              TPC-H 所有表空间两种统计结果如下。

                              这里面的 cnt
                              是表的分区数量。

                              这里面的 cnt
                              是表的 sstable 数量。看统计的空间大概是前者的两倍。不过再多合并几次等一段时间,这个结果就跟前面查询差不多了。

                              上图中 orders3 是老的行存表,orders 是列存表。可以看到容量差异不大,sstable 数量是10 倍的差异,orders 表正好有 9 列。sstable 里 9 个 normal_cg
                              加 1 个 co_major
                              就是 10 个 sstable。

                              9. 行存和列存对比结论

                              首先看 TPC-H 表的容量对比。

                              初步看表容量不大的时候,从行存到列存空间变化不大。表越大,空间缩减效果越明显。最大的表 lineitem 容量从 38G 降到 21G 算是很明显了。这也是行存的一个优势。

                              再看看 TPC-H 的 22 条查询语句的性能对比。

                              可以看到 22 个查询的执行时间都明显减少,总时间减少 78% 。列存下 TPC-H 的复杂查询性能提升非常明显!

                              22 个查询中 Q2 查询提升最明显,看看它在列存结构下的执行计划。

                                ===========================================================================
                                |ID|OPERATOR |NAME |EST.ROWS |EST.TIME(us)|
                                ---------------------------------------------------------------------------
                                |0 |PX COORDINATOR MERGE SORT | |5 |9829316 |
                                |1 |└─EXCHANGE OUT DISTR |:EX10001|5 |9829305 |
                                |2 | └─SORT | |5 |9829304 |
                                |3 | └─HASH GROUP BY | |5 |9829304 |
                                |4 | └─EXCHANGE IN DISTR | |68 |9829303 |
                                |5 | └─EXCHANGE OUT DISTR (HASH) |:EX10000|68 |9829293 |
                                |6 | └─HASH GROUP BY | |68 |9829270 |
                                |7 | └─PX BLOCK ITERATOR | |578901265|1492064 |
                                |8 | └─COLUMN TABLE FULL SCAN|LINEITEM|578901265|1492064 |
                                ===========================================================================
                                Outputs & filters:
                                -------------------------------------
                                0 - output([INTERNAL_FUNCTION(LINEITEM.L_RETURNFLAG, LINEITEM.L_LINESTATUS, T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)),
                                T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,
                                NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY)),
                                T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT)),
                                T_FUN_COUNT_SUM(T_FUN_COUNT(*)))]), filter(nil), rowset=256
                                sort_keys([LINEITEM.L_RETURNFLAG, ASC], [LINEITEM.L_LINESTATUS, ASC])
                                1 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [INTERNAL_FUNCTION(LINEITEM.L_RETURNFLAG, LINEITEM.L_LINESTATUS, T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)),
                                T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE
                                * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY)),
                                T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)), T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT)) / T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT)),
                                T_FUN_COUNT_SUM(T_FUN_COUNT(*)))]), filter(nil), rowset=256
                                dop=16
                                2 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE))],
                                [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,
                                NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY))],
                                [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT))]), filter(nil), rowset=256
                                sort_keys([LINEITEM.L_RETURNFLAG, ASC], [LINEITEM.L_LINESTATUS, ASC])
                                3 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE))],
                                [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,
                                NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY))],
                                [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT))]), filter(nil), rowset=256
                                group([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS]), agg_func([T_FUN_SUM(T_FUN_SUM(LINEITEM.L_QUANTITY))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE))],
                                [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1,
                                NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1, -85)) + LINEITEM.L_TAX))], [T_FUN_COUNT_SUM(T_FUN_COUNT(*))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_QUANTITY))],
                                [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE))], [T_FUN_SUM(T_FUN_SUM(LINEITEM.L_DISCOUNT))], [T_FUN_COUNT_SUM(T_FUN_COUNT(LINEITEM.L_DISCOUNT))])
                                4 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE
                                * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,
                                -85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],
                                [T_FUN_COUNT(LINEITEM.L_DISCOUNT)]), filter(nil), rowset=256
                                5 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE
                                * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,
                                -85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],
                                [T_FUN_COUNT(LINEITEM.L_DISCOUNT)]), filter(nil), rowset=256
                                (#keys=2, [LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS]), dop=16
                                6 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE
                                * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,
                                -85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],
                                [T_FUN_COUNT(LINEITEM.L_DISCOUNT)]), filter(nil), rowset=256
                                group([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS]), agg_func([T_FUN_SUM(LINEITEM.L_QUANTITY)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE
                                * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT)], [T_FUN_SUM(LINEITEM.L_EXTENDEDPRICE * cast(1, NUMBER(-1, -85)) - LINEITEM.L_DISCOUNT * cast(1, NUMBER(-1,
                                -85)) + LINEITEM.L_TAX)], [T_FUN_COUNT(*)], [T_FUN_COUNT(LINEITEM.L_QUANTITY)], [T_FUN_COUNT(LINEITEM.L_EXTENDEDPRICE)], [T_FUN_SUM(LINEITEM.L_DISCOUNT)],
                                [T_FUN_COUNT(LINEITEM.L_DISCOUNT)])
                                7 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [LINEITEM.L_QUANTITY], [LINEITEM.L_EXTENDEDPRICE], [LINEITEM.L_DISCOUNT], [LINEITEM.L_TAX]), filter(nil), rowset=256
                                8 - output([LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [LINEITEM.L_QUANTITY], [LINEITEM.L_EXTENDEDPRICE], [LINEITEM.L_DISCOUNT], [LINEITEM.L_TAX]), filter([LINEITEM.L_SHIPDATE
                                <= '1998-12-01 00:00:00' - INTERVAL '+90 00:00:00' DAY (2) TO SECOND (0)]), rowset=256
                                access([LINEITEM.L_SHIPDATE], [LINEITEM.L_RETURNFLAG], [LINEITEM.L_LINESTATUS], [LINEITEM.L_QUANTITY], [LINEITEM.L_EXTENDEDPRICE], [LINEITEM.L_DISCOUNT],
                                [LINEITEM.L_TAX]), partitions(p[0-15])
                                is_index_back=false, is_global_index=false, filter_before_indexback[false],
                                range_key([LINEITEM.L_ORDERKEY], [LINEITEM.L_LINENUMBER]), range(MIN,MIN ; MAX,MAX)always true

                                行存结构下执行计划如下

                                可以看到二者的区别就是针对大表 lineitem 的扫描方式 从 TABLE FULL SCAN
                                变为 COLUMN TABLE FULL SCAN
                                。这个表是最大的表,且 Q2 只查询这个表,也难怪性能提升最明显。


                                10. 结论

                                列存存储模型对 OLAP 场景复杂查询的性能帮助非常大,OB 4.3 版本完全支持列存以及行列混合 极大的提升了 OB 在 OLAP 场景的竞争力。后面还可以做 TPC-DS 场景的对比验证。加上原本 OB 在 OLTP 场景的出色能力,一套数据源(表)同时满足 OLTP 和 OLAP 场景的需求,对不同规模的企业用户都有很大吸引力。

                                不过列存模型也是刚推出,从行存在线转列存的时候,在内存和空间方面还需要注意,预留充分的资源。特别是大表在线转换的时候,尽量避免去触发合并,以及缩短快照保留时间(undo_retention
                                ),以避免数据文件空间爆掉。OB 在列存的变更这块应该还有很大的优化空间。


                                更多阅读:


                                文章转载自数据库技术闲谈,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                评论