6.5 如何运行 TPC-H 测试
TPC-H 简介
TPC-H 基准测试是由 TPC-D(由 TPC 组织于 1994 年制定的标准,用于决策支持系统方面的测试基准)发展而来的。TPC-H 用 3NF 实现了一个数据仓库,共包含 8 个基本关系。TPC-H 的主要评价指标是各个查询的响应时间,即从提交查询到结果返回所需时间。
TPC-H 基准测试的度量单位是每小时执行的查询数(QphH@size),其中 H 表示每小时系统执行复杂查询的平均次数,size 表示数据库规模的大小,它能够反映出系统处理查询的能力。
TPC-H 是根据真实的生产运行环境来建模的,这使得它可以评估一些其他测试所不能评估的关键性能参数。
总而言之,TPC 组织颁布的 TPC-H 标准满足了数据仓库领域的测试需求,并且促使各个厂商以及研究机构将该项技术推向极限。详细信息请参考 http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.3.pdf。
测试准备
软件安装
下载软件
软件由 TPC-H Tool 官方提供,下载地址为:http://tpc.org/TPC_Documents_Current_Versions/download_programs/tools-download-request5.asp?bm_type=TPC-H&bm_vers=3.0.0&mode=CURRENT-ONLY。
文件名:
TPC-H_Tools_v3.0.0.zipcd /root/TPC-H_Tools_v3.0.0/dbgen cp makefile.suite Makefile
修改
Makefile文件中的CC、DATABASE、MACHINE和WORKLOAD等参数定义。vim Makefile +103 CC = gcc # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata) # SQLSERVER, SYBASE, ORACLE, VECTORWISE # Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS, # SGI, SUN, U2200, VMS, LINUX, WIN32 # Current values for WORKLOAD are: TPCH DATABASE= MYSQL MACHINE = LINUX WORKLOAD = TPCH
修改
tpcd.h文件,并添加新的宏定义。vim tpcd.h +$ #ifdef MYSQL #define GEN_QUERY_PLAN "" #define START_TRAN "START TRANSACTION" #define END_TRAN "COMMIT" #define SET_OUTPUT "" #define SET_ROWCOUNT "limit %d;\n" #define SET_DBASE "use %s;\n" #endif
对文件进行编译。
make
编译完成后该目录下会生成以下两个可执行文件:
dbgen:数据生成工具。在使用 InfiniDB 官方测试脚本进行测试时,需要用该工具生成 tpch 相关表数据。
qgen:SQL 生成工具。生成初始化测试查询,由于不同的 seed 生成的查询不同,为了结果的可重复性,请使用附件提供的 22 个查询。
数据文件准备
dbgen 命令可以生成指定大小的数据,生成环境测试建议不少于 1000G 。本文以 10G 为例。
./dbgen -s 10 输出: TPC-H Population Generator (Version 3.0.0) Copyright Transaction Processing Performance Council 1994 - 2010 mkdir tpch10 mv *.tbl tpch10/ [root@obce-0000 dbgen]# ls -lrth tpch10/ 总用量 11G -rw-r--r-- 1 root root 14M 10月 1 16:04 supplier.tbl -rw-r--r-- 1 root root 389 10月 1 16:04 region.tbl -rw-r--r-- 1 root root 233M 10月 1 16:04 part.tbl -rw-r--r-- 1 root root 1.2G 10月 1 16:04 partsupp.tbl -rw-r--r-- 1 root root 1.7G 10月 1 16:04 orders.tbl -rw-r--r-- 1 root root 2.2K 10月 1 16:04 nation.tbl -rw-r--r-- 1 root root 7.3G 10月 1 16:04 lineitem.tbl -rw-r--r-- 1 root root 234M 10月 1 16:04 customer.tbl
查询语句准备
cp qgen queries/ cp dists.dss queries/ cd queries/ for i in `seq 22`; do echo $i; ./qgen -d $i -s 10 > db"$i".sql; done dos2unix *.sql
去掉生成的 SQL 文件中的 limit -xx(OceanBase 不支持 limit 负数的语法),去掉 day 后面的 (3),并且加上 parallel(96) 并发,以 db1.sql 为例。
SQL 1
select /*+ TPCH_Q1 parallel(96) */ ---增加 parallel 并发执行 l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '90' day (3) ---去掉(3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; limit -1; --- 去掉这行SQL 2
select /*+ TPCH_Q2 parallel(96) */ s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey;
其他 SQL 不再重复说明。
参数准备
租户参数
set global autocommit=ON; set global ob_query_timeout=36000000000; set global ob_trx_timeout=36000000000; set global max_allowed_packet=67108864; set global ob_sql_work_area_percentage=80; /* parallel_max_servers 和 parallel_servers_target 的值 推荐设置为测试租户分配的 resource unit cpu 数的 10 倍 如测试租户使用的 Unit 配置为:create resource unit $unit_name max_cpu 26 那么该值设置为 260 */ set global parallel_max_servers=260; set global parallel_servers_target=260;
建表
create tablegroup tpch_tg_10g_lineitem_order_group binding true partition by key 1 partitions 9;
create tablegroup tpch_tg_10g_partsupp_part binding true partition by key 1 partitions 9;
drop database if exists tpch_10g_part;
create database tpch_10g_part;
use tpch_10g_part;
CREATE TABLE lineitem (
l_orderkey bigint NOT NULL,
l_partkey bigint NOT NULL,
l_suppkey bigint NOT NULL,
l_linenumber bigint NOT NULL,
l_quantity bigint NOT NULL,
l_extendedprice bigint NOT NULL,
l_discount bigint NOT NULL,
l_tax bigint NOT NULL,
l_returnflag char(1) DEFAULT NULL,
l_linestatus char(1) DEFAULT NULL,
l_shipdate date NOT NULL,
l_commitdate date DEFAULT NULL,
l_receiptdate date DEFAULT NULL,
l_shipinstruct char(25) DEFAULT NULL,
l_shipmode char(10) DEFAULT NULL,
l_comment varchar(44) DEFAULT NULL,
primary key(l_orderkey, l_linenumber)
) tablegroup = tpch_tg_10g_lineitem_order_group
partition by key (l_orderkey) partitions 9;
create index I_L_ORDERKEY on lineitem(l_orderkey) local;
create index I_L_SHIPDATE on lineitem(l_shipdate) local;
CREATE TABLE orders (
o_orderkey bigint NOT NULL,
o_custkey bigint NOT NULL,
o_orderstatus char(1) DEFAULT NULL,
o_totalprice bigint DEFAULT NULL,
o_orderdate date NOT NULL,
o_orderpriority char(15) DEFAULT NULL,
o_clerk char(15) DEFAULT NULL,
o_shippriority bigint DEFAULT NULL,
o_comment varchar(79) DEFAULT NULL,
PRIMARY KEY (o_orderkey))
tablegroup = tpch_tg_10g_lineitem_order_group
partition by key(o_orderkey) partitions 9;
create index I_O_ORDERDATE on orders(o_orderdate) local;
CREATE TABLE partsupp (
ps_partkey bigint NOT NULL,
ps_suppkey bigint NOT NULL,
ps_availqty bigint DEFAULT NULL,
ps_supplycost bigint DEFAULT NULL,
ps_comment varchar(199) DEFAULT NULL,
PRIMARY KEY (ps_partkey, ps_suppkey))
tablegroup tpch_tg_10g_partsupp_part
partition by key(ps_partkey) partitions 9;
CREATE TABLE part (
p_partkey bigint NOT NULL,
p_name varchar(55) DEFAULT NULL,
p_mfgr char(25) DEFAULT NULL,
p_brand char(10) DEFAULT NULL,
p_type varchar(25) DEFAULT NULL,
p_size bigint DEFAULT NULL,
p_container char(10) DEFAULT NULL,
p_retailprice bigint DEFAULT NULL,
p_comment varchar(23) DEFAULT NULL,
PRIMARY KEY (p_partkey))
tablegroup tpch_tg_10g_partsupp_part
partition by key(p_partkey) partitions 9;
CREATE TABLE customer (
c_custkey bigint NOT NULL,
c_name varchar(25) DEFAULT NULL,
c_address varchar(40) DEFAULT NULL,
c_nationkey bigint DEFAULT NULL,
c_phone char(15) DEFAULT NULL,
c_acctbal bigint DEFAULT NULL,
c_mktsegment char(10) DEFAULT NULL,
c_comment varchar(117) DEFAULT NULL,
PRIMARY KEY (c_custkey))
partition by key(c_custkey) partitions 9;
CREATE TABLE supplier (
s_suppkey bigint NOT NULL,
s_name char(25) DEFAULT NULL,
s_address varchar(40) DEFAULT NULL,
s_nationkey bigint DEFAULT NULL,
s_phone char(15) DEFAULT NULL,
s_acctbal bigint DEFAULT NULL,
s_comment varchar(101) DEFAULT NULL,
PRIMARY KEY (s_suppkey)
) partition by key(s_suppkey) partitions 9;
CREATE TABLE nation (
n_nationkey bigint NOT NULL,
n_name char(25) DEFAULT NULL,
n_regionkey bigint DEFAULT NULL,
n_comment varchar(152) DEFAULT NULL,
PRIMARY KEY (n_nationkey));
CREATE TABLE region (
r_regionkey bigint NOT NULL,
r_name char(25) DEFAULT NULL,
r_comment varchar(152) DEFAULT NULL,
PRIMARY KEY (r_regionkey));创建结束后查看表和表分组。
MySQL [tpch_10g_part]> show tables; +-------------------------+ | Tables_in_tpch_10g_part | +-------------------------+ | customer | | lineitem | | nation | | orders | | part | | partsupp | | region | | supplier | +-------------------------+ 8 rows in set (0.365 sec) MySQL [tpch_10g_part]> show tablegroups; +----------------------------------+------------------+---------------+ | Tablegroup_name | Table_name | Database_name | +----------------------------------+------------------+---------------+ | oceanbase | NULL | NULL | | tpch_tg_10g_lineitem_order_group | lineitem | tpch_10g_part | | tpch_tg_10g_lineitem_order_group | orders | tpch_10g_part | | tpch_tg_10g_partsupp_part | part | tpch_10g_part | | tpch_tg_10g_partsupp_part | partsupp | tpch_10g_part | +----------------------------------+------------------+---------------+ 5 rows in set (0.068 sec)
加载数据
您可使用 OceanBase 数据库自带的 LOAD 命令逐个加载数据,也可以编写一个脚本批量加载数据,如下所示。
创建加载脚本目录
cd /root/TPC-H_Tools_v3.0.0/dbgen mkdir load cd load cp ../dss.ri ../dss.ddl ./
编写加载脚本
#/usr/bin/evn python #-*- encoding:utf-8 -*- import os import sys import time import commands hostname='172.xx.xxx.52' #注意!!请填写某个 observer 所在服务器的 ip 地址 port='2881' #端口号 tenant='test' #组户名 user='tpch' #用户名 password='******' #密码 data_path='/tmp/tpch10' #注意!!请填写某个 observer 所在服务器下 tbl 所在目录 db_name='tpch_10g_part' #数据库名 cmd_str='mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "show tables;" '%(hostname,port,user,tenant,db_name) print cmd_str result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/customer.tbl' into table customer fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) print cmd_str result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/lineitem.tbl' into table lineitem fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/nation.tbl' into table nation fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/orders.tbl' into table orders fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/partsupp.tbl' into table partsupp fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/part.tbl' into table part fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/region.tbl' into table region fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path) result = commands.getstatusoutput(cmd_str) print result cmd_str=""" mysql --default-auth=mysql_native_password,db_name -h%s -P%s -u%s@%s -c -A %s -e "load data /*+ parallel(8) */ infile '%s/supplier.tbl' into table supplier fields terminated by '|';" """ %(hostname,port,user,tenant,db_name,data_path)
export MYSQL_PWD=****** python load.py
性能调优
转储与合并
| 对比项 | 合并转储前 | 合并后 | 提升率 |
|---|---|---|---|
| tpch_10g(总用时) | 57.26 | 23.28 | 59.30% |
| Q1 | 2.43s | 2.02s | |
| Q2 | 0.86s | 0.32s | |
| Q3 | 2.08s | 1.49s | |
| Q4 | 0.45s | 0.34s | |
| Q5 | 0.96s | 1.58s | |
| Q6 | 1.29s | 0.83s | |
| Q7 | 2.63s | 1.41s | |
| Q8 | 3.60s | 1.39s | |
| Q9 | 1.53s | 1.85s | |
| Q10 | 4.23s | 2.38s | |
| Q11 | 0.59s | 0.47s | |
| Q12 | 2.01s | 1.00s | |
| Q13 | 2.34s | 0.85s | |
| Q14 | 0.51s | 0.21s | |
| Q15 | 1.09s | 0.57s | |
| Q16 | 1.18s | 0.41s | |
| Q17 | 10.54s | 0.94s | |
| Q18 | 1.94s | 0.98s | |
| Q19 | 2.32s | 1.02s | |
| Q20 | 10.46s | 1.25s | |
| Q21 | 2.49s | 1.42s | |
| Q22 | 1.73s | 0.55s |
调整 PRIMARY_ZONE
| 对比项 | 集中式部署(zone1) | random 部署 |
|---|---|---|
| tpch | 590.42 | 149.13 |
| Q1 | 50.74s | 15.14s |
| Q2 | 2.25s | 0.47s |
| Q3 | 37.69s | 13.42s |
| Q4 | 42.78s | 2.57s |
| Q5 | 6.94s | 2.52s |
| Q6 | 26.62s | 7.38s |
| Q7 | 0.35s | 0.23s |
| Q8 | 67.92s | 11.69s |
| Q9 | 20.19s | 6.17s |
| Q10 | 54.79s | 24.00s |
| Q11 | 0.39s | 0.19s |
| Q12 | 40.71s | 8.60s |
| Q13 | 20.99s | 7.82s |
| Q14 | 22.99s | 1.52s |
| Q15 | 46.28s | 2.95s |
| Q16 | 7.41s | 2.88s |
| Q17 | 32.04s | 8.64s |
| Q18 | 26.64s | 8.29s |
| Q19 | 31.25s | 9.42s |
| Q20 | 38.14s | 10.74s |
| Q21 | 0.24s | 0.15s |
| Q22 | 13.07s | 4.34s |
使用 TABLEGROUP
| 对比项 | 未使用 pg | 使用 pg |
|---|---|---|
| tpch | 25.3 | 23.28 |
| Q1 | 2.23s | 2.02s |
| Q2 | 0.59s | 0.32s |
| Q3 | 1.59s | 1.49s |
| Q4 | 0.46s | 0.34s |
| Q5 | 1.65s | 1.58s |
| Q6 | 0.80s | 0.83s |
| Q7 | 1.58s | 1.41s |
| Q8 | 1.52s | 1.39s |
| Q9 | 1.83s | 1.85s |
| Q10 | 2.41s | 2.38s |
| Q11 | 0.54s | 0.47s |
| Q12 | 1.14s | 1.00s |
| Q13 | 0.93s | 0.85s |
| Q14 | 0.23s | 0.21s |
| Q15 | 0.63s | 0.57s |
| Q16 | 0.46s | 0.41s |
| Q17 | 0.98s | 0.94s |
| Q18 | 1.18s | 0.98s |
| Q19 | 1.03s | 1.02s |
| Q20 | 1.34s | 1.25s |
| Q21 | 1.55s | 1.42s |
| Q22 | 0.63s | 0.55s |




