DorisDB vs. TiDB/TiFlash vs. ClickHouse Performance Comparison - Single-Table Monitoring Analysis Scenario
--2021-02-14 刘春雷 (updated)
As the number of database clusters grows rapidly, analysis of the monitoring data becomes increasingly important, for example:
Quickly analyze the monitoring metrics of a specified cluster over a specified time range
Quickly compute per-metric Top 10 rankings across all clusters for a specified time window, e.g. the Top 10 clusters by writes or by reads over the last day
These requirements call for an analytical database, such as the three hottest options in the industry right now:
DorisDB
ClickHouse
TiDB/TiFlash
The monitoring data currently lives in TiDB, with 800+ million rows. This round of performance testing covers only these three databases, and also examines:
For the DorisDB cluster, the impact of parallelism and bucket count on SQL execution
For ClickHouse, the impact of single instance vs. cluster on SQL execution
For TiDB, the impact of version and of SSD vs. flash-card storage on SQL execution
I have only worked with these systems for a short time, so some details may be inaccurate; please treat the results as a reference.
1. Summary
1.1 Conclusions
DorisDB has the shortest total execution time
TiDB/TiFlash has the longest total execution time
Most TiDB execution plans go through TiKV, which makes execution slow, and the more data involved, the longer it takes
TiDB is usually faster when it reads from TiFlash, but in version 4.0.10 the optimizer mostly does not choose it
TiDB on flash cards is faster than on SSD, but in 4.0.10 some execution plans are suboptimal, which hurts efficiency
DorisDB was the fastest in the largest number of test cases
DorisDB's parallelism and bucket count strongly affect SQL execution time: the higher they are, the faster the queries (see the session-variable sketch after this list)
The ClickHouse cluster executes SQL faster than the single instance
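DorisDB's parallelism is a session variable that can be raised before running a query, while the bucket count is set at table creation (BUCKETS 10 in the DDL in section 3.1). A minimal sketch, assuming the standard DorisDB/StarRocks variable name; the value 16 is illustrative:
-- raise per-query parallelism for the current session
SET parallel_fragment_exec_instance_num = 16;
-- confirm the setting
SHOW VARIABLES LIKE 'parallel_fragment_exec_instance_num';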
1.2 SQL Execution Results

2. Environment Information
2.1 Database Information

All databases run with default configurations and no special parameter tuning, so these tests may not reflect each system's peak performance.
2.2 Test Table Information
Table name: mysql_status
Content: MySQL monitoring metrics
Row count: 872,208,435 (~870 million)
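As an illustration of the first workload from the introduction (metrics for one specified cluster over a specified time range), a hypothetical query of this drill-down shape shows the pattern; the cluster id, time window, and column list are placeholders, not the exact benchmark SQL:
-- hypothetical drill-down for one cluster over one day
SELECT monitorTime, inip, port, Com_select, Com_insert, Threads_running
FROM mysql_status
WHERE cluster_id = 'cluster_001'   -- hypothetical cluster id
  AND monitorTime >= '2021-02-13 00:00:00'
  AND monitorTime <  '2021-02-14 00:00:00'
ORDER BY monitorTime;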
3. Preparing Data in DorisDB
3.1 Create the DorisDB Table
CREATE TABLE `mysql_status` (
`cluster_id` varchar(50) NULL COMMENT "",
`inip` varchar(50) NULL COMMENT "",
`monitorTime` datetime NULL COMMENT "",
`id` bigint(20) NULL COMMENT "",
`port` int(11) NULL COMMENT "",
`iRole` varchar(20) NOT NULL DEFAULT "OTHER" COMMENT "",
`insert_ok` int(11) NOT NULL COMMENT "",
`alive` int(11) NOT NULL COMMENT "",
`Binlog_count` int(11) NOT NULL COMMENT "",
`Binlog_total_size` bigint(20) NOT NULL COMMENT "",
`Bytes_received` bigint(20) NOT NULL COMMENT "",
`Bytes_sent` bigint(20) NOT NULL COMMENT "",
`Com_delete` bigint(20) NOT NULL COMMENT "",
`Com_insert` bigint(20) NOT NULL COMMENT "",
`Com_replace` bigint(20) NOT NULL COMMENT "",
`Com_select` bigint(20) NOT NULL COMMENT "",
`Com_update` bigint(20) NOT NULL COMMENT "",
`Queries` bigint(20) NOT NULL COMMENT "",
`Questions` bigint(20) NOT NULL COMMENT "",
`Slow_queries` bigint(20) NOT NULL COMMENT "",
`Created_tmp_disk_tables` bigint(20) NOT NULL COMMENT "",
`Threads_cached` int(11) NOT NULL COMMENT "",
`Threads_connected` int(11) NOT NULL COMMENT "",
`Threads_created` int(11) NOT NULL COMMENT "",
`Threads_running` int(11) NOT NULL COMMENT "",
`Uptime` int(11) NOT NULL COMMENT "",
`createTime` datetime NOT NULL COMMENT "time the row was written",
`monitorDay` date NOT NULL COMMENT "date the monitoring data was collected"
) ENGINE=OLAP
DUPLICATE KEY(`cluster_id`, `inip`, `monitorTime`)
COMMENT "OLAP"
PARTITION BY RANGE(`monitorDay`)
(PARTITION p1 VALUES [('0000-01-01'), ('2020-01-31')),
PARTITION p2 VALUES [('2020-01-31'), ('2020-02-29')),
PARTITION p3 VALUES [('2020-02-29'), ('2020-03-31')),
PARTITION p4 VALUES [('2020-03-31'), ('2020-04-30')),
PARTITION p5 VALUES [('2020-04-30'), ('2020-05-31')),
PARTITION p6 VALUES [('2020-05-31'), ('2020-06-30')),
PARTITION p7 VALUES [('2020-06-30'), ('2020-07-31')),
PARTITION p8 VALUES [('2020-07-31'), ('2020-08-31')),
PARTITION p9 VALUES [('2020-08-31'), ('2020-09-30')),
PARTITION p10 VALUES [('2020-09-30'), ('2020-10-31')),
PARTITION p11 VALUES [('2020-10-31'), ('2020-11-30')),
PARTITION p12 VALUES [('2020-11-30'), ('2020-12-31')),
PARTITION p13 VALUES [('2020-12-31'), ('2021-01-31')),
PARTITION p14 VALUES [('2021-01-31'), ('2021-02-28')),
PARTITION p15 VALUES [('2021-02-28'), ('2021-03-31')),
PARTITION p16 VALUES [('2021-03-31'), ('2021-04-30')),
PARTITION p17 VALUES [('2021-04-30'), ('2021-05-31')))
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
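Since the table is range-partitioned by monitorDay only up to 2021-05-31, later months have to be appended. A minimal sketch using Doris/DorisDB partition DDL; the partition name and bounds below are illustrative:
-- append the next monthly partition after p17
ALTER TABLE mysql_status ADD PARTITION p18 VALUES [('2021-05-31'), ('2021-06-30'));
-- confirm the partition layout
SHOW PARTITIONS FROM mysql_status;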
3.2 Load Data into DorisDB
The monitoring data exported from the TiDB (SSD) cluster is loaded in.
【Export to CSV with dumpling】:
./dumpling -uxxx -pxxx -hxxx -Pxxx -o dumping/ --filetype csv --csv-delimiter '' --no-header -F 256MiB --threads 4 --sql 'select cluster_id,inip,monitorTime,id,port,iRole,insert_ok,alive,Binlog_count,Binlog_total_size,Bytes_received,Bytes_sent,Com_delete,Com_insert,Com_replace,Com_select,Com_update,Queries,Questions,Slow_queries,Created_tmp_disk_tables,Threads_cached,Threads_connected,Threads_created,Threads_running,Uptime,createTime,monitorDay from mysql_status'
【Load via Stream Load】:
for i in {000..208};do echo "-----------$i-----------";curl --location-trusted -u root:xxx -T result.000000$i.csv -H "label:r$i" -H "column_separator:," http://127.0.0.1:xxx/api/lcl/mysql_status/_stream_load;done
Row count: 872,208,435 (~870 million)
Example load speed:
{
"TxnId": 569,
"Label": "r166",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1781588,
"NumberLoadedRows": 1781588,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 268435506,
"LoadTimeMs": 6951,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 1,
"ReadDataTimeMs": 4935,
"WriteDataTimeMs": 6938,
"CommitAndPublishTimeMs": 11
4. Preparing Data in TiDB
4.1 Create the Table
The schema is identical on the SSD and flash-card clusters.
CREATE TABLE `mysql_status` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`cluster_id` varchar(50) NOT NULL ,
`inip` varchar(50) NOT NULL ,
`port` int(11) NOT NULL ,
`iRole` varchar(20) NOT NULL,
`insert_ok` int(11) NOT NULL DEFAULT '0',
`alive` int(11) NOT NULL DEFAULT '0',
`Binlog_count` int(11) NOT NULL DEFAULT '0',
`Binlog_total_size` bigint(20) NOT NULL DEFAULT '0',
`Bytes_received` bigint(20) NOT NULL DEFAULT '0',
`Bytes_sent` bigint(20) NOT NULL DEFAULT '0',
`Com_delete` bigint(20) NOT NULL DEFAULT '0',
`Com_insert` bigint(20) NOT NULL DEFAULT '0',
`Com_replace` bigint(20) NOT NULL DEFAULT '0',
`Com_select` bigint(20) NOT NULL DEFAULT '0',
`Com_update` bigint(20) NOT NULL DEFAULT '0',
`Queries` bigint(20) NOT NULL DEFAULT '0',
`Questions` bigint(20) NOT NULL DEFAULT '0',
`Slow_queries` bigint(20) NOT NULL DEFAULT '0',
`Created_tmp_disk_tables` bigint(20) NOT NULL DEFAULT '0',
`Threads_cached` int(11) NOT NULL DEFAULT '0',
`Threads_connected` int(11) NOT NULL DEFAULT '0',
`Threads_created` int(11) NOT NULL DEFAULT '0',
`Threads_running` int(11) NOT NULL DEFAULT '0',
`Uptime` int(11) NOT NULL DEFAULT '0',
`monitorTime` datetime NOT NULL ,
`monitorDay` date NOT NULL ,
`createTime` datetime NOT NULL ,
PRIMARY KEY (`id`),
KEY `idx_inIP_port` (`inip`,`port`,`monitorTime`),
KEY `idx_clusterid` (`cluster_id`,`monitorTime`),
KEY `idx_monitorTime` (`monitorTime`),
KEY `idx_timeclusterinip` (`monitorTime`,`cluster_id`,`inip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=872252609 COMMENT='xxx'
4.2 Load Data
【Export: reuse the CSV files exported above】
【Load: import with LOAD DATA】:
for i in {000..114};do echo "----------$i-------------";mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE 'result.000000$i.csv' INTO TABLE mysql_status FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (cluster_id,inip,monitorTime,id,port,iRole,insert_ok,alive,Binlog_count,Binlog_total_size,Bytes_received,Bytes_sent,Com_delete,Com_insert,Com_replace,Com_select,Com_update,Queries,Questions,Slow_queries,Created_tmp_disk_tables,Threads_cached,Threads_connected,Threads_created,Threads_running,Uptime,createTime,monitorDay)" ;done
Row count on the flash-card cluster: 872,208,435
Row count on the SSD cluster: 884,370,397
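The sizes above can also be double-checked from TiDB's table statistics; a minimal sketch, assuming the target database is named lcl as in the load command:
-- estimated rows and on-disk size as reported by TiDB statistics
SELECT table_rows, data_length, index_length
FROM information_schema.tables
WHERE table_schema = 'lcl' AND table_name = 'mysql_status';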
【Add a TiFlash replica for this table】:
alter table mysql_status SET TIFLASH REPLICA 1;
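Replica sync progress can be checked, and, since the 4.0.10 optimizer often does not pick TiFlash on its own (see the conclusions), reads can be steered to TiFlash explicitly. A sketch using standard TiDB facilities; the schema name lcl is assumed from the load command:
-- check whether the TiFlash replica has finished syncing
SELECT TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'lcl' AND TABLE_NAME = 'mysql_status';
-- make the current session prefer TiFlash
SET SESSION tidb_isolation_read_engines = 'tiflash,tidb';
-- or steer a single query with an optimizer hint
SELECT /*+ READ_FROM_STORAGE(TIFLASH[mysql_status]) */ COUNT(*) FROM mysql_status;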
5. Preparing Data in ClickHouse
5.1 Single Instance
clickhouse-client --user=default --host=127.0.0.1
CREATE TABLE mysql_status ENGINE = MergeTree order by id AS SELECT * FROM mysql('xxx:xx', 'xxx', 'mysql_status', 'xxx', 'mima');
Progress: 120.00 million rows, 24.24 GB (261.34 thousand rows/s., 52.80 MB/s.)
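After the CREATE ... AS SELECT completes, the loaded volume can be checked from ClickHouse's system tables; a minimal sketch, assuming the table was created in the default database:
-- rows and on-disk size of the new table (active parts only)
SELECT sum(rows) AS total_rows, formatReadableSize(sum(bytes_on_disk)) AS disk_size
FROM system.parts
WHERE database = 'default' AND table = 'mysql_status' AND active;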
5.2 Cluster
The data is split into 3 parts by id modulo 3 so that it lands evenly on the 3 nodes.
【First sync all of the data onto one of the shards】
clickhouse-client --user=default --host=127.0.0.1
CREATE TABLE mysql_status ENGINE = MergeTree order by id AS SELECT * FROM mysql('10.0.0.1:xxx', 'db_name', 'mysql_status', 'xxx', 'mima');
Progress: 120.00 million rows, 24.24 GB (261.34 thousand rows/s., 52.80 MB/s.)
【Create the distributed table on all 3 nodes】:
CREATE TABLE mysql_status_all AS mysql_status ENGINE = Distributed(perftest_3shards_1replicas, default, mysql_status, rand());
【Create the shard (local) table on all 3 nodes】:
CREATE TABLE default.mysql_status_shard (
`id` Int64,
`cluster_id` String,
`inip` String,
`port` Int32,
`iRole` String,
`insert_ok` Int32,
`alive` Int32,
`Binlog_count` Int32,
`Binlog_total_size` Int64,
`Bytes_received` Int64,
`Bytes_sent` Int64,
`Com_delete` Int64,
`Com_insert` Int64,
`Com_replace` Int64,
`Com_select` Int64,
`Com_update` Int64,
`Queries` Int64,
`Questions` Int64,
`Slow_queries` Int64,
`Created_tmp_disk_tables` Int64,
`Threads_cached` Int32,
`Threads_connected` Int32,
`Threads_created` Int32,
`Threads_running` Int32,
`Uptime` Int32,
`monitorTime` DateTime,
`monitorDay` Date,
`createTime` DateTime
) ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192
【Load the data】:
Run on node 1:
insert into mysql_status_shard select * from mysql_status_all where id%3=0;
Run on node 2:
insert into mysql_status_shard select * from mysql_status_all where id%3=1;
Run on node 3:
insert into mysql_status_shard select * from mysql_status_all where id%3=2;
【Create the final distributed table on all 3 nodes】:
CREATE TABLE mysql_status_shard_all AS mysql_status_shard ENGINE = Distributed(perftest_3shards_1replicas, default, mysql_status_shard, rand());
【Row count of the local table on each of the 3 shards】:
select count(*) from mysql_status_shard;
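The same check can be done in a single query through the distributed table, since hostName() is evaluated on each remote shard; a minimal sketch:
SELECT hostName() AS shard_host, count() AS shard_rows
FROM mysql_status_shard_all
GROUP BY shard_host;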

6. SQL Test Comparison
Note: in the result tables, green marks the fastest time.
The ClickHouse cluster is queried through mysql_status_shard_all; all other setups query mysql_status.
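As an illustration of the Top-10 workload from the introduction, a hypothetical query of this shape shows the pattern being compared; the time window and the write-counter aggregation are placeholders, not the exact benchmark SQL:
-- hypothetical Top-10-by-writes query over the last day;
-- adapt the aggregation to how the counters are stored (cumulative vs. per-interval)
SELECT cluster_id,
       SUM(Com_insert + Com_update + Com_delete + Com_replace) AS total_writes
FROM mysql_status
WHERE monitorTime >= '2021-02-13 00:00:00'
  AND monitorTime <  '2021-02-14 00:00:00'
GROUP BY cluster_id
ORDER BY total_writes DESC
LIMIT 10;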







