暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

ClickHouse与 Elasticsearch聚合性能对比测试

大数据猫 2021-06-07
2747
    原文链接🔗:https://www.jianshu.com/p/0defb5471bc6

    Elasticsearch以其优秀的分布式架构与全文搜索引擎等特点在机器数据的存储、分析领域广为使用,但随着数据量的增长,其聚合分析性能已无法满足业务需求。而ClickHouse作为一个高性能的OLAP列式数据库管理系统有望解决这一痛点。

    本文是对ClickHouse与Elasticsearch聚合性能的简单对比测试。主要关注查询语句的响应时间,暂不考虑资源占用情况。

    组件版本CPU内存
    ClickHouse7.9.04C8G
    Elasticsearch20.11.4.134C8G

    使用ClickHouse官方提供的测试数据集,共67G,约6亿行。

    其中,ClickHouse使用LO_ORDERDATE字段作为分区键,使用LO_ORDERDATE, LO_ORDERKEY作为排序键。

    某字段出现次数TOP 10

    # ClickHouse
    SELECT LO_SHIPMODE,COUNT() FROM lineorder GROUP BY LO_SHIPMODE ORDER BY COUNT() DESC LIMIT 10

    # Elasticsearch
    GET lineorder/_search
    {
    "aggs": {
    "1": {
    "terms": {
    "field": "LO_SHIPMODE.keyword",
    "order": {
    "_count": "desc"
    },
    "size": 10
    }
    }
    },
    "size": 0
    }

    某字段按年进行计数


    # ClickHouse
    SELECT toYear(LO_ORDERDATE),COUNT() FROM lineorder GROUP BY toYear(LO_ORDERDATE) FORMAT PrettyCompactMonoBlock

    # Elasticsearch
    GET lineorder/_search
    {
    "aggs": {
    "2": {
    "date_histogram": {
    "field": "LO_ORDERDATE",
    "calendar_interval":"1y",
    "format":"yyyy-MM-dd"
    }
    }
    },
    "size": 0
    }

    多个字段按年进行统计


    # ClickHouse
    SELECT LO_ORDERDATE,LO_ORDERKEY,LO_SHIPMODE,LO_ORDERPRIORITY,LO_COMMITDATE FROM lineorder WHERE LO_ORDERDATE >= '1992-01-01' AND LO_ORDERDATE < '1993-01-01' ORDER BY LO_ORDERDATE LIMIT 500

    # Elasticsearch
    GET lineorder/_search
    {
    "size": 500,
    "sort": [
    {
    "timestamp": {
    "order": "desc",
    "unmapped_type": "boolean"
    }
    }
    ],
    "query": {
    "bool": {
    "must": [],
    "filter": [
    {
    "match_all": {}
    },
    {
    "match_all": {}
    },
    {
    "range": {
    "LO_ORDERDATE": {
    "gte": "1992-01-01",
    "lte": "1993-01-01",
    "format": "strict_date_optional_time"
    }
    }
    }
    ],
    "should": [],
    "must_not": []
    }
    }
    }

    基于时间的多字段聚合


    # ClickHouse
    SELECT toYear(LO_ORDERDATE),LO_SHIPMODE,COUNT() FROM lineorder GROUP BY toYear(LO_ORDERDATE),LO_SHIPMODE ORDER BY toYear(LO_ORDERDATE) FORMAT PrettyCompactMonoBlock

    # Elasticsearch
    GET lineorder/_search
    {
    "aggs": {
    "3": {
    "terms": {
    "field": "LO_SHIPMODE.keyword",
    "order": {
    "_count": "desc"
    },
    "size": 10
    },
    "aggs": {
    "2": {
    "date_histogram": {
    "field": "LO_ORDERDATE",
    "calendar_interval": "1y",
    "time_zone": "Asia/Shanghai",
    "min_doc_count": 1
    }
    }
    }
    }
    },
    "size": 0
    }

    基于时间的多字段聚合


    # ClickHouse
    SELECT toYear(LO_ORDERDATE),LO_SHIPMODE,COUNT() FROM lineorder GROUP BY toYear(LO_ORDERDATE),LO_SHIPMODE ORDER BY toYear(LO_ORDERDATE) FORMAT PrettyCompactMonoBlock

    # Elasticsearch
    GET lineorder/_search
    {
    "aggs": {
    "3": {
    "terms": {
    "field": "LO_SHIPMODE.keyword",
    "order": {
    "_count": "desc"
    },
    "size": 10
    },
    "aggs": {
    "2": {
    "date_histogram": {
    "field": "LO_ORDERDATE",
    "calendar_interval": "1y",
    "time_zone": "Asia/Shanghai",
    "min_doc_count": 1
    }
    }
    }
    }
    },
    "size": 0
    }

    聚合嵌套(非时间字段)


    # ClickHouse
    SELECT LO_SHIPMODE,COUNT(LO_SHIPMODE),LO_ORDERPRIORITY,COUNT(LO_ORDERPRIORITY) FROM lineorder GROUP BY LO_SHIPMODE,LO_ORDERPRIORITY ORDER BY COUNT(LO_SHIPMODE),COUNT(LO_ORDERPRIORITY) LIMIT 5 BY LO_SHIPMODE,LO_ORDERPRIORITY

    # Elasticsearch
    GET lineorder/_search
    {
    "aggs": {
    "2": {
    "terms": {
    "field": "LO_SHIPMODE.keyword",
    "order": {
    "_count": "desc"
    },
    "size": 5
    },
    "aggs": {
    "3": {
    "terms": {
    "field": "LO_ORDERPRIORITY.keyword",
    "order": {
    "_count": "desc"
    },
    "size": 5
    }
    }
    }
    }
    },
    "size": 0
    }


    聚合场景ck(ms)es(ms)性能对比
    基于时间的多字段聚合550615599近3倍
    多个字段按年进行计数(数据表)381626716倍多
    某字段出现次数 TOP 10(饼图)40487317近2倍
    某字段按年进行计数(时间趋势图)9012325725倍多
    聚合嵌套(非时间字段)6937157672倍多

    相同数据量下,ClickHouse的聚合性能都要优于Elasticsearch,且如果基于排序键进行聚合,性能更好,是ES的数倍。
    此外,ClickHouse的SummaryMergeTree、AggregatingMergeTree表引擎支持后台自动聚合数据,所以在某些场景下其聚合分析性能会更优

    学习资料


    关注公众号回复:"clickhouse",即可获得ClickHouse资料合集。

    文章转载自大数据猫,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

    评论