暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

ClickHouse 25.5 版本发布说明

ClickHouseInc 2025-06-10
110


本文字数:11882;估计阅读时间:30 分钟

作者:ClickHouse Team


Meetup活动


ClickHouse 杭州第二届 Meetup 开启报名,详见文末海报!


又到了每月一次的版本更新时间!

发布概要

ClickHouse 25.5 带来了 15 项新功能 🌸 23 项性能优化 🦋 以及 64 个缺陷修复 🐝


本次发布的亮点包括:向量相似度索引进入 Beta 阶段、Iceberg 表支持 Hive metastore catalog、支持查看每个函数何时被添加到 ClickHouse、本地工具 clickhouse-local 引入隐式表支持等诸多功能更新。


新贡献者

我们热烈欢迎所有在 25.5 版本中首次做出贡献的开发者!ClickHouse 社区正在持续壮大,正是因为大家的努力,ClickHouse 才能不断发展壮大。

以下是本次的新贡献者名单:

Andrian Iliev, Colerar, Dasha Wessely, Denis, KovalevDima, Kyamran, Marta Paes, Mojtaba Ghahari, Sachin Singh, YjyJeff, andrei tinikov, caicre, codeworse, gvoelfin, morsapaes, samay-sharma, shanfengp, tdufour


Iceberg 表支持 Hive metastore catalog

贡献者:scanhex12

ClickHouse 正在不断拓展对 Lakehouse 架构的支持。此前的版本已支持 Unity 和 AWS Glue 数据目录,本次新增了对 Hive metastore catalog 的支持。

用户可以通过 DataLakeCatalog 表引擎创建一个表来连接 Hive catalog:

    SET allow_experimental_database_hms_catalog=true;
    CREATE DATABASE hive
    ENGINE = DataLakeCatalog('thrift://hive:9083''''')
    SETTINGS
      catalog_type = 'hive',
      warehouse = 'demo',
      storage_endpoint = 'http://minio:9000/warehouse';

    随后,便可以像操作普通表一样查询该 catalog 中的表:

      SELECT count()
      FROM default.`iceberg.hits`;


      clickhouse-local 中的隐式表

      贡献者:Alexey Milovidov

      在使用 clickhouse-local 工具处理标准输入文件时,现在可以省略 FROM 和 SELECT 子句。

      举个例子,假设我们想要解析 GitHub 上 football.json 项目中的一个 JSON 文件,可以使用如下查询来探索文件中的属性结构:

        curl -sS 'https://raw.githubusercontent.com/openfootball/football.json/refs/heads/master/2024-25/en.1.json' | 
        ./clickhouse -q "JSONAllPathsWithTypes(json)" --input-format JSONAsObject
          {'matches':'Array(JSON(max_dynamic_types=16, max_dynamic_paths=256))','name':'String'}

          从查询结果中,我们可以看到两个属性:name 和 matches。接下来,我们可以进一步探索 matches 属性的内容:

            curl -sS https://raw.githubusercontent.com/openfootball/football.json/refs/heads/master/2024-25/en.1.json | 
            ./clickhouse -q "arrayJoin(json.matches::Array(JSON)) LIMIT 5" --input-format JSONAsObject
              {"date":"2024-08-16","round":"Matchday 1","score":{"ft":["1","0"],"ht":["0","0"]},"team1":"Manchester United FC","team2":"Fulham FC","time":"20:00"}
              {"date":"2024-08-17","round":"Matchday 1","score":{"ft":["0","2"],"ht":["0","0"]},"team1":"Ipswich Town FC","team2":"Liverpool FC","time":"12:30"}
              {"date":"2024-08-17","round":"Matchday 1","score":{"ft":["2","0"],"ht":["1","0"]},"team1":"Arsenal FC","team2":"Wolverhampton Wanderers FC","time":"15:00"}
              {"date":"2024-08-17","round":"Matchday 1","score":{"ft":["0","3"],"ht":["0","1"]},"team1":"Everton FC","team2":"Brighton & Hove Albion FC","time":"15:00"}
              {"date":"2024-08-17","round":"Matchday 1","score":{"ft":["1","0"],"ht":["1","0"]},"team1":"Newcastle United FC","team2":"Southampton FC","time":"15:00"}

              如果我们希望对 matches 数组执行更深入的查询操作,就需要将查询语句改写为包含 FROM 子句的形式,如下所示:

                curl -sS https://raw.githubusercontent.com/openfootball/football.json/refs/heads/master/2024-25/en.1.json |
                ./clickhouse -q \
                "m.date, m.team1, m.team2, m.score.ft::Array(String)[1] || '-' || m.score.ft::Array(String)[2] AS score
                FROM table
                ARRAY JOIN json.matches::Array(JSON) AS m
                LIMIT 5" \
                --input-format JSONAsObject \
                --output-format Pretty \
                --output_format_pretty_row_numbers=0

                此外,我们还支持通过指定输出格式,在查询结果中包含列名:

                  ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
                  ┃ m.date     ┃ m.team1              ┃ m.team2                    ┃ score ┃
                  ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
                  │ 2024-08-16 │ Manchester United FC │ Fulham FC                  │ 1-0   │
                  ├────────────┼──────────────────────┼────────────────────────────┼───────┤
                  │ 2024-08-17 │ Ipswich Town FC      │ Liverpool FC               │ 0-2   │
                  ├────────────┼──────────────────────┼────────────────────────────┼───────┤
                  │ 2024-08-17 │ Arsenal FC           │ Wolverhampton Wanderers FC │ 2-0   │
                  ├────────────┼──────────────────────┼────────────────────────────┼───────┤
                  │ 2024-08-17 │ Everton FC           │ Brighton & Hove Albion FC  │ 0-3   │
                  ├────────────┼──────────────────────┼────────────────────────────┼───────┤
                  │ 2024-08-17 │ Newcastle United FC  │ Southampton FC             │ 1-0   │
                  └────────────┴──────────────────────┴────────────────────────────┴───────┘


                  introduced_in 字段用于追踪函数的新增版本

                  贡献者:Robert Schulzesy

                  stem.functions 系统表新增了一个字段 introduced_in,用于记录每个函数首次被添加到 ClickHouse 的版本号。目前该字段已经覆盖了一部分新添加的函数,未来也会陆续补充旧函数的信息。

                  下面是 25.5 版本中新增的函数列表:

                    SELECT name, introduced_in, returned_value
                    FROM system.functions
                    WHERE introduced_in = '25.5'
                    FORMAT PrettySpace;
                      name                   description
                       icebergBucket          Implements logic of iceberg bucket transform: https://iceberg.apache.org/spec/#bucket-transform-details.
                       icebergHash            Implements logic of iceberg hashing transform: https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements.
                       mapContainsValue       Checks whether the map has the specified value.
                       mapContainsValueLike   Checks whether map contains value LIKE specified pattern.
                       mapContainsKey         Checks whether the map has the specified key.
                       mapExtractValueLike    Returns a map with elements which value matches the specified pattern.
                       stringBytesEntropy     Calculates Shannon's entropy of byte distribution in a string.
                       stringBytesUniq        Counts the number of distinct bytes in a string.


                      Parquet 格式支持 Geo 类型

                      贡献者:scanhex12

                      ClickHouse 的 Parquet 读取器现已支持 Geo 类型。我们可以通过以下查询语句,对 Overture Maps 数据集中的几何字段进行解析:

                        SELECT id, geometry, bbox, 
                               COLUMNS('^id|geometry|bbox') APPLY(toTypeName)
                        FROM s3('s3://overturemaps-us-west-2/release/2025-04-23.0/theme=places/type=place/*')
                        WHERE categories.primary = 'pizza_restaurant'
                        LIMIT 1
                        SETTINGS schema_inference_make_columns_nullable = 0;
                          Row 1:
                          ──────
                          id:                   08f95008132998a6036bed7d0b7e7661
                          geometry:             (119.4407421,-5.15814)
                          bbox:                 (119.44073,119.44074,-5.15814,-5.158139)
                          toTypeName(id):       String
                          toTypeName(geometry): Tuple(Float64, Float64)
                          toTypeName(bbox):     Tuple(
                              xmin Float32,
                              xmax Float32,
                              ymin Float32,
                              ymax Float32)

                          现在,geometry 字段可以被正确识别并解析。而在旧版本中,该字段通常会以二进制的形式显示,如下所示:

                            Row 1:
                            ──────
                            id:                   08ff2a1b744f064403498c7aa7fce7a1
                            geometry:             (-149.069345,-83.5991503)
                            bbox:                 (-149.06935,-149.06932,-83.59915,-83.599144)
                            toTypeName(id):       String
                            toTypeName(geometry): Point
                            toTypeName(bbox):     Tuple(
                                xmin Float32,
                                xmax Float32,
                                ymin Float32,
                                ymax Float32)

                            有了这个改进,ClickHouse 已可用于查询 GeoParquet 数据集。我们也一直在寻找有趣的地理数据集,欢迎大家推荐值得尝试的数据源!


                            向量搜索进入 Beta 阶段

                            贡献者:Shankar Iyer

                            ClickHouse 中的向量相似度索引功能现已从实验阶段升级为 Beta 阶段,并新增了对混合搜索中前置过滤和后置过滤策略的支持。

                            我们以下面这段示例,展示如何基于 Cohere 的 Wikipedia 数据子集使用向量索引。首先,我们启用相似度索引功能:

                              SET allow_experimental_vector_similarity_index=1;

                              接着,我们创建一个包含 emb 向量字段的表,并基于该字段建立 HNSW 索引:

                                CREATE TABLE wikiEmbeddings (
                                  id Int32,
                                  title String,
                                  text String,
                                  url String,
                                  wiki_id Int32,
                                  views Float32,
                                  paragraph_id Int32,
                                  langs Int32,
                                  emb Array(Float32),
                                  INDEX emb_hnsw emb TYPE vector_similarity('hnsw''L2Distance'768)
                                )
                                ORDER BY id;

                                然后从 Hugging Face 导入相应数据:

                                  INSERT INTO wikiEmbeddings
                                  select *
                                  FROM url('https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings/resolve/main/data/train-00000-of-00253-8d3dffb4e6ef0304.parquet')
                                  SETTINGS max_http_get_redirects=10;

                                  现在就可以开始查询了。以下查询将返回数据集中最相似的一条记录:

                                    SELECT * EXCEPT emb
                                    FROM wikiEmbeddings
                                    WHERE id = 120356;
                                      Row 1:
                                      ──────
                                      id:           120356
                                      title:        Germany national football team
                                      text:         The Germany national football team () represents Germany in men's international football and played its first match in 1908. The team is governed by the German Football Association ("Deutscher Fußball-Bund"), founded in 1900. Between 1949 and 1990, separate German national teams were recognised by FIFA due to Allied occupation and division: the DFB's team representing the Federal Republic of Germany (commonly referred to as West Germany in English between 1949 and 1990), the Saarland team representing the Saar Protectorate (19501956and the East Germany team representing the German Democratic Republic (19521990). The latter two were absorbed along with their records; the present team represents the reunified Federal Republic. The official name and code "Germany FR (FRG)" was shortened to "Germany (GER)" following reunification in 1990.
                                      url:          https://en.wikipedia.org/wiki?curid=250204
                                      wiki_id:      250204
                                      views:        3271.844
                                      paragraph_id: 0
                                      langs:        98

                                      结果显示,这条记录来自 Wikipedia 上关于德国国家足球队的页面。我们可以进一步查询与该记录最相似的其他条目:

                                        WITH (SELECT emb FROM wikiEmbeddings WHERE id = 120356AS lookup
                                        SELECT id, title, text, url, L2Distance(emb, lookup) AS dist
                                        FROM wikiEmbeddings
                                        ORDER BY dist
                                        LIMIT 3
                                        FORMAT Vertical;
                                          Row 1:
                                          ──────
                                          id:    120356
                                          title: Germany national football team
                                          text:  The Germany national football team () represents Germany in men's international football and played its first match in 1908. The team is governed by the German Football Association ("Deutscher Fußball-Bund"), founded in 1900. Between 1949 and 1990, separate German national teams were recognised by FIFA due to Allied occupation and division: the DFB's team representing the Federal Republic of Germany (commonly referred to as West Germany in English between 1949 and 1990), the Saarland team representing the Saar Protectorate (19501956and the East Germany team representing the German Democratic Republic (19521990). The latter two were absorbed along with their records; the present team represents the reunified Federal Republic. The official name and code "Germany FR (FRG)" was shortened to "Germany (GER)" following reunification in 1990.
                                          url:   https://en.wikipedia.org/wiki?curid=250204
                                          dist:  0
                                          Row 2:
                                          ──────
                                          id:    120367
                                          title: Germany national football team
                                          text:  The Federal Republic of Germany, which was referred to as West Germany, continued the DFB. With recognition by FIFA and UEFA, the DFB maintained and continued the record of the pre-war team. Switzerland was the first team that played West Germany in 1950with the latter qualifying for the 1954 World Cup and the former hosting it.
                                          url:   https://en.wikipedia.org/wiki?curid=250204
                                          dist:  5.537961
                                          Row 3:
                                          ──────
                                          id:    120366
                                          title: Germany national football team
                                          text:  After World War II, Germany was banned from competition in most sports until 1950. The DFB was not a full member of FIFA, and none of the three new German states – West Germany, East Germany, and Saarland – entered the 1950 World Cup qualifiers.
                                          url:   https://en.wikipedia.org/wiki?curid=250204
                                          dist:  6.296622

                                          不过,这些结果都来自同一篇 Wikipedia 文章的不同段落,因此参考意义有限。为了让结果更加丰富多样,我们可以添加 WHERE 子句,排除与查询记录具有相同 wiki_id 的条目:

                                            WITH (SELECT emb FROM wikiEmbeddings WHERE id = 120356AS lookup,
                                                 (SELECT wiki_id FROM wikiEmbeddings WHERE id = 120356AS lookup_wiki_id
                                            SELECT id, title, text, url, L2Distance(emb, lookup) AS dist
                                            FROM wikiEmbeddings
                                            WHERE wiki_id <> lookup_wiki_id
                                            ORDER BY dist
                                            LIMIT 3
                                            FORMAT Vertical;
                                              Row 1:
                                              ──────
                                              id:    6420
                                              title: Germany
                                              text:  Football is the most popular sport in Germany. With more than 7 million official members, the German Football Association ("Deutscher Fußball-Bund"is the largest single-sport organisation worldwide, and the German top league, the Bundesliga, attracts the second-highest average attendance of all professional sports leagues in the world. The German men's national football team won the FIFA World Cup in 1954, 1974, 1990, and 2014, the UEFA European Championship in 1972, 1980 and 1996, and the FIFA Confederations Cup in 2017.
                                              url:   https://en.wikipedia.org/wiki?curid=11867
                                              dist:  6.9714794
                                              Row 2:
                                              ──────
                                              id:    6336
                                              title: Germany
                                              text:  Formal unification of Germany into the modern nation-state was commenced on 18 August 1866 with the North German Confederation Treaty establishing the Prussia-led North German Confederation later transformed in 1871 into the German Empire. After World War I and the German Revolution of 1918–1919, the Empire was in turn transformed into the semi-presidential Weimar Republic. The Nazi seizure of power in 1933 led to the establishment of a totalitarian dictatorship, World War II, and the Holocaust. After the end of World War II in Europe and a period of Allied occupation, Germany as a whole was organized into two separate polities with limited sovereignity: the Federal Republic of Germany, generally known as West Germany, and the German Democratic Republic, East Germany, while Berlin continued its Four Power status. The Federal Republic of Germany was a founding member of the European Economic Community and the European Union, while the German Democratic Republic was a communist Eastern Bloc state and member of the Warsaw Pact. After the fall of communism, German reunification saw the former East German states join the Federal Republic of Germany on 3 October 1990—becoming a federal parliamentary republic.
                                              url:   https://en.wikipedia.org/wiki?curid=11867
                                              dist:  6.98255
                                              Row 3:
                                              ──────
                                              id:    7767
                                              title: FIFA World Cup
                                              text:  After FIFA was founded in 1904, it tried to arrange an international football tournament between nations outside the Olympic framework in Switzerland in 1906. These were very early days for international football, and the official history of FIFA describes the competition as having been a failure.
                                              url:   https://en.wikipedia.org/wiki?curid=11370
                                              dist:  7.5603456
                                              dist:  7.736428

                                              正如前文所述,WHERE 子句中的过滤条件可以采用后置或前置方式执行。

                                              后置过滤指的是先使用向量索引检索候选数据,再对其执行过滤条件;而前置过滤则是先筛选符合条件的记录,再进行向量相似度匹配。

                                              你可以通过设置系统参数 vector_search_filter_strategy 来控制过滤方式。该参数默认值为 auto,表示由查询引擎根据启发式规则自动决定最优策略。你也可以将其显式设置为 postfilter 或 prefilter,以指定过滤顺序。


                                              Meetup 活动报名通知

                                              好消息:ClickHouse Hangzhou User Group第 2 届 Meetup 火热报名中,将于2025年07月05日在杭州西溪万怡酒店(杭州市西湖区蒋村街道文二西路770号.中国五村园8号楼)举行,扫码免费报名

                                              /END/


                                              试用阿里云 ClickHouse企业版


                                              轻松节省30%云资源成本?阿里云数据库ClickHouse 云原生架构全新升级,首次购买ClickHouse企业版计算和存储资源组合,首月消费不超过99.58元(包含最大16CCU+450G OSS用量)了解详情:https://t.aliyun.com/Kz5Z0q9G



                                              征稿启示

                                              面向社区长期正文,文章内容包括但不限于关于 ClickHouse 的技术研究、项目实践和创新做法等。建议行文风格干货输出&图文并茂。质量合格的文章将会发布在本公众号,优秀者也有机会推荐到 ClickHouse 官网。请将文章稿件的 WORD 版本发邮件至:Tracy.Wang@clickhouse.com


                                              文章转载自ClickHouseInc,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                              评论