Author: Robert Bernier
November 3, 2023
Preface
With Citus, successfully building and scaling a PostgreSQL cluster across multiple nodes, and even across data centers, can sometimes feel like an art form, because there are so many ways to go about it.
I think there's an axiom that aptly describes situations like this one, capturing the difference between science and art:
– Science: 1 problem -> 1 solution
– Art: 1 problem -> 1,000 solutions
While my two previous Citus blogs covered the fairly straightforward concepts of creating columnar tables and leveraging data redundancy, this post explores the architectural design considerations of working with this freely downloadable and fully featured PostgreSQL extension.
Rather than itemizing the concepts, function calls, and so on available in Citus, I'll explore scaling out a database system that starts from a single host. I won't cover every capability, but enough, I hope, to show you what's possible and leave you wanting to learn more on your own.
About the cluster
PostgreSQL version 15
Citus extension (I'll be using version 11, although newer releases are available)
PostgreSQL cluster
One coordinator node
citus-coord-01
Three worker nodes
citus1
citus2
citus3
Hardware
AWS instances
Ubuntu Server 20.04, SSD volume type
64-bit (x86)
c5.xlarge: 4 vCPU, 8 GB RAM
Storage: EBS volume (root)
80 GB
gp2 (IOPS 240/3000)
In addition, high availability will be incorporated, thereby guaranteeing the cluster's viability should one of the worker nodes fail.
Leveraging pgbench, the benchmarking utility bundled with PostgreSQL, I'll put the cluster through its paces by executing a series of DML operations. Using its default TPC-B benchmark, one can stress-test a database of any size, from a few clients up to, if needed, simulating thousands interacting with terabyte-sized systems.
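For reference, the default TPC-B-style transaction that pgbench runs against these tables looks roughly like this (paraphrased from the pgbench documentation; :aid, :bid, :tid, and :delta are generated randomly for each transaction):
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;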
Steps
The first step is to provision the four nodes with PostgreSQL and Citus.
The extension can be obtained and installed via the following links:
https://www.percona.com/blog/powering-postgresql-15-with-columnar-tables/
https://github.com/citusdata/citus
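As a point of reference, a typical installation on Ubuntu might look like the following; the repository script and package name are assumptions based on the Citus community instructions, so adjust them to match your distribution and versions:
# add the Citus community package repository (assumed URL, per the Citus docs)
curl https://install.citusdata.com/community/deb.sh | sudo bash
# install the Citus 11 extension build for PostgreSQL 15 (package name may vary)
sudo apt-get install -y postgresql-15-citus-11.1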
-- update each Postgres node, enabling the extension
-- (requires a server restart)
alter system set shared_preload_libraries='citus';
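Because shared_preload_libraries is read only at startup, each server must be restarted afterward; on Ubuntu, for example (the service name is an assumption, adjust to your installation):
sudo systemctl restart postgresql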
On each of the four nodes, create a database with the Citus extension:
-- using psql
create database pgbench;
\c pgbench
create extension citus;
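To confirm the extension is active on a node, you can check it from the same psql session:
-- list the installed extension and report its version
\dx citus
select citus_version();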
One worker node
Starting with a single worker node, log onto citus-coord-01, declare the coordinator host, and then begin with the single node citus1:
-- execute on citus-coord-01
psql pgbench <<_eof1_
select citus_set_coordinator_host('citus-coord-01', 5432);
select citus_add_node('citus1', 5432);
_eof1_
While still logged into database pgbench, get the list of worker nodes, which at this point in time is just the one:
select * from master_get_active_worker_nodes();
node_name | node_port
-----------+-----------
citus1 | 5432
Now it's about to get interesting; pgbench's normal initialization creates and populates the tables with the appropriate constraints. In this case, however, only the tables are initialized, without data or constraints:
-- execute the following command on citus-coord-01
pgbench -iI t pgbench
pgbench=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size |
-------+------------------+-------+----------+-------------+---------------+---------+-----
public | citus_tables | view | postgres | permanent | | 0 bytes |
public | pgbench_accounts | table | postgres | permanent | heap | 0 bytes |
public | pgbench_branches | table | postgres | permanent | heap | 0 bytes |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 0 bytes |
The next set of commands, executed on the coordinator node, distributes the pgbench tables onto node citus1:
psql pgbench <<_eof1_
BEGIN;
select create_distributed_table('pgbench_history', 'aid');
select create_distributed_table('pgbench_accounts', 'aid');
select create_distributed_table('pgbench_branches', 'bid');
select create_distributed_table('pgbench_tellers', 'tid');
COMMIT;
_eof1_
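A quick aside: the number of shards per table defaults to 32. If you want a different value, the citus.shard_count setting must be in place before create_distributed_table() is called; a minimal sketch, assuming you wanted 64 shards instead:
-- set in the same session, before distributing the tables
set citus.shard_count = 64;
select create_distributed_table('pgbench_accounts', 'aid');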
Now we can populate the tables. The following invocation generates almost 4GB of data. Note that I still haven't added any constraints:
pgbench -iI g -s 300 pgbench
This query shows the total number of shards on the node; the default number of shards generated for a given table is 32:
select * from citus_tables;
table_name | citus_table_type | distribution_column | colocation_id | table_size | shard_count
------------------+------------------+---------------------+---------------+------------+-------------
pgbench_accounts | distributed | aid | 1 | 3844 MB | 32
pgbench_branches | distributed | bid | 1 | 256 kB | 32
pgbench_history | distributed | aid | 1 | 0 bytes | 32
pgbench_tellers | distributed | tid | 1 | 256 kB | 32
You'll also notice that only some of the shards have been populated with data, which is normal behavior when initializing pgbench with data:
select * from citus_shards;
table_name | shardid | shard_name | citus_table_type | colocation_id | nodename | nodeport | shard_size
------------------+---------+-------------------------+------------------+---------------+----------+----------+------------
pgbench_accounts | 102040 | pgbench_accounts_102040 | distributed | 1 | citus1 | 5432 | 125853696
pgbench_accounts | 102041 | pgbench_accounts_102041 | distributed | 1 | citus1 | 5432 | 126173184
pgbench_accounts | 102042 | pgbench_accounts_102042 | distributed | 1 | citus1 | 5432 | 125739008
pgbench_accounts | 102043 | pgbench_accounts_102043 | distributed | 1 | citus1 | 5432 | 125968384
.
.
pgbench_branches | 102072 | pgbench_branches_102072 | distributed | 1 | citus1 | 5432 | 8192
pgbench_branches | 102073 | pgbench_branches_102073 | distributed | 1 | citus1 | 5432 | 8192
pgbench_branches | 102074 | pgbench_branches_102074 | distributed | 1 | citus1 | 5432 | 8192
pgbench_branches | 102075 | pgbench_branches_102075 | distributed | 1 | citus1 | 5432 | 8192
.
.
pgbench_history | 102008 | pgbench_history_102008 | distributed | 1 | citus1 | 5432 | 0
pgbench_history | 102009 | pgbench_history_102009 | distributed | 1 | citus1 | 5432 | 0
pgbench_history | 102010 | pgbench_history_102010 | distributed | 1 | citus1 | 5432 | 0
pgbench_history | 102011 | pgbench_history_102011 | distributed | 1 | citus1 | 5432 | 0
.
.
pgbench_tellers | 102104 | pgbench_tellers_102104 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102105 | pgbench_tellers_102105 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102106 | pgbench_tellers_102106 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102107 | pgbench_tellers_102107 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102108 | pgbench_tellers_102108 | distributed | 1 | citus1 | 5432 | 8192
.
.
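Rather than scanning the full listing, a quick aggregate over citus_shards confirms that, at this point, every shard lives on citus1:
select nodename, count(*) as shards, pg_size_pretty(sum(shard_size)) as total_size
from citus_shards
group by nodename;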
Now, execute a benchmark run:
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
The results are not pretty.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 1524
number of failed transactions: 0 (0.000%)
latency average = 791.374 ms
latency stddev = 157.710 ms
initial connection time = 15.704 ms
tps = 25.197427 (without initial connection time)
Note: Refer to the bottom of this blog, where you'll find a summary of the various benchmark runs as this cluster evolved.
In order to speed things up, indexes must be added. However, the standard pgbench method of adding the indexes fails:
postgres@citus-coord-01:~$ pgbench -iI p pgbench
creating primary keys...
pgbench: error: query failed: ERROR: cannot create constraint without a name on a distributed table
pgbench: detail: Query was: alter table pgbench_branches add primary key (bid)
Successfully adding the indexes and constraints requires them to be explicitly named and created in the following fashion. Note that table pgbench_history requires a REPLICA IDENTITY because it has no primary key:
# THIS WORKS!
psql pgbench <<_eof1_
--
-- indexes and constraints must be explicitly named
--
BEGIN;
create unique index pgbench_accounts_pk on pgbench_accounts(aid);
create unique index pgbench_branches_pk on pgbench_branches(bid);
create unique index pgbench_tellers_pk on pgbench_tellers(tid);
alter table pgbench_accounts
add constraint pk_accounts primary key using index pgbench_accounts_pk;
alter table pgbench_branches
add constraint pk_branches primary key using index pgbench_branches_pk;
alter table pgbench_tellers
add constraint pk_tellers primary key using index pgbench_tellers_pk;
\qecho adding REPLICA IDENTITY (no PK present)
alter table pgbench_history replica identity full;
COMMIT;
_eof1_
Repeating the benchmark run yields much better results:
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 135273
number of failed transactions: 0 (0.000%)
latency average = 8.865 ms
latency stddev = 9.452 ms
initial connection time = 15.556 ms
tps = 2254.544852 (without initial connection time)
Two worker nodes
Adding a second worker node is straightforward, requiring only two steps:
1. Add the node.
2. Rebalance the shards across the two nodes.
Execute this on the coordinator node, citus-coord-01:
psql pgbench <<_eof1_
--
-- it's understood that database pgbench already exists on citus2
-- and extension citus has been created
--
\qecho adding node citus2 ...
select citus_add_node('citus2', 5432);
\qecho rebalancing shards across TWO nodes ...
select * from rebalance_table_shards();
ANALYZE;
_eof1_
Here's some partial output as the tables are rebalanced across the two nodes:
adding node citus2 ...
citus_add_node
----------------
3
rebalancing shards across TWO nodes ...
NOTICE: Moving shard 102008 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102009 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102010 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102011 from citus1:5432 to citus2:5432 ...
.
.
NOTICE: Moving shard 102020 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102021 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102022 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102023 from citus1:5432 to citus2:5432 ...
Repeating the benchmark run indicates a slight degradation in performance. This is to be expected, considering that my proof of concept consists of containers on a single machine, all of which share the same system resources:
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 130190
number of failed transactions: 0 (0.000%)
latency average = 9.214 ms
latency stddev = 19.159 ms
initial connection time = 15.776 ms
tps = 2169.559969 (without initial connection time)
Three worker nodes
As per the two-worker-node configuration, adding another node is a simple matter.
psql pgbench <<_eof1_
\qecho adding node citus3 ...
select citus_add_node('citus3', 5432);
\qecho rebalancing shards across THREE nodes ...
select * from rebalance_table_shards();
_eof1_
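As an aside, rebalance_table_shards() balances by shard count by default; Citus 11 also ships a by-disk-size strategy that can be selected explicitly, which may spread unevenly sized shards more fairly:
select * from rebalance_table_shards(rebalance_strategy := 'by_disk_size');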
Repeating the benchmark run gives us updated performance metrics.
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 142630
number of failed transactions: 0 (0.000%)
latency average = 8.410 ms
latency stddev = 5.542 ms
initial connection time = 16.407 ms
tps = 2377.227920 (without initial connection time)
Three worker nodes, with redundancy/replication
This next step is interesting because it adds redundancy by increasing the replication factor from one to two, such that two copies of each data shard of every table are not only duplicated but are located on two different nodes.
Note: A newly updated replication factor affects tables only after the fact; pre-existing tables are not affected by the new replication factor:
psql pgbench <<_eof1_
\qecho increasing replication factor to TWO
alter system set citus.shard_replication_factor=2;
select pg_reload_conf();
_eof1_
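You can confirm the new setting took effect from any session on the coordinator:
show citus.shard_replication_factor;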
Tip: In order to replicate the tables, set the replication factor before creating them.
In order to replicate the existing tables across the nodes, they are first undistributed by centralizing them back onto the coordinator node. Redistributing them across the three-node cluster then automatically adds the replication.
This centralizes the tables onto the coordinator node:
psql pgbench <<_eof1_
BEGIN;
select undistribute_table('pgbench_history');
select undistribute_table('pgbench_accounts');
select undistribute_table('pgbench_branches');
select undistribute_table('pgbench_tellers');
COMMIT;
_eof1_
The tables are redistributed and rebalanced across the three-node cluster with the correct redundancy of two:
psql pgbench <<_eof1_
BEGIN;
select create_distributed_table('pgbench_history', 'aid');
select create_distributed_table('pgbench_accounts', 'aid');
select create_distributed_table('pgbench_branches', 'bid');
select create_distributed_table('pgbench_tellers', 'tid');
COMMIT;
-- remove legacy tables from the coordinator node
BEGIN;
select truncate_local_data_after_distributing_table($$public.pgbench_history$$);
select truncate_local_data_after_distributing_table($$public.pgbench_accounts$$);
select truncate_local_data_after_distributing_table($$public.pgbench_branches$$);
select truncate_local_data_after_distributing_table($$public.pgbench_tellers$$);
COMMIT;
_eof1_
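To verify the redundancy, count the placements per shard; with a replication factor of two, every shard should now appear on exactly two different nodes. A minimal check against the pg_dist_shard_placement view:
-- an empty result means every shard has exactly two placements
select shardid, count(*) as placements
from pg_dist_shard_placement
group by shardid
having count(*) <> 2;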
Repeating the benchmark run gives us our updated performance metrics.
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 68834
number of failed transactions: 0 (0.000%)
latency average = 17.430 ms
latency stddev = 12.818 ms
initial connection time = 16.321 ms
tps = 1147.093369 (without initial connection time)
Benchmark results
Here's a summary of the various benchmark runs, giving you an idea of the relative performance:
RUN | NUM NODES | WITH IDX | REPLICATION FACTOR | TPS
----+-----------+----------+--------------------+------
  1 |         1 | no       |                  1 |   25
  2 |         1 | yes      |                  1 | 2255
  3 |         2 | yes      |                  1 | 2170
  4 |         3 | yes      |                  1 | 2377
  5 |         3 | yes      |                  2 | 1147
Caveat
Although not mentioned in this blog, the following should be kept in mind as you work with CitusDB:
1. The password of the account used to propagate operations across the cluster is entered into the .pgpass file of the primary account on all nodes (see the sketch after this list).
2. "wal_level" is set to logical.
3. Redundancy can potentially decrease overall performance.
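For reference, a .pgpass entry follows the standard hostname:port:database:username:password format; the host, user, and password below are placeholders:
# ~/.pgpass on each node, mode 0600
citus1:5432:pgbench:postgres:secret-password
And wal_level, like shared_preload_libraries, is changed with ALTER SYSTEM and requires a server restart:
alter system set wal_level = logical;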
Keep in mind that performance metrics depend not only upon the cluster's hardware and system resources but also on the level of tuning, which, in this case, has not been addressed.
In a future blog, I'll continue exploring Citus by scaling pgbench into other architectures.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.