Author: Robert Bernier
November 3, 2023
Preface
With Citus, successfully building and scaling a PostgreSQL cluster across multiple nodes, and even across data centers, can sometimes feel like an art form, because there are so many ways to go about it.
I think there's an axiom that aptly describes situations like this one, capturing the difference between science and art:
– Science: 1 problem -> 1 solution
– Art: 1 problem -> 1,000 solutions
While my two previous Citus blogs covered the fairly straightforward concepts of creating columnar tables and leveraging data redundancy, this post explores the architectural design considerations of working with this freely downloadable and fully featured PostgreSQL extension.
Rather than itemizing the concepts, function calls, and so on available in Citus, I'll explore scaling out a database system that starts from a single host. I won't cover every capability, but enough, I hope, to show you what's possible and leave you wanting to learn more on your own.
About the cluster
PostgreSQL version 15
Citus extension (I'll be using version 11, although newer releases are available)
PostgreSQL cluster
One coordinator node
citus-coord-01
Three worker nodes
citus1
citus2
citus3
Hardware
AWS instances
Ubuntu Server 20.04, SSD volume type
64-bit (x86)
c5.xlarge: 4 vCPU, 8 GB RAM
Storage: EBS volume (root)
80 GB
gp2 (IOPS 240/3000)
In addition, high availability will be incorporated, thereby guaranteeing the cluster's viability should one of the worker nodes fail.
Leveraging pgbench, the benchmarking utility bundled with PostgreSQL, I'll put the cluster through its paces by executing a series of DML operations. Using its default TPC-B benchmark, one can stress-test a database of any size, from a few clients up to, if needed, simulating thousands interacting with terabyte-sized systems.
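For reference, the default TPC-B-style transaction that pgbench runs against these tables looks roughly like this (paraphrased from the pgbench documentation; :aid, :bid, :tid, and :delta are generated randomly for each transaction):
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;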
Steps
The first step is to provision the four nodes with PostgreSQL and Citus.
The extension can be obtained and installed via the following links:
https://www.percona.com/blog/powering-postgresql-15-with-columnar-tables/
https://github.com/citusdata/citus
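As a point of reference, a typical installation on Ubuntu might look like the following; the repository script and package name are assumptions based on the Citus community instructions, so adjust them to match your distribution and versions:
# add the Citus community package repository (assumed URL, per the Citus docs)
curl https://install.citusdata.com/community/deb.sh | sudo bash
# install the Citus 11 extension build for PostgreSQL 15 (package name may vary)
sudo apt-get install -y postgresql-15-citus-11.1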
-- update each Postgres node, enabling the extension
-- (requires a server restart)
alter system set shared_preload_libraries='citus';
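Because shared_preload_libraries is read only at startup, each server must be restarted afterward; on Ubuntu, for example (the service name is an assumption, adjust to your installation):
sudo systemctl restart postgresql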
On each of the four nodes, create a database with the Citus extension:
-- using psql
create database pgbench;
\c pgbench
create extension citus;
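To confirm the extension is active on a node, you can check it from the same psql session:
-- list the installed extension and report its version
\dx citus
select citus_version();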
One worker node
Starting with a single worker node, log onto citus-coord-01, declare the coordinator host, and then begin with the single node citus1:
-- execute on citus-coord-01
psql pgbench <<_eof1_
select citus_set_coordinator_host('citus-coord-01', 5432);
select citus_add_node('citus1', 5432);
_eof1_
While still logged into database pgbench, get the list of worker nodes, which at this point in time is just the one:
select * from master_get_active_worker_nodes();
node_name | node_port
-----------+-----------
citus1 | 5432
Now it's about to get interesting; pgbench's normal initialization creates and populates the tables with the appropriate constraints. In this case, however, only the tables are initialized, without data or constraints:
-- execute the following command on citus-coord-01
pgbench -iI t pgbench
pgbench=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size |
-------+------------------+-------+----------+-------------+---------------+---------+-----
public | citus_tables | view | postgres | permanent | | 0 bytes |
public | pgbench_accounts | table | postgres | permanent | heap | 0 bytes |
public | pgbench_branches | table | postgres | permanent | heap | 0 bytes |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 0 bytes |
The next set of commands, executed on the coordinator node, distributes the pgbench tables onto node citus1:
psql pgbench <<_eof1_
BEGIN;
select create_distributed_table('pgbench_history', 'aid');
select create_distributed_table('pgbench_accounts', 'aid');
select create_distributed_table('pgbench_branches', 'bid');
select create_distributed_table('pgbench_tellers', 'tid');
COMMIT;
_eof1_
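A quick aside: the number of shards per table defaults to 32. If you want a different value, the citus.shard_count setting must be in place before create_distributed_table() is called; a minimal sketch, assuming you wanted 64 shards instead:
-- set in the same session, before distributing the tables
set citus.shard_count = 64;
select create_distributed_table('pgbench_accounts', 'aid');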
Now we can populate the tables. The following invocation generates almost 4GB of data. Note that I still haven't added any constraints:
pgbench -iI g -s 300 pgbench
This query shows the total number of shards on the node; the default number of shards generated for a given table is 32:
select * from citus_tables;
table_name | citus_table_type | distribution_column | colocation_id | table_size | shard_count
------------------+------------------+---------------------+---------------+------------+-------------
pgbench_accounts | distributed | aid | 1 | 3844 MB | 32
pgbench_branches | distributed | bid | 1 | 256 kB | 32
pgbench_history | distributed | aid | 1 | 0 bytes | 32
pgbench_tellers | distributed | tid | 1 | 256 kB | 32
You'll also notice that only some of the shards have been populated with data, which is normal behavior when initializing pgbench with data:
select * from citus_shards;
table_name | shardid | shard_name | citus_table_type | colocation_id | nodename | nodeport | shard_size
------------------+---------+-------------------------+------------------+---------------+----------+----------+------------
pgbench_accounts | 102040 | pgbench_accounts_102040 | distributed | 1 | citus1 | 5432 | 125853696
pgbench_accounts | 102041 | pgbench_accounts_102041 | distributed | 1 | citus1 | 5432 | 126173184
pgbench_accounts | 102042 | pgbench_accounts_102042 | distributed | 1 | citus1 | 5432 | 125739008
pgbench_accounts | 102043 | pgbench_accounts_102043 | distributed | 1 | citus1 | 5432 | 125968384
.
.
pgbench_branches | 102072 | pgbench_branches_102072 | distributed | 1 | citus1 | 5432 | 8192
pgbench_branches | 102073 | pgbench_branches_102073 | distributed | 1 | citus1 | 5432 | 8192
pgbench_branches | 102074 | pgbench_branches_102074 | distributed | 1 | citus1 | 5432 | 8192
pgbench_branches | 102075 | pgbench_branches_102075 | distributed | 1 | citus1 | 5432 | 8192
.
.
pgbench_history | 102008 | pgbench_history_102008 | distributed | 1 | citus1 | 5432 | 0
pgbench_history | 102009 | pgbench_history_102009 | distributed | 1 | citus1 | 5432 | 0
pgbench_history | 102010 | pgbench_history_102010 | distributed | 1 | citus1 | 5432 | 0
pgbench_history | 102011 | pgbench_history_102011 | distributed | 1 | citus1 | 5432 | 0
.
.
pgbench_tellers | 102104 | pgbench_tellers_102104 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102105 | pgbench_tellers_102105 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102106 | pgbench_tellers_102106 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102107 | pgbench_tellers_102107 | distributed | 1 | citus1 | 5432 | 8192
pgbench_tellers | 102108 | pgbench_tellers_102108 | distributed | 1 | citus1 | 5432 | 8192
.
.
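Rather than scanning the full listing, a quick aggregate over citus_shards confirms that, at this point, every shard lives on citus1:
select nodename, count(*) as shards, pg_size_pretty(sum(shard_size)) as total_size
from citus_shards
group by nodename;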
Now, execute a benchmark run:
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
The results are not pretty.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 1524
number of failed transactions: 0 (0.000%)
latency average = 791.374 ms
latency stddev = 157.710 ms
initial connection time = 15.704 ms
tps = 25.197427 (without initial connection time)
Note: Refer to the bottom of this blog, where you'll find a summary of the various benchmark runs as this cluster evolved.
In order to speed things up, indexes must be added. However, the standard pgbench method of adding the indexes fails:
postgres@citus-coord-01:~$ pgbench -iI p pgbench
creating primary keys...
pgbench: error: query failed: ERROR: cannot create constraint without a name on a distributed table
pgbench: detail: Query was: alter table pgbench_branches add primary key (bid)
Successfully adding the indexes and constraints requires them to be explicitly named and created in the following fashion. Note that table pgbench_history requires a REPLICA IDENTITY because it has no primary key:
# THIS WORKS!
psql pgbench <<_eof1_
--
-- indexes and constraints must be explicitly named
--
BEGIN;
create unique index pgbench_accounts_pk on pgbench_accounts(aid);
create unique index pgbench_branches_pk on pgbench_branches(bid);
create unique index pgbench_tellers_pk on pgbench_tellers(tid);
alter table pgbench_accounts
add constraint pk_accounts primary key using index pgbench_accounts_pk;
alter table pgbench_branches
add constraint pk_branches primary key using index pgbench_branches_pk;
alter table pgbench_tellers
add constraint pk_tellers primary key using index pgbench_tellers_pk;
\qecho adding REPLICA IDENTITY (no PK present)
alter table pgbench_history replica identity full;
COMMIT;
_eof1_
Repeating the benchmark run yields much better results:
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 135273
number of failed transactions: 0 (0.000%)
latency average = 8.865 ms
latency stddev = 9.452 ms
initial connection time = 15.556 ms
tps = 2254.544852 (without initial connection time)
Two worker nodes
Adding a second worker node is straightforward, requiring only two steps:
1. Add the node.
2. Rebalance the shards across the two nodes.
Execute this on the coordinator node, citus-coord-01:
psql pgbench <<_eof1_
--
-- it's understood that database pgbench already exists on citus2
-- and extension citus has been created
--
\qecho adding node citus2 ...
select citus_add_node('citus2', 5432);
\qecho rebalancing shards across TWO nodes ...
select * from rebalance_table_shards();
ANALYZE;
_eof1_
Here's some partial output as the tables are rebalanced across the two nodes:
adding node citus2 ...
citus_add_node
----------------
3
rebalancing shards across TWO nodes ...
NOTICE: Moving shard 102008 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102009 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102010 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102011 from citus1:5432 to citus2:5432 ...
.
.
NOTICE: Moving shard 102020 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102021 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102022 from citus1:5432 to citus2:5432 ...
NOTICE: Moving shard 102023 from citus1:5432 to citus2:5432 ...
Repeating the benchmark run indicates a slight degradation in performance. This is to be expected, considering that my proof of concept consists of containers on a single machine, all of which share the same system resources:
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 130190
number of failed transactions: 0 (0.000%)
latency average = 9.214 ms
latency stddev = 19.159 ms
initial connection time = 15.776 ms
tps = 2169.559969 (without initial connection time)
Three worker nodes
As per the two-worker-node configuration, adding another node is a simple matter.
psql pgbench <<_eof1_
\qecho adding node citus3 ...
select citus_add_node('citus3', 5432);
\qecho rebalancing shards across THREE nodes ...
select * from rebalance_table_shards();
_eof1_
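As an aside, rebalance_table_shards() balances by shard count by default; Citus 11 also ships a by-disk-size strategy that can be selected explicitly, which may spread unevenly sized shards more fairly:
select * from rebalance_table_shards(rebalance_strategy := 'by_disk_size');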
Repeating the benchmark run gives us updated performance metrics.
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 142630
number of failed transactions: 0 (0.000%)
latency average = 8.410 ms
latency stddev = 5.542 ms
initial connection time = 16.407 ms
tps = 2377.227920 (without initial connection time)
Three worker nodes, with redundancy/replication
This next step is interesting because it adds redundancy by increasing the replication factor from one to two, such that two copies of each data shard of every table are not only duplicated but are located on two different nodes.
Note: A newly updated replication factor affects tables only after the fact; pre-existing tables are not affected by the new replication factor:
psql pgbench <<_eof1_
\qecho increasing replication factor to TWO
alter system set citus.shard_replication_factor=2;
select pg_reload_conf();
_eof1_
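You can confirm the new setting took effect from any session on the coordinator:
show citus.shard_replication_factor;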
Tip: In order to replicate the tables, set the replication factor before creating them.
In order to replicate the existing tables across the nodes, they are first undistributed by centralizing them back onto the coordinator node. Redistributing them across the three-node cluster then automatically adds the replication.
This centralizes the tables onto the coordinator node:
psql pgbench <<_eof1_
BEGIN;
select undistribute_table('pgbench_history');
select undistribute_table('pgbench_accounts');
select undistribute_table('pgbench_branches');
select undistribute_table('pgbench_tellers');
COMMIT;
_eof1_
The tables are redistributed and rebalanced across the three-node cluster with the correct redundancy of two:
psql pgbench <<_eof1_
BEGIN;
select create_distributed_table('pgbench_history', 'aid');
select create_distributed_table('pgbench_accounts', 'aid');
select create_distributed_table('pgbench_branches', 'bid');
select create_distributed_table('pgbench_tellers', 'tid');
COMMIT;
-- remove legacy tables from the coordinator node
BEGIN;
select truncate_local_data_after_distributing_table($$public.pgbench_history$$);
select truncate_local_data_after_distributing_table($$public.pgbench_accounts$$);
select truncate_local_data_after_distributing_table($$public.pgbench_branches$$);
select truncate_local_data_after_distributing_table($$public.pgbench_tellers$$);
COMMIT;
_eof1_
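To verify the redundancy, count the placements per shard; with a replication factor of two, every shard should now appear on exactly two different nodes. A minimal check against the pg_dist_shard_placement view:
-- an empty result means every shard has exactly two placements
select shardid, count(*) as placements
from pg_dist_shard_placement
group by shardid
having count(*) <> 2;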
Repeating the benchmark run gives us our updated performance metrics.
-- execute the following on the coordinator node
pgbench -c 20 -j 3 -T 60 -P 3 pgbench
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 20
number of threads: 3
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 68834
number of failed transactions: 0 (0.000%)
latency average = 17.430 ms
latency stddev = 12.818 ms
initial connection time = 16.321 ms
tps = 1147.093369 (without initial connection time)
Benchmark results
Here's a summary of the various benchmark runs, giving you an idea of the relative performance:
RUN | NUM NODES | WITH IDX | REPLICATION FACTOR | TPS
----+-----------+----------+--------------------+------
  1 |         1 | no       |                  1 |   25
  2 |         1 | yes      |                  1 | 2255
  3 |         2 | yes      |                  1 | 2170
  4 |         3 | yes      |                  1 | 2377
  5 |         3 | yes      |                  2 | 1147
Caveat
Although not mentioned in this blog, the following should be kept in mind as you work with CitusDB:
1. The password of the account used to propagate operations across the cluster is entered into the .pgpass file of the primary account on all nodes (see the sketch after this list).
2. "wal_level" is set to logical.
3. Redundancy can potentially decrease overall performance.
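For reference, a .pgpass entry follows the standard hostname:port:database:username:password format; the host, user, and password below are placeholders:
# ~/.pgpass on each node, mode 0600
citus1:5432:pgbench:postgres:secret-password
And wal_level, like shared_preload_libraries, is changed with ALTER SYSTEM and requires a server restart:
alter system set wal_level = logical;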
Keep in mind that performance metrics depend not only upon the cluster's hardware and system resources but also on the level of tuning, which, in this case, has not been addressed.
In a future blog, I'll continue exploring Citus by scaling pgbench into other architectures.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.