作者
digoal
日期
2021-09-01
标签
PostgreSQL , Oracle RAC , PolarDB , 共享存储 , 计算存储分离
重大开源消息:
PolarDB PostgreSQL项目已于2021.9.1开源共享存储核心架构相关代码。该架构与Oracle RAC架构类似,支持在一份存储上创建多个实例,实现了一写多读、快速读扩展。 目前已开源代码可以在单机上快速部署。分布式文件系统PolarDB PFS代码也会很快开放,利用PFS可以真正在分布式共享存储上运行PolarDB。
业界首个开源云原生数据库,包含大量阿里云数据库核心自研技术,后续还会推出跨机并行执行等功能,加强分析能力。 原有的分布式版继续在 distributed分支上演进。
持续关注:
https://github.com/alibaba/PolarDB-for-PostgreSQL
centos 7.9 16c64g 2TB ssd
机器部署参考:
《PolarDB 为什么要解决FPW的性能问题?》
yum -y install coreutils glib2 lrzsz dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc gcc-c++ make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex openjade bzip2 git iotop lvm2 perf centos-release-scl
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%FREE
mkfs.ext4 /dev/vdb1 -m 0 -O extent,uninit_bg -b 4096 -L lv01
vi /etc/fstab
LABEL=lv01 /data01 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0
mkdir /data01
mount -a
vi /etc/sysctl.conf
# add by digoal.zhou
fs.aio-max-nr = 1048576
fs.file-max = 76724600
# 可选:kernel.core_pattern = /data01/corefiles/core_%e_%u_%t_%s.%p
# /data01/corefiles 事先建好,权限777,如果是软链接,对应的目录修改为777
kernel.sem = 4096 2147483647 2147483646 512000
# 信号量, ipcs -l 或 -u 查看,每16个进程一组,每组信号量需要17个信号量。
kernel.shmall = 107374182
# 所有共享内存段相加大小限制 (建议内存的80%),单位为页。
kernel.shmmax = 274877906944
# 最大单个共享内存段大小 (建议为内存一半), >9.2的版本已大幅降低共享内存的使用,单位为字节。
kernel.shmmni = 819200
# 一共能生成多少共享内存段,每个PG数据库集群至少2个共享内存段
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 262144
# The default setting of the socket receive buffer in bytes.
net.core.rmem_max = 4194304
# The maximum receive socket buffer size in bytes
net.core.wmem_default = 262144
# The default setting (in bytes) of the socket send buffer.
net.core.wmem_max = 4194304
# The maximum send socket buffer size in bytes.
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1
# 开启SYN Cookies。当出现SYN等待队列溢出时,启用cookie来处理,可防范少量的SYN攻击
net.ipv4.tcp_timestamps = 1
# 减少time_wait
net.ipv4.tcp_tw_recycle = 0
# 如果=1则开启TCP连接中TIME-WAIT套接字的快速回收,但是NAT环境可能导致连接失败,建议服务端关闭它
net.ipv4.tcp_tw_reuse = 1
# 开启重用。允许将TIME-WAIT套接字重新用于新的TCP连接
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
net.nf_conntrack_max = 1200000
net.netfilter.nf_conntrack_max = 1200000
vm.dirty_background_bytes = 409600000
# 系统脏页到达这个值,系统后台刷脏页调度进程 pdflush(或其他) 自动将(dirty_expire_centisecs/100)秒前的脏页刷到磁盘
# 默认为10%,大内存机器建议调整为直接指定多少字节
vm.dirty_expire_centisecs = 3000
# 比这个值老的脏页,将被刷到磁盘。3000表示30秒。
vm.dirty_ratio = 95
# 如果系统进程刷脏页太慢,使得系统脏页超过内存 95 % 时,则用户进程如果有写磁盘的操作(如fsync, fdatasync等调用),则需要主动把系统脏页刷出。
# 有效防止用户进程刷脏页,在单机多实例,并且使用CGROUP限制单实例IOPS的情况下非常有效。
vm.dirty_writeback_centisecs = 100
# pdflush(或其他)后台刷脏页进程的唤醒间隔, 100表示1秒。
vm.swappiness = 0
# 不使用交换分区
vm.mmap_min_addr = 65536
vm.overcommit_memory = 0
# 在分配内存时,允许少量over malloc, 如果设置为 1, 则认为总是有足够的内存,内存较少的测试环境可以使用 1 .
vm.overcommit_ratio = 90
# 当overcommit_memory = 2 时,用于参与计算允许指派的内存大小。
vm.swappiness = 0
# 关闭交换分区
vm.zone_reclaim_mode = 0
# 禁用 numa, 或者在vmlinux中禁止.
net.ipv4.ip_local_port_range = 40000 65535
# 本地自动分配的TCP, UDP端口号范围
fs.nr_open=20480000
# 单个进程允许打开的文件句柄上限
# 以下参数请注意
# vm.extra_free_kbytes = 4096000
# vm.min_free_kbytes = 2097152 # vm.min_free_kbytes 建议每32G内存分配1G vm.min_free_kbytes
# 如果是小内存机器,以上两个值不建议设置
# vm.nr_hugepages = 66536
# 建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize
# vm.lowmem_reserve_ratio = 1 1 1
# 对于内存大于64G时,建议设置,否则建议默认值 256 256 32
sysctl -p
vi /etc/security/limits.d/20-nproc.conf
# nofile超过1048576的话,一定要先将sysctl的fs.nr_open设置为更大的值,并生效后才能继续设置nofile.
* soft nofile 1024000
* hard nofile 1024000
* soft nproc unlimited
* hard nproc unlimited
* soft core unlimited
* hard core unlimited
* soft memlock unlimited
* hard memlock unlimited
chmod +x /etc/rc.d/rc.local
vi /etc/rc.local
touch /var/lock/subsys/local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
yum install -y readline-devel zlib-devel perl-CPAN bison flex git
cpan -fi Test::More IPC::Run
useradd polardb -d /data01/polardb
su - polardb
cd ~
git clone https://github.com/alibaba/PolarDB-for-PostgreSQL.git
./polardb_build.sh --withrep --repnum=2 --withstandby
PS: 如果想修改initdb时的lc_collate, 修改polardb_build.sh.sh脚本. 如果想修改postgresql.conf的一些参数, 也请修改polardb_build.sh.sh
测试
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select version();'
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select * from pg_replication_slots;'
数据库已启动:
[polardb@iZbp15sgewxe2ioglp30z4Z PolarDB-for-PostgreSQL]$ ipcs
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 16 polardb 600 56 15
0x0052e6a9 17 polardb 600 56 7
0x0052ea91 18 polardb 600 56 7
0x0052ee79 19 polardb 600 56 6
------ Semaphore Arrays --------
key semid owner perms nsems
目录解释:
cd ~
[polardb@iZbp15sgewxe2ioglp30z4Z ~]$ du -sh *
968M PolarDB-for-PostgreSQL polardb源码
92M tmp_basedir_polardb_pg_1100_bld polardb 二进制软件
840M tmp_datadir_polardb_pg_1100_bld RW RO实例共享数据
24M tmp_primary_dir_polardb_pg_1100_bld RW实例本地内容
23M tmp_replica_dir_polardb_pg_1100_bld1 RO实例本地内容
23M tmp_replica_dir_polardb_pg_1100_bld2 RO实例本地内容
40M tmp_standby_datadir_polardb_pg_1100_bld Standby实例共享数据
23M tmp_standby_dir_polardb_pg_1100_bld Standby实例本地内容
配置环境变量, 方便使用:
vi ~/.bashrc
# 追加
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/data01/polardb/tmp_primary_dir_polardb_pg_1100_bld
export LANG=en_US.utf8
export PGHOME=/data01/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
查询一下:
postgres=# select version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 11beta2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
postgres=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
standby1 | | physical | | | f | t | 25125 | | | 0/185DCB0 |
replica2 | | physical | | | f | t | 25112 | | | 0/185DB98 |
replica1 | | physical | | | f | t | 25098 | | | 0/185DB98 |
(3 rows)
为啥是11beta2呢? PolarDB for PG是从beta2开始就进行研发, 后续会逐渐合并上来. 期待尽快合并到PG 14.
了解greenplum的小伙伴一定记得greenplum是怎么一步一步合到9.4的, 中间经历了每一个PG的版本.
压测一下:
pgbench -i -s 100
pgbench -M prepared -n -r -P 1 -c 16 -j 16 -T 120
一些参数, 不解释:
postgres=# select name,setting from pg_settings where name ~ 'polar';
name | setting
--------------------------------------------+------------------------------------------------------------
polar_bg_replay_batch_size | 20000
polar_bgwriter_batch_size_flushlist | 100
polar_bgwriter_max_batch_size | 5000
polar_bgwriter_sleep_lsn_lag | 100
polar_buffer_copy_lsn_lag_with_cons_lsn | 100
polar_buffer_copy_min_modified_count | 5
polar_check_checkpoint_legal_interval | 1
polar_clog_max_local_cache_segments | 128
polar_clog_slot_size | 128
polar_copy_buffers | 16384
polar_datadir | file-dio:///data01/polardb/tmp_datadir_polardb_pg_1100_bld
polar_disk_name | data01
polar_enable_async_ddl_lock_replay | on
polar_enable_parallel_bgwriter | on
polar_enable_redo_logindex | on
polar_fullpage_keep_segments | 16
polar_hostid | 2
polar_logindex_bloom_blocks | 1024
polar_logindex_mem_size | 512
polar_logindex_table_batch_size | 100
polar_logindex_unit_test | 0
polar_max_logindex_files | 80
polar_openfile_with_readonly_in_replica | off
polar_parallel_bgwriter_check_interval | 10
polar_parallel_bgwriter_delay | 10
polar_parallel_bgwriter_enable_dynamic | on
polar_parallel_bgwriter_workers | 5
polar_parallel_new_bgwriter_threshold_lag | 1024
polar_parallel_new_bgwriter_threshold_time | 10
polar_read_ahead_xlog_num | 200
polar_storage_cluster_name |
polar_streaming_xlog_meta | on
polar_version | 1.1.15
polar_vfs.localfs_mode | on
polar_vfs.max_direct_io_size | 1048576
polar_vfs.pfs_force_mount | on
polar_worker.enable_polar_worker | on
polar_worker.polar_worker_check_interval | 5
polar_worker.prealloc_wal_file_num | 2
polar_worker.xlog_temp_outdate_time | -1
polar_xlog_page_buffers | 1024
polar_xlog_queue_buffers | 512
(42 rows)
https://github.com/alibaba/PolarDB-for-PostgreSQL
Deploy Instance based on Local Storage
We provide a script which uses default configuration to compile PolarDB. This section describes how to fast deploy a PolarDB instance with storage on your local disks, by this script. System Requirements: CentOS 7.5 and above. The following setup steps are tested based on CentOS 7.5.
1、Download the source code of PolarDB from https://github.com/alibaba/PolarDB-for-PostgreSQL/tree/main.
2、Install dependent packages:
sudo yum install readline-devel zlib-devel perl-CPAN bison flex
sudo cpan -fi Test::More IPC::Run
3、Select the coommand according to your actual needs to start deployment.
Run the following commands to compile the source code only, without creating any instance:
./polardb_build.sh --noinit
Run the following commands to compile the source code, and create an instance with one node (primary node). The port number is 5432.
./polardb_build.sh
Run the following commands to compile the source code, and create an instance with two nodes. The nodes include:
One primary node and the port number is 5432.
One read-only node and the port number is 5433.
./polardb_build.sh --withrep --repnum=1
Run the following commands to compile the source code, and create an instance with three nodes. The nodes include:
One primary node and the port number is 5432.
One read-only node and the port number is 5433.
One standby node (for standby database) and the port number is 5434.
./polardb_build.sh --withrep --repnum=1 --withstandby
Run the following commands to compile the source code, and create an instance with four nodes. The nodes include:
One primary node and the port number is 5432.
Two read-only nodes and the port numbers are 5433 and 5434.
One standby node (for standby database) and the port number is 5435.
./polardb_build.sh --withrep --repnum=2 --withstandby
4、After the deployment, refer to the followng steps to check and test the created instance to make sure it is deployed correctly.
Run the following commands to check the instance:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select version();'
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select * from pg_replication_slots;'
Run the following commands to perform full regression test:
./polardb_build.sh --withrep --repnum=1 --withstandby -r-check-all -e -r-contrib -r-pl -r-external -r-installcheck-all
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.
9.9元购买3个月阿里云RDS PostgreSQL实例
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





