暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PolarDB for PostgreSQL 开源版 - 计算存储分离版(类似Oracle RAC架构) 部署指南

原创 digoal 2022-01-20
1605

作者

digoal

日期

2021-09-01

标签

PostgreSQL , Oracle RAC , PolarDB , 共享存储 , 计算存储分离


重大开源消息:

PolarDB PostgreSQL项目已于2021.9.1开源共享存储核心架构相关代码。该架构与Oracle RAC架构类似,支持在一份存储上创建多个实例,实现了一写多读、快速读扩展。 目前已开源代码可以在单机上快速部署。分布式文件系统PolarDB PFS代码也会很快开放,利用PFS可以真正在分布式共享存储上运行PolarDB。

业界首个开源云原生数据库,包含大量阿里云数据库核心自研技术,后续还会推出跨机并行执行等功能,加强分析能力。 原有的分布式版继续在 distributed分支上演进。

持续关注:

https://github.com/alibaba/PolarDB-for-PostgreSQL

centos 7.9 16c64g 2TB ssd

机器部署参考:
《PolarDB 为什么要解决FPW的性能问题?》

yum -y install coreutils glib2 lrzsz dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc gcc-c++ make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex  openjade bzip2 git iotop lvm2 perf centos-release-scl     
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm      
parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%FREE      
mkfs.ext4 /dev/vdb1 -m 0 -O extent,uninit_bg -b 4096 -L lv01         
vi /etc/fstab                 
LABEL=lv01 /data01 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0              
mkdir /data01              
mount -a          
vi /etc/sysctl.conf      
# add by digoal.zhou              
fs.aio-max-nr = 1048576              
fs.file-max = 76724600              
# 可选:kernel.core_pattern = /data01/corefiles/core_%e_%u_%t_%s.%p                       
# /data01/corefiles 事先建好,权限777,如果是软链接,对应的目录修改为777              
kernel.sem = 4096 2147483647 2147483646 512000                  
# 信号量, ipcs -l 或 -u 查看,每16个进程一组,每组信号量需要17个信号量。              
kernel.shmall = 107374182                    
# 所有共享内存段相加大小限制 (建议内存的80%),单位为页。              
kernel.shmmax = 274877906944                 
# 最大单个共享内存段大小 (建议为内存一半), >9.2的版本已大幅降低共享内存的使用,单位为字节。              
kernel.shmmni = 819200                       
# 一共能生成多少共享内存段,每个PG数据库集群至少2个共享内存段              
net.core.netdev_max_backlog = 10000              
net.core.rmem_default = 262144                     
# The default setting of the socket receive buffer in bytes.              
net.core.rmem_max = 4194304                        
# The maximum receive socket buffer size in bytes              
net.core.wmem_default = 262144                     
# The default setting (in bytes) of the socket send buffer.              
net.core.wmem_max = 4194304                        
# The maximum send socket buffer size in bytes.              
net.core.somaxconn = 4096              
net.ipv4.tcp_max_syn_backlog = 4096              
net.ipv4.tcp_keepalive_intvl = 20              
net.ipv4.tcp_keepalive_probes = 3              
net.ipv4.tcp_keepalive_time = 60              
net.ipv4.tcp_mem = 8388608 12582912 16777216              
net.ipv4.tcp_fin_timeout = 5              
net.ipv4.tcp_synack_retries = 2              
net.ipv4.tcp_syncookies = 1                  
# 开启SYN Cookies。当出现SYN等待队列溢出时,启用cookie来处理,可防范少量的SYN攻击              
net.ipv4.tcp_timestamps = 1                  
# 减少time_wait              
net.ipv4.tcp_tw_recycle = 0                  
# 如果=1则开启TCP连接中TIME-WAIT套接字的快速回收,但是NAT环境可能导致连接失败,建议服务端关闭它              
net.ipv4.tcp_tw_reuse = 1                    
# 开启重用。允许将TIME-WAIT套接字重新用于新的TCP连接              
net.ipv4.tcp_max_tw_buckets = 262144              
net.ipv4.tcp_rmem = 8192 87380 16777216              
net.ipv4.tcp_wmem = 8192 65536 16777216              
net.nf_conntrack_max = 1200000              
net.netfilter.nf_conntrack_max = 1200000              
vm.dirty_background_bytes = 409600000                     
#  系统脏页到达这个值,系统后台刷脏页调度进程 pdflush(或其他) 自动将(dirty_expire_centisecs/100)秒前的脏页刷到磁盘              
#  默认为10%,大内存机器建议调整为直接指定多少字节              
vm.dirty_expire_centisecs = 3000       
#  比这个值老的脏页,将被刷到磁盘。3000表示30秒。              
vm.dirty_ratio = 95                                        
#  如果系统进程刷脏页太慢,使得系统脏页超过内存 95 % 时,则用户进程如果有写磁盘的操作(如fsync, fdatasync等调用),则需要主动把系统脏页刷出。              
#  有效防止用户进程刷脏页,在单机多实例,并且使用CGROUP限制单实例IOPS的情况下非常有效。                
vm.dirty_writeback_centisecs = 100                          
#  pdflush(或其他)后台刷脏页进程的唤醒间隔, 100表示1秒。              
vm.swappiness = 0              
#  不使用交换分区              
vm.mmap_min_addr = 65536              
vm.overcommit_memory = 0                   
#  在分配内存时,允许少量over malloc, 如果设置为 1, 则认为总是有足够的内存,内存较少的测试环境可以使用 1 .                
vm.overcommit_ratio = 90                   
#  当overcommit_memory = 2 时,用于参与计算允许指派的内存大小。              
vm.swappiness = 0                          
#  关闭交换分区              
vm.zone_reclaim_mode = 0                   
# 禁用 numa, 或者在vmlinux中禁止.               
net.ipv4.ip_local_port_range = 40000 65535                  
# 本地自动分配的TCP, UDP端口号范围              
fs.nr_open=20480000              
# 单个进程允许打开的文件句柄上限              
# 以下参数请注意              
# vm.extra_free_kbytes = 4096000              
# vm.min_free_kbytes = 2097152  # vm.min_free_kbytes 建议每32G内存分配1G vm.min_free_kbytes             
# 如果是小内存机器,以上两个值不建议设置              
# vm.nr_hugepages = 66536                  
#  建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize              
# vm.lowmem_reserve_ratio = 1 1 1              
# 对于内存大于64G时,建议设置,否则建议默认值 256 256 32       
sysctl -p      
vi /etc/security/limits.d/20-nproc.conf        
# nofile超过1048576的话,一定要先将sysctl的fs.nr_open设置为更大的值,并生效后才能继续设置nofile.              
* soft    nofile  1024000              
* hard    nofile  1024000              
* soft    nproc   unlimited              
* hard    nproc   unlimited              
* soft    core    unlimited              
* hard    core    unlimited              
* soft    memlock unlimited              
* hard    memlock unlimited        
chmod +x /etc/rc.d/rc.local      
vi /etc/rc.local          
touch /var/lock/subsys/local          
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then              
   echo never > /sys/kernel/mm/transparent_hugepage/enabled              
fi          
yum install -y readline-devel zlib-devel perl-CPAN bison flex git     
cpan -fi Test::More IPC::Run    
useradd polardb -d /data01/polardb    
su - polardb    
cd ~    
git clone https://github.com/alibaba/PolarDB-for-PostgreSQL.git    
./polardb_build.sh --withrep --repnum=2 --withstandby    

PS: 如果想修改initdb时的lc_collate, 修改polardb_build.sh.sh脚本. 如果想修改postgresql.conf的一些参数, 也请修改polardb_build.sh.sh

测试

$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select version();'    
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select * from pg_replication_slots;'    

数据库已启动:

[polardb@iZbp15sgewxe2ioglp30z4Z PolarDB-for-PostgreSQL]$ ipcs  
------ Message Queues --------  
key        msqid      owner      perms      used-bytes   messages      
------ Shared Memory Segments --------  
key        shmid      owner      perms      bytes      nattch     status        
0x0052e2c1 16         polardb    600        56         15                        
0x0052e6a9 17         polardb    600        56         7                         
0x0052ea91 18         polardb    600        56         7                         
0x0052ee79 19         polardb    600        56         6                         
------ Semaphore Arrays --------  
key        semid      owner      perms      nsems       

目录解释:

cd ~  
[polardb@iZbp15sgewxe2ioglp30z4Z ~]$ du -sh *  
968M    PolarDB-for-PostgreSQL            polardb源码  
92M tmp_basedir_polardb_pg_1100_bld       polardb 二进制软件  
840M    tmp_datadir_polardb_pg_1100_bld   RW RO实例共享数据  
24M tmp_primary_dir_polardb_pg_1100_bld   RW实例本地内容  
23M tmp_replica_dir_polardb_pg_1100_bld1  RO实例本地内容  
23M tmp_replica_dir_polardb_pg_1100_bld2  RO实例本地内容  
40M tmp_standby_datadir_polardb_pg_1100_bld   Standby实例共享数据  
23M tmp_standby_dir_polardb_pg_1100_bld       Standby实例本地内容  

配置环境变量, 方便使用:

vi ~/.bashrc  
# 追加          
export PS1="$USER@`/bin/hostname -s`-> "            
export PGPORT=5432            
export PGDATA=/data01/polardb/tmp_primary_dir_polardb_pg_1100_bld        
export LANG=en_US.utf8            
export PGHOME=/data01/polardb/tmp_basedir_polardb_pg_1100_bld  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH            
export DATE=`date +"%Y%m%d%H%M"`            
export PATH=$PGHOME/bin:$PATH:.            
export MANPATH=$PGHOME/share/man:$MANPATH            
export PGHOST=127.0.0.1         
export PGUSER=postgres            
export PGDATABASE=postgres            
alias rm='rm -i'            
alias ll='ls -lh'            
unalias vi   

查询一下:

postgres=# select version();  
                                                  version                                                     
------------------------------------------------------------------------------------------------------------  
 PostgreSQL 11beta2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit  
(1 row)  
postgres=# select * from pg_replication_slots;  
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn   
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------  
 standby1  |        | physical  |        |          | f         | t      |      25125 |      |              | 0/185DCB0   |   
 replica2  |        | physical  |        |          | f         | t      |      25112 |      |              | 0/185DB98   |   
 replica1  |        | physical  |        |          | f         | t      |      25098 |      |              | 0/185DB98   |   
(3 rows)  

为啥是11beta2呢? PolarDB for PG是从beta2开始就进行研发, 后续会逐渐合并上来. 期待尽快合并到PG 14.

了解greenplum的小伙伴一定记得greenplum是怎么一步一步合到9.4的, 中间经历了每一个PG的版本.

压测一下:

pgbench -i -s 100  
pgbench -M prepared -n -r -P 1 -c 16 -j 16 -T 120   

一些参数, 不解释:

postgres=# select name,setting from pg_settings where name ~ 'polar';  
                    name                    |                          setting                             
--------------------------------------------+------------------------------------------------------------  
 polar_bg_replay_batch_size                 | 20000  
 polar_bgwriter_batch_size_flushlist        | 100  
 polar_bgwriter_max_batch_size              | 5000  
 polar_bgwriter_sleep_lsn_lag               | 100  
 polar_buffer_copy_lsn_lag_with_cons_lsn    | 100  
 polar_buffer_copy_min_modified_count       | 5  
 polar_check_checkpoint_legal_interval      | 1  
 polar_clog_max_local_cache_segments        | 128  
 polar_clog_slot_size                       | 128  
 polar_copy_buffers                         | 16384  
 polar_datadir                              | file-dio:///data01/polardb/tmp_datadir_polardb_pg_1100_bld  
 polar_disk_name                            | data01  
 polar_enable_async_ddl_lock_replay         | on  
 polar_enable_parallel_bgwriter             | on  
 polar_enable_redo_logindex                 | on  
 polar_fullpage_keep_segments               | 16  
 polar_hostid                               | 2  
 polar_logindex_bloom_blocks                | 1024  
 polar_logindex_mem_size                    | 512  
 polar_logindex_table_batch_size            | 100  
 polar_logindex_unit_test                   | 0  
 polar_max_logindex_files                   | 80  
 polar_openfile_with_readonly_in_replica    | off  
 polar_parallel_bgwriter_check_interval     | 10  
 polar_parallel_bgwriter_delay              | 10  
 polar_parallel_bgwriter_enable_dynamic     | on  
 polar_parallel_bgwriter_workers            | 5  
 polar_parallel_new_bgwriter_threshold_lag  | 1024  
 polar_parallel_new_bgwriter_threshold_time | 10  
 polar_read_ahead_xlog_num                  | 200  
 polar_storage_cluster_name                 |   
 polar_streaming_xlog_meta                  | on  
 polar_version                              | 1.1.15  
 polar_vfs.localfs_mode                     | on  
 polar_vfs.max_direct_io_size               | 1048576  
 polar_vfs.pfs_force_mount                  | on  
 polar_worker.enable_polar_worker           | on  
 polar_worker.polar_worker_check_interval   | 5  
 polar_worker.prealloc_wal_file_num         | 2  
 polar_worker.xlog_temp_outdate_time        | -1  
 polar_xlog_page_buffers                    | 1024  
 polar_xlog_queue_buffers                   | 512  
(42 rows)  

https://github.com/alibaba/PolarDB-for-PostgreSQL

Deploy Instance based on Local Storage
We provide a script which uses default configuration to compile PolarDB. This section describes how to fast deploy a PolarDB instance with storage on your local disks, by this script. System Requirements: CentOS 7.5 and above. The following setup steps are tested based on CentOS 7.5.

1、Download the source code of PolarDB from https://github.com/alibaba/PolarDB-for-PostgreSQL/tree/main.

2、Install dependent packages:

sudo yum install readline-devel zlib-devel perl-CPAN bison flex    
sudo cpan -fi Test::More IPC::Run    

3、Select the coommand according to your actual needs to start deployment.

Run the following commands to compile the source code only, without creating any instance:

./polardb_build.sh --noinit    

Run the following commands to compile the source code, and create an instance with one node (primary node). The port number is 5432.

./polardb_build.sh    

Run the following commands to compile the source code, and create an instance with two nodes. The nodes include:
One primary node and the port number is 5432.
One read-only node and the port number is 5433.

./polardb_build.sh --withrep --repnum=1    

Run the following commands to compile the source code, and create an instance with three nodes. The nodes include:
One primary node and the port number is 5432.
One read-only node and the port number is 5433.
One standby node (for standby database) and the port number is 5434.

./polardb_build.sh --withrep --repnum=1 --withstandby    

Run the following commands to compile the source code, and create an instance with four nodes. The nodes include:
One primary node and the port number is 5432.
Two read-only nodes and the port numbers are 5433 and 5434.
One standby node (for standby database) and the port number is 5435.

./polardb_build.sh --withrep --repnum=2 --withstandby    

4、After the deployment, refer to the followng steps to check and test the created instance to make sure it is deployed correctly.
Run the following commands to check the instance:

$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select version();'    
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select * from pg_replication_slots;'    

Run the following commands to perform full regression test:

./polardb_build.sh --withrep --repnum=1 --withstandby -r-check-all -e -r-contrib -r-pl -r-external -r-installcheck-all    

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论