Table of Contents
概述
总所周知,在数据库的实际应用中,容灾的建设对于一个企业很重要,这对于数据库保护性和可用性有关键作用。
得知国产数据库YashanDB个人版发布,兴致冲冲地体验一下YashanDB高可用架构的搭建、切换和相关使用。
容灾原理
YashanDB主备的原理是:主库发送日志到从库,备机通过重演主机发送过来的日志来恢复数据页面,以达到和主机的同步。
日志传送有SYNC和ASYNC两种方式,SYNC是直接读取log cache里边的数据发送到备库,性能比较高。而ASYNC是从redo里边读取数据发送到备库,适用于异地备库场景。日志发送采用的模式是由系统自动选择的。
主备搭建
环境准备
主机IP | 主机配置 | 说明 |
---|---|---|
192.168.56.101 | 4C 8G 100G | 用于部署主库 |
192.168.56.102 | 4C 8G 100G | 用于部署备库 |
系统配置
注:主备库均需配置
1. 修改内核参数
echo "vm.swappiness = 0">> /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range = 32768 60999" >> /etc/sysctl.conf
echo "vm.max_map_count=2000000" >> /etc/sysctl.conf
swapoff -a
sysctl -p
2. 调整使用限制
echo "
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft rss unlimited
* hard rss unlimited
* soft nproc 8192
* hard nproc 8192
* soft core unlimited
* hard core unlimited
" >> /etc/security/limits.conf
3. 关闭透明大页
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
4. 关闭防火墙
systemctl stop firewalld
systemctl disable firewalld
数据库搭建
1. 创建用户(所有节点)
[root@yashan1-test ~]# useradd yashan
[root@yashan1-test ~]# group add YASDBA
[root@yashan1-test ~]# usermod -a -G YASDBA yashan
[root@yashan1-test ~]# echo "yashan" | passwd --stdin yashan
2. 主节点解压软件包
[root@yashan1-test ~]# su - yashan
[yashan@yashan1-test ~]$ mkdir -p /home/yashan/install
##上传安装包到/home/yashan/install下
[yashan@yashan1-test ~]$ cd /home/yashan/install
[yashan@yashan1-test install]$ tar xzf yashandb-personal-23.1.1.100-linux-x86_64.tar.gz
3. 生成主备部署文件和安装软件
[yashan@yashan1 install]$ ./bin/yasboot package se gen --cluster yashandb -u yashan -p yashan --ip 192.168.56.101,192.168.56.102 --port 22 --install-path /data/yashan/yasdb_home --data-path /data/yashan/yasdb_data --begin-port 1688 --node 2
hostid | group | node_type | node_name | listen_addr | replication_addr | data_path
----------------------------------------------------------------------------------------------------------------
host0001 | dbg1 | db | 1-1 | 192.168.56.101:1688 | 192.168.56.101:1689 | /data/yashan/yasdb_data
----------+-------+-----------+-----------+---------------------+---------------------+-------------------------
host0002 | dbg1 | db | 1-2 | 192.168.56.102:1688 | 192.168.56.102:1689 | /data/yashan/yasdb_data
----------+-------+-----------+-----------+---------------------+---------------------+-------------------------
Generate config success
[yashan@yashan1 install]$ ./bin/yasboot package install -t hosts.toml -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz
checking install package...
install version: yashandb 23.1.1.100
host0001 100% [====================================================================] 3s
host0002 100% [====================================================================] 3s
update host to yasom...
3. 数据库部署和配置环境变量
[yashan@yashan1 install]$ ./bin/yasboot cluster deploy -t yashandb.toml
type | uuid | name | hostid | index | status | return_code | progress | cost
------------------------------------------------------------------------------------------------------------
task | 94c181a52b4d2d23 | DeployYasdbCluster | - | yashandb | SUCCESS | 0 | 100 | 27
------+------------------+--------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
##所有节点使用yashan用户运行
[yashan@yashan2 install]$ cat /data/yashan/yasdb_home/yashandb/23.1.1.100/conf/yashandb.bashrc >> ~/.bashrc
[yashan@yashan2 install]$ source ~/.bashrc
4. 设置sys用户口令
[yashan@yashan1 install]$ ./bin/yasboot cluster password set -n yashan666! -c yashandb
type | uuid | name | hostid | index | status | return_code | progress | cost
----------------------------------------------------------------------------------------------------------
task | 9f9af07bb64ecb51 | YasdbPasswordSet | - | yashandb | SUCCESS | 0 | 100 | 3
------+------------------+------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS
5. 验证数据库是否安装成功
[yashan@yashan1 install]$ ./bin/yasboot cluster status -c yashandb -d
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 4894 | open | normal | primary | 192.168.56.101:1688 | /data/yashan/yasdb_data/db-1-1
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
host0002 | db | 1-2:2 | 5606 | open | normal | standby | 192.168.56.102:1688 | /data/yashan/yasdb_data/db-1-2
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
主备库参数说明
参数 | 值 | 说明 |
---|---|---|
LISTEN_ADDR | 本机数据库的IP与监听端口 | 该参数在安装时已配置 |
NODE_ID | 各节点编号 | 该参数在创建节点时已配置,且不能重复,主要用于主备机自动选举 |
REPLICATION_ADDR | 本机数据库的IP与同步复制端口 | 可不配置,则与LISTEN_ADDR共享一个端口 |
ARCHIVE_DEST_1 | 本机数据库的IP与监听端口 | 必须配置 |
DB_FILE_NAME_CONVERT | 对端到本机的数据文件转换路径,支持递归转换 | 主备数据文件路径不同时必须配置 |
REDO_FILE_NAME_CONVERT | 对端到本机的日志文件转换路径,支持递归转换 | 主备日志文件路径不同时必须配置 |
DB_BUCKET_NAME_CONVERT | 对端到本机的LSC表的bucket转换路径,支持递归转换 | 主备bucket路径不同时必须配置 |
HA_SSL_ENABLE | 主备间SSL加密传输开关 | content3 |
主备切换
Switchover切换
主备切换即是主库降级为备库,备库升级为主库。执行主备切换时候要求主备的网络连接正常,并且主备实例都处于OPEN状态。具体的切换过程如下:
检查主备机同步情况
1 主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------
1 CONNECTED NORMAL OPEN 2189 2189 YES
1 row fetched.
2 备库查看同步情况
SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag FROM V$REPLICATION_STATUS;
CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------
CONNECTED NORMAL 192.168.56.101:1689 0 0
1 row fetched.
备库执行切换
SQL> ALTER DATABASE SWITCHOVER;
检查新主从状态
1 查看新主库情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------
1 CONNECTED NORMAL OPEN 2205 2205 YES
1 row fetched.
2 查看新备库同步情况
SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag FROM V$REPLICATION_STATUS;
CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------
CONNECTED NORMAL 192.168.56.102:1689 0 0
1 row fetched.
Failover切换
Failover切换一般是在主库发生故障,将备库强制升级为主库的一种切换方式,该方式会破坏原有的主从关系。具体的切换过程如下:
选择备机升主
在紧急状态下进行的故障切换可能会丢失数据,为尽可能降低损失,在多台备机中,应该选择日志量最多的一个备机执行Failover。
检查V$REPLICATION_STATUS视图中的received_lfn和gap_seq#字段:
- 如果gap_seq#都为0,选择received_lfn值最大的备机。
- 如果某台备机gap_seq#不为0,说明该备机缺少一些归档,升主后将丢失一部分数据,此时需要查看所有备机上的该视图,选择gap_seq#值最大,并且received_lfn值最大的备机。
1 确认主备机的角色,为主机或备机
SQL> SELECT database_id,database_name,log_mode,open_mode,database_role FROM V$DATABASE;
DATABASE_ID DATABASE_NAME LOG_MODE OPEN_MODE DATABASE_ROLE
--------------------- ---------------------------------------------------------------- ----------------- ----------------- -----------------
3102204658 yashandb ARCHIVELOG READ_ONLY STANDBY
1 row fetched.
2 确认备机连接已经断连,并且RECEIVED_LFN是剩余备机里最大的,GAP_SEQ#为0
SQL> SELECT connection,status,received_lfn,gap_seq# FROM V$REPLICATION_STATUS;
CONNECTION STATUS RECEIVED_LFN GAP_SEQ#
----------------- ----------------- --------------------- ------------
DISCONNECTED NORMAL 2251 0
执行Failover,激活备库
SQL> ALTER DATABASE FAILOVER;
查看新主库状态
SQL> SELECT database_id,database_name,log_mode,open_mode,database_role FROM V$DATABASE;
DATABASE_ID DATABASE_NAME LOG_MODE OPEN_MODE DATABASE_ROLE
--------------------- ---------------------------------------------------------------- ----------------- ----------------- -----------------
3102204658 yashandb ARCHIVELOG READ_WRITE PRIMARY
1 row fetched.
日常维护
主备库同步状态日常维护中常用到的视图有:
主库:v$archive_dest_status,这个视图展示了所有备机的统计信息。
从库:v$replication_status和V$RECOVERY_PROGRESS视图。v$replication_status 这个视图显示集群中所有节点的备机redo传输汇总信息。V$RECOVERY_PROGRESS视图 这个视图显示日志回放进度汇总信息。
v$archive_dest_status 视图:
v$replication_status 视图:
V$RECOVERY_PROGRESS视图:
主备同步检查
1 查看数据库状态
SQL> SELECT database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM V$DATABASE;
DATABASE_NAME LOG_MODE OPEN_MODE PROTECTION_MODE DATABASE_ROLE BLOCK_SIZE STATUS
---------------------------------------------------------------- ----------------- ----------------- --------------------------------- ----------------- ------------ ---------------------------------
yashandb ARCHIVELOG READ_WRITE MAXIMUM AVAILABILITY PRIMARY 8192 NORMAL
1 row fetched.
2 查看归档日志情况
SQL> SELECT name, SEQUENCE#, blocks, block_size, completion_time FROM V$ARCHIVED_LOG;
NAME SEQUENCE# BLOCKS BLOCK_SIZE COMPLETION_TIME
---------------------------------------------------------------- ------------ ------------ ---------- --------------------------------
/data/yashan/yasdb_data/db-1-1/archive/arch_0_1.ARC 1 9 4096 2023-12-02
1 row fetched.
3 主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------
1 CONNECTED NORMAL OPEN 2189 2189 YES
1 row fetched.
4 备库查看同步情况
SQL> SELECT connection,status,peer_addr,transport_lag,apply_lag FROM V$REPLICATION_STATUS;
CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------
CONNECTED NORMAL 192.168.56.101:1689 0 0
1 row fetched.
5 yasboot工具监控
SQL> SELECT * FROM V$RECOVERY_PROGRESS;
ITEM UNITS VALUE
--------------------------------- ----------------- ---------------------
Active Apply Rate KB/sec 4000000
Average Apply Rate KB/sec 117870
Maximum Apply Rate KB/sec 232000000
Redo Applied MB 0
Redo Remain MB 0
First Applied Redo LFN 2168
Last Applied Redo LFN 2189
Active Time Seconds 0
Remain Time Seconds 0
Non Physical Log Count Number 3
Parallelism Threads 16
Latest Load Size KB 4
Latest Load Time Millisecond 0
Latest Analysis Time Millisecond 0
Wait Previous Apply Time Millisecond 0
Latest Apply Time Millisecond 0
16 rows fetched.
6 yasboot工具监控
[yashan@yashan1 install]$ yasboot cluster status -c yashandb -d
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | 4894 | open | normal | primary | 192.168.56.101:1688 | /data/yashan/yasdb_data/db-1-1
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
host0002 | db | 1-2:2 | 5606 | open | normal | standby | 192.168.56.102:1688 | /data/yashan/yasdb_data/db-1-2
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
保护模式切换
保护模式有三种:最大性能、最大可用、最大保护模式
模式 | 说明 |
---|---|
最大性能模式 | 主机事务提交无需等待备机复制结果,提交后即向客户端返回事务提交成功。该模式的优点为,主机性能较高,且备机断网或故障不会影响主机业务;缺点为,如果主机故障,备机升主后可能发生数据丢失。 |
最大可用模式 | 当任意同步备机连接正常时,采取最大保护模式,当所有同步备机断网或故障时,采取最大性能模式。最大可用模式提供尽可能的数据保护,和主数据库的可用性,但也存在数据丢失风险。 |
最大保护模式 | 主机事务提交后,必须等待同步备机将对应日志刷盘后,才能向客户端返回事务提交成功。该模式的优点为,如果主机故障,同步备机升主后可以保证数据不丢失;缺点为,如果同步备机断网或故障,将阻塞主机业务。 |
## 查看数据库保护模式
SQL> SELECT protection_mode, protection_level FROM V$DATABASE;
PROTECTION_MODE PROTECTION_LEVEL
--------------------------------- ---------------------------------
MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
1 row fetched.
## 切换成最大性能模式(数据库在OPEN或者MOUNT状态下操作)
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
SQL> SELECT protection_mode, protection_level FROM V$DATABASE;
PROTECTION_MODE PROTECTION_LEVEL
--------------------------------- ---------------------------------
MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
1 row fetched.
## 切换成最大保护模式(数据库在OPEN或者MOUNT状态下操作,建议先切换到最大可用,等待redo同步,再切换到最大保护模式)
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;
SQL> SELECT protection_mode, protection_level FROM V$DATABASE;
PROTECTION_MODE PROTECTION_LEVEL
--------------------------------- ---------------------------------
MAXIMUM PROTECTION MAXIMUM PROTECTION
1 row fetched.
GAP修复
模拟环境
1)关闭从库
yasql / as sysdba -c "shutdown immediate;"
-- 主库可看到日志发送失败记录
tail -100f $YASDB_DATA/log/alert/alert.log
2023-12-02 21:13:26.864|22|StandbyDisconnect|0|0||redo sender failed to connect with standby
2023-12-02 21:17:27.446|22|StandbyDisconnect|0|1|
2023-12-02 21:17:57.959|26|Incident|0|0||the incident was controlled by flood, problem id=1, last incident id=5
2) 手动生成归档文件
SQL> alter system switch logfile;
SQL> select name from v$archived_log;
NAME
----------------------------------------------------------------
/data/yashan/yasdb_data/db-1-2/archive/arch_0_2.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_3.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_4.ARC
3)主库新增数据和归档
SQL> create table t1 (id number);
SQL> insert into t1 values (1);
SQL> insert into t1 values (2);
SQL> commit;
4)手动生成归档文件
SQL> alter system switch logfile;
SQL> alter system switch logfile;
SQL> select name from v$archived_log;
NAME
----------------------------------------------------------------
/data/yashan/yasdb_data/db-1-2/archive/arch_0_2.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_3.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_4.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_5.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_6.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_7.ARC
5)模拟主库归档丢失
mv /data/yashan/yasdb_data/db-1-2/archive/arch_0_6.ARC /data/yashan/yasdb_data/db-1-2/archive/BAK_arch_0_6.ARC
6) 启动从库
yasboot cluster start -c yashandb -d
查看主从同步情况
1)主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED GAP_STATUS
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ -----------------
1 CONNECTED NORMAL OPEN 2224 2217 NO HAS GAP
1 row fetched.
2) 备库查看同步情况
SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag, GAP_SEQ# FROM V$REPLICATION_STATUS;
CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG GAP_SEQ#
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------ ------------
CONNECTED NORMAL 192.168.56.102:1689 0 1800163 6
1 row fetched.
上面主库查询中可以看到,GAP_STATUS的状态为HAS GAP,表示从库库已经出现了GAP,需要修复。从库查询中可以看到GAP_SEQ#为6,表示从库开始出现GAP的归档SEQUENCE为第6个。
主库执行命令修复GAP
1 查看备机standby_name
SQL> select dest_id, dest_name, service, db_unique_name from V$ARCHIVE_DEST;
DEST_ID DEST_NAME SERVICE DB_UNIQUE_NAME
------- ----------------- ---------------------------------------------------------------- ---------------------------------
1 ARCHIVE_DEST_1 192.168.56.101:1689 standby1
1 row fetched.
2 修复备机GAP
SQL> BUILD DATABASE INCREMENTAL TO STANDBY standby1;
主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED GAP_STATUS
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ -----------------
1 CONNECTED NORMAL OPEN 2234 2234 YES NO GAP
1 row fetched.
个人感受
- YashanDB的主备搭建简单,无需手工介入配置主从同步参数,yasboot工具可以直接实现主备架构自动化搭建,避免人工操作出错。
- YashanDB的主备切换步骤相较于Oracle简化了不少,简化了运维成本。
- YashanDB针对从库GAP的情况在主库上集成了命令可以处理,比较方便。
- 整个主备搭建和切换演练过程中,数据库日志记录较少,只记录了主从同步中归档发送失败的日志。