暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

「YashanDB个人版体验」以运维的角度体验YashanDB容灾技术

原创 kawaa 2023-12-02
1203

Table of Contents

概述

总所周知,在数据库的实际应用中,容灾的建设对于一个企业很重要,这对于数据库保护性和可用性有关键作用。
得知国产数据库YashanDB个人版发布,兴致冲冲地体验一下YashanDB高可用架构的搭建、切换和相关使用。

容灾原理

YashanDB主备的原理是:主库发送日志到从库,备机通过重演主机发送过来的日志来恢复数据页面,以达到和主机的同步。

日志传送有SYNC和ASYNC两种方式,SYNC是直接读取log cache里边的数据发送到备库,性能比较高。而ASYNC是从redo里边读取数据发送到备库,适用于异地备库场景。日志发送采用的模式是由系统自动选择的。

image.png

主备搭建

环境准备

主机IP 主机配置 说明
192.168.56.101 4C 8G 100G 用于部署主库
192.168.56.102 4C 8G 100G 用于部署备库

系统配置

注:主备库均需配置

1. 修改内核参数
echo "vm.swappiness = 0">> /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range = 32768 60999" >> /etc/sysctl.conf
echo "vm.max_map_count=2000000" >> /etc/sysctl.conf
swapoff -a
sysctl -p

2. 调整使用限制
echo "
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft rss unlimited
* hard rss unlimited
* soft nproc 8192
* hard nproc 8192
* soft core unlimited
* hard core unlimited
" >> /etc/security/limits.conf

3. 关闭透明大页
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

4. 关闭防火墙
systemctl stop firewalld 
systemctl disable firewalld

数据库搭建

1. 创建用户(所有节点)
[root@yashan1-test ~]# useradd yashan
[root@yashan1-test ~]# group add YASDBA
[root@yashan1-test ~]# usermod -a -G YASDBA yashan
[root@yashan1-test ~]# echo "yashan" | passwd --stdin yashan

2. 主节点解压软件包
[root@yashan1-test ~]# su - yashan
[yashan@yashan1-test ~]$ mkdir -p /home/yashan/install
##上传安装包到/home/yashan/install下
[yashan@yashan1-test ~]$ cd /home/yashan/install
[yashan@yashan1-test install]$ tar xzf yashandb-personal-23.1.1.100-linux-x86_64.tar.gz 

3. 生成主备部署文件和安装软件
[yashan@yashan1 install]$ ./bin/yasboot package se gen --cluster yashandb -u yashan -p yashan --ip 192.168.56.101,192.168.56.102 --port 22 --install-path /data/yashan/yasdb_home --data-path /data/yashan/yasdb_data --begin-port 1688 --node 2
 hostid   | group | node_type | node_name | listen_addr         | replication_addr    | data_path
----------------------------------------------------------------------------------------------------------------
 host0001 | dbg1  | db        | 1-1       | 192.168.56.101:1688 | 192.168.56.101:1689 | /data/yashan/yasdb_data
----------+-------+-----------+-----------+---------------------+---------------------+-------------------------
 host0002 | dbg1  | db        | 1-2       | 192.168.56.102:1688 | 192.168.56.102:1689 | /data/yashan/yasdb_data
----------+-------+-----------+-----------+---------------------+---------------------+-------------------------

Generate config success

[yashan@yashan1 install]$ ./bin/yasboot package install -t hosts.toml -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz
checking install package...
install version: yashandb 23.1.1.100
host0001 100% [====================================================================]    3s
host0002 100% [====================================================================]    3s
update host to yasom...

3. 数据库部署和配置环境变量
[yashan@yashan1 install]$ ./bin/yasboot cluster deploy -t yashandb.toml
 type | uuid             | name               | hostid | index    | status  | return_code | progress | cost
------------------------------------------------------------------------------------------------------------
 task | 94c181a52b4d2d23 | DeployYasdbCluster | -      | yashandb | SUCCESS | 0           | 100      | 27
------+------------------+--------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

##所有节点使用yashan用户运行
[yashan@yashan2 install]$ cat /data/yashan/yasdb_home/yashandb/23.1.1.100/conf/yashandb.bashrc >> ~/.bashrc
[yashan@yashan2 install]$ source ~/.bashrc

4. 设置sys用户口令
[yashan@yashan1 install]$ ./bin/yasboot cluster password set -n yashan666! -c yashandb
 type | uuid             | name             | hostid | index    | status  | return_code | progress | cost
----------------------------------------------------------------------------------------------------------
 task | 9f9af07bb64ecb51 | YasdbPasswordSet | -      | yashandb | SUCCESS | 0           | 100      | 3
------+------------------+------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

5. 验证数据库是否安装成功
[yashan@yashan1 install]$ ./bin/yasboot cluster status -c yashandb -d
 hostid   | node_type | nodeid | pid  | instance_status | database_status | database_role | listen_address      | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
 host0001 | db        | 1-1:1  | 4894 | open            | normal          | primary       | 192.168.56.101:1688 | /data/yashan/yasdb_data/db-1-1
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
 host0002 | db        | 1-2:2  | 5606 | open            | normal          | standby       | 192.168.56.102:1688 | /data/yashan/yasdb_data/db-1-2
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------

主备库参数说明

参数 说明
LISTEN_ADDR 本机数据库的IP与监听端口 该参数在安装时已配置
NODE_ID 各节点编号 该参数在创建节点时已配置,且不能重复,主要用于主备机自动选举
REPLICATION_ADDR 本机数据库的IP与同步复制端口 可不配置,则与LISTEN_ADDR共享一个端口
ARCHIVE_DEST_1 本机数据库的IP与监听端口 必须配置
DB_FILE_NAME_CONVERT 对端到本机的数据文件转换路径,支持递归转换 主备数据文件路径不同时必须配置
REDO_FILE_NAME_CONVERT 对端到本机的日志文件转换路径,支持递归转换 主备日志文件路径不同时必须配置
DB_BUCKET_NAME_CONVERT 对端到本机的LSC表的bucket转换路径,支持递归转换 主备bucket路径不同时必须配置
HA_SSL_ENABLE 主备间SSL加密传输开关 content3

主备切换

Switchover切换

主备切换即是主库降级为备库,备库升级为主库。执行主备切换时候要求主备的网络连接正常,并且主备实例都处于OPEN状态。具体的切换过程如下:

image.png

检查主备机同步情况

1 主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;

DEST_ID CONNECTION        STATUS            DATABASE_MODE              RECEIVED_LFN           APPLIED_LFN SYNCHRONIZED
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------
      1 CONNECTED         NORMAL            OPEN                               2189                  2189 YES

1 row fetched.

2 备库查看同步情况
SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag FROM V$REPLICATION_STATUS;

CONNECTION        STATUS            PEER_ADDR                                                        TRANSPORT_LAG    APPLY_LAG
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------
CONNECTED         NORMAL            192.168.56.101:1689                                                          0            0

1 row fetched.

备库执行切换

SQL> ALTER DATABASE SWITCHOVER;

检查新主从状态

1 查看新主库情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;

DEST_ID CONNECTION        STATUS            DATABASE_MODE              RECEIVED_LFN           APPLIED_LFN SYNCHRONIZED
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------
      1 CONNECTED         NORMAL            OPEN                               2205                  2205 YES

1 row fetched.

2 查看新备库同步情况
SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag FROM V$REPLICATION_STATUS;

CONNECTION        STATUS            PEER_ADDR                                                        TRANSPORT_LAG    APPLY_LAG
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------
CONNECTED         NORMAL            192.168.56.102:1689                                                          0            0

1 row fetched.

Failover切换

Failover切换一般是在主库发生故障,将备库强制升级为主库的一种切换方式,该方式会破坏原有的主从关系。具体的切换过程如下:

image.png

选择备机升主

在紧急状态下进行的故障切换可能会丢失数据,为尽可能降低损失,在多台备机中,应该选择日志量最多的一个备机执行Failover。

检查V$REPLICATION_STATUS视图中的received_lfn和gap_seq#字段:

  1. 如果gap_seq#都为0,选择received_lfn值最大的备机。
  2. 如果某台备机gap_seq#不为0,说明该备机缺少一些归档,升主后将丢失一部分数据,此时需要查看所有备机上的该视图,选择gap_seq#值最大,并且received_lfn值最大的备机。
1 确认主备机的角色,为主机或备机
SQL> SELECT database_id,database_name,log_mode,open_mode,database_role FROM V$DATABASE;

          DATABASE_ID DATABASE_NAME                                                    LOG_MODE          OPEN_MODE         DATABASE_ROLE
--------------------- ---------------------------------------------------------------- ----------------- ----------------- -----------------
           3102204658 yashandb                                                         ARCHIVELOG        READ_ONLY         STANDBY

1 row fetched.

2 确认备机连接已经断连,并且RECEIVED_LFN是剩余备机里最大的,GAP_SEQ#为0
SQL> SELECT connection,status,received_lfn,gap_seq# FROM V$REPLICATION_STATUS;


CONNECTION        STATUS                     RECEIVED_LFN     GAP_SEQ#
----------------- ----------------- --------------------- ------------
DISCONNECTED      NORMAL                             2251            0

执行Failover,激活备库

SQL> ALTER DATABASE FAILOVER;

查看新主库状态

SQL> SELECT database_id,database_name,log_mode,open_mode,database_role FROM V$DATABASE;

          DATABASE_ID DATABASE_NAME                                                    LOG_MODE          OPEN_MODE         DATABASE_ROLE
--------------------- ---------------------------------------------------------------- ----------------- ----------------- -----------------
           3102204658 yashandb                                                         ARCHIVELOG        READ_WRITE        PRIMARY

1 row fetched.

日常维护

主备库同步状态日常维护中常用到的视图有:

主库:v$archive_dest_status,这个视图展示了所有备机的统计信息。

从库:v$replication_status和V$RECOVERY_PROGRESS视图。v$replication_status 这个视图显示集群中所有节点的备机redo传输汇总信息。V$RECOVERY_PROGRESS视图 这个视图显示日志回放进度汇总信息。

v$archive_dest_status 视图:
image.png

v$replication_status 视图:
image.png

V$RECOVERY_PROGRESS视图:
image.png

主备同步检查

1 查看数据库状态
SQL> SELECT database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM  V$DATABASE;

DATABASE_NAME                                                    LOG_MODE          OPEN_MODE         PROTECTION_MODE                   DATABASE_ROLE       BLOCK_SIZE STATUS
---------------------------------------------------------------- ----------------- ----------------- --------------------------------- ----------------- ------------ ---------------------------------
yashandb                                                         ARCHIVELOG        READ_WRITE        MAXIMUM AVAILABILITY              PRIMARY                   8192 NORMAL

1 row fetched.

2 查看归档日志情况
SQL> SELECT name, SEQUENCE#, blocks, block_size, completion_time FROM V$ARCHIVED_LOG;

NAME                                                                SEQUENCE#       BLOCKS BLOCK_SIZE COMPLETION_TIME
---------------------------------------------------------------- ------------ ------------ ---------- --------------------------------
/data/yashan/yasdb_data/db-1-1/archive/arch_0_1.ARC                         1            9       4096 2023-12-02

1 row fetched.

3 主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;

DEST_ID CONNECTION        STATUS            DATABASE_MODE              RECEIVED_LFN           APPLIED_LFN SYNCHRONIZED
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------
      1 CONNECTED         NORMAL            OPEN                               2189                  2189 YES

1 row fetched.

4 备库查看同步情况
SQL> SELECT connection,status,peer_addr,transport_lag,apply_lag FROM V$REPLICATION_STATUS;

CONNECTION        STATUS            PEER_ADDR                                                        TRANSPORT_LAG    APPLY_LAG
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------
CONNECTED         NORMAL            192.168.56.101:1689                                                          0            0

1 row fetched.

5 yasboot工具监控

SQL> SELECT * FROM V$RECOVERY_PROGRESS;

ITEM                              UNITS                             VALUE
--------------------------------- ----------------- ---------------------
Active Apply Rate                 KB/sec                          4000000
Average Apply Rate                KB/sec                           117870
Maximum Apply Rate                KB/sec                        232000000
Redo Applied                      MB                                    0
Redo Remain                       MB                                    0
First Applied Redo                LFN                                2168
Last Applied Redo                 LFN                                2189
Active Time                       Seconds                               0
Remain Time                       Seconds                               0
Non Physical Log Count            Number                                3
Parallelism                       Threads                              16
Latest Load Size                  KB                                    4
Latest Load Time                  Millisecond                           0
Latest Analysis Time              Millisecond                           0
Wait Previous Apply Time          Millisecond                           0
Latest Apply Time                 Millisecond                           0

16 rows fetched.

6 yasboot工具监控
[yashan@yashan1 install]$ yasboot cluster status -c yashandb -d
 hostid   | node_type | nodeid | pid  | instance_status | database_status | database_role | listen_address      | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
 host0001 | db        | 1-1:1  | 4894 | open            | normal          | primary       | 192.168.56.101:1688 | /data/yashan/yasdb_data/db-1-1
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
 host0002 | db        | 1-2:2  | 5606 | open            | normal          | standby       | 192.168.56.102:1688 | /data/yashan/yasdb_data/db-1-2
----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------

保护模式切换

保护模式有三种:最大性能、最大可用、最大保护模式

模式 说明
最大性能模式 主机事务提交无需等待备机复制结果,提交后即向客户端返回事务提交成功。该模式的优点为,主机性能较高,且备机断网或故障不会影响主机业务;缺点为,如果主机故障,备机升主后可能发生数据丢失。
最大可用模式 当任意同步备机连接正常时,采取最大保护模式,当所有同步备机断网或故障时,采取最大性能模式。最大可用模式提供尽可能的数据保护,和主数据库的可用性,但也存在数据丢失风险。
最大保护模式 主机事务提交后,必须等待同步备机将对应日志刷盘后,才能向客户端返回事务提交成功。该模式的优点为,如果主机故障,同步备机升主后可以保证数据不丢失;缺点为,如果同步备机断网或故障,将阻塞主机业务。
## 查看数据库保护模式
SQL> SELECT protection_mode, protection_level FROM V$DATABASE;

PROTECTION_MODE                   PROTECTION_LEVEL
--------------------------------- ---------------------------------
MAXIMUM AVAILABILITY              MAXIMUM AVAILABILITY

1 row fetched.

## 切换成最大性能模式(数据库在OPEN或者MOUNT状态下操作)
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
SQL> SELECT protection_mode, protection_level FROM V$DATABASE;

PROTECTION_MODE                   PROTECTION_LEVEL
--------------------------------- ---------------------------------
MAXIMUM PERFORMANCE               MAXIMUM PERFORMANCE

1 row fetched.

## 切换成最大保护模式(数据库在OPEN或者MOUNT状态下操作,建议先切换到最大可用,等待redo同步,再切换到最大保护模式)
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;
SQL> SELECT protection_mode, protection_level FROM V$DATABASE;

PROTECTION_MODE                   PROTECTION_LEVEL
--------------------------------- ---------------------------------
MAXIMUM PROTECTION                MAXIMUM PROTECTION

1 row fetched.

GAP修复

模拟环境

1)关闭从库
yasql / as sysdba -c "shutdown immediate;"

-- 主库可看到日志发送失败记录
tail -100f $YASDB_DATA/log/alert/alert.log
2023-12-02 21:13:26.864|22|StandbyDisconnect|0|0||redo sender failed to connect with standby
2023-12-02 21:17:27.446|22|StandbyDisconnect|0|1|
2023-12-02 21:17:57.959|26|Incident|0|0||the incident was controlled by flood, problem id=1, last incident id=5

2) 手动生成归档文件
SQL> alter system switch logfile;
SQL> select name from v$archived_log;
NAME
----------------------------------------------------------------
/data/yashan/yasdb_data/db-1-2/archive/arch_0_2.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_3.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_4.ARC

3)主库新增数据和归档
SQL> create table t1 (id number);
SQL> insert into t1 values (1);
SQL> insert into t1 values (2);
SQL> commit;

4)手动生成归档文件 
SQL> alter system switch logfile;
SQL> alter system switch logfile;
SQL> select name from v$archived_log;
NAME
----------------------------------------------------------------
/data/yashan/yasdb_data/db-1-2/archive/arch_0_2.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_3.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_4.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_5.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_6.ARC
/data/yashan/yasdb_data/db-1-2/archive/arch_0_7.ARC

5)模拟主库归档丢失
mv /data/yashan/yasdb_data/db-1-2/archive/arch_0_6.ARC /data/yashan/yasdb_data/db-1-2/archive/BAK_arch_0_6.ARC

6) 启动从库
yasboot cluster start -c yashandb -d

查看主从同步情况

1)主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS;

DEST_ID CONNECTION        STATUS            DATABASE_MODE              RECEIVED_LFN           APPLIED_LFN SYNCHRONIZED GAP_STATUS
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ -----------------
      1 CONNECTED         NORMAL            OPEN                               2224                  2217 NO           HAS GAP

1 row fetched.

2) 备库查看同步情况
SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag, GAP_SEQ# FROM V$REPLICATION_STATUS;

CONNECTION        STATUS            PEER_ADDR                                                        TRANSPORT_LAG    APPLY_LAG     GAP_SEQ#
----------------- ----------------- ---------------------------------------------------------------- ------------- ------------ ------------
CONNECTED         NORMAL            192.168.56.102:1689                                                          0      1800163            6

1 row fetched.

上面主库查询中可以看到,GAP_STATUS的状态为HAS GAP,表示从库库已经出现了GAP,需要修复。从库查询中可以看到GAP_SEQ#为6,表示从库开始出现GAP的归档SEQUENCE为第6个。

主库执行命令修复GAP

1 查看备机standby_name
SQL> select dest_id, dest_name, service, db_unique_name from V$ARCHIVE_DEST;

DEST_ID DEST_NAME         SERVICE                                                          DB_UNIQUE_NAME
------- ----------------- ---------------------------------------------------------------- ---------------------------------
      1 ARCHIVE_DEST_1    192.168.56.101:1689                                              standby1

1 row fetched.

2 修复备机GAP
SQL> BUILD DATABASE INCREMENTAL TO STANDBY standby1;

主库查看同步情况

SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS;

DEST_ID CONNECTION        STATUS            DATABASE_MODE              RECEIVED_LFN           APPLIED_LFN SYNCHRONIZED GAP_STATUS
------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ -----------------
      1 CONNECTED         NORMAL            OPEN                               2234                  2234 YES          NO GAP

1 row fetched.

个人感受

  1. YashanDB的主备搭建简单,无需手工介入配置主从同步参数,yasboot工具可以直接实现主备架构自动化搭建,避免人工操作出错。
  2. YashanDB的主备切换步骤相较于Oracle简化了不少,简化了运维成本。
  3. YashanDB针对从库GAP的情况在主库上集成了命令可以处理,比较方便。
  4. 整个主备搭建和切换演练过程中,数据库日志记录较少,只记录了主从同步中归档发送失败的日志。
最后修改时间:2023-12-03 12:03:07
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论