作者:张乾
外星人2号,现兼任六位喵星人的资深铲屎官。
本文来源:原创投稿
* 爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。
原理:
server_permanent_offline_time 是 OceanBase 数据库中用于控制节点永久下线时长的参数。当集群中的某个节点宕机后,系统会根据该参数的设置值来进行相应操作。
官方建议
1. OceanBase 数据库版本升级场景:建议将该配置项的值设置为72h。
准备过程
预备一套环境
注:本文基于OB 3.1.2版本,其他版本需注意另作验证。
| 版本 | ip | |
|---|---|---|
| oceanbase | 3.1.2 | 10.186.64.74 |
| 10.186.64.75 | ||
| 10.186.64.79 | ||
| OBProxy | 3.2.3 | 10.186.60.3 |
准备些数据
使用 sysbench 创建一个表 sbtest1 并插入1W数据。
sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant" --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=600 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 prepare
这里改写了 sysbench 的建表语句,分了3个区,查询 sbtest1 表分区副本分布如下
MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta inner join __all_tenant tenant on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by tenant.tenant_name,zone, svr_ip,svr_port, 5 order by tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name | zone | svr_ip | svr_port | role | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | leader | 1 |
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | follower | 2 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | leader | 1 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | follower | 2 |
| sysbench_tenant | zone3 | 10.186.64.79 | 2882 | leader | 1 |
| sysbench_tenant | zone3 | 10.186.64.79 | 2882 | follower | 2 |
+-----------------+-------+--------------+----------+----------+---------------+
开始实验
使用 sysbench 持续写入数据,维持一定的流量,便于在节点重建后对比各节点数据是否一致。
sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant" --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=300 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 run
删除某节点的数据文件
选择 zone3 下的 10.186.64.79 节点,将数据文件删除。
[root@localhost data]# rm -rf 1/sstable/block_file
[root@localhost data]# cd 1/sstable/
[root@localhost sstable]# ll
total 0
永久下线故障节点
server_permanent_offline_time 默认值为 3600s
MySQL [oceanbase]> alter system set server_permanent_offline_time='60s';
Query OK, 0 rows affected (0.030 sec)
MySQL [oceanbase]> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone | svr_type | svr_ip | svr_port | name | data_type | value | info | section | scope | source | edit_level |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone3 | observer | 10.186.64.79 | 2882 | server_permanent_offline_time | NULL | 60s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞) | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone1 | observer | 10.186.64.74 | 2882 | server_permanent_offline_time | NULL | 60s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞) | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone2 | observer | 10.186.64.75 | 2882 | server_permanent_offline_time | NULL | 60s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞) | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
在 kill ob 进程前,建议使用隔离(ISOLATE SERVER)或者停止(STOP SERVER)节点的命令,停掉发往该节点的请求,转移副本 leader 角色。在节点重建恢复后,再开启流量。
# 停掉79节点服务
MySQL [oceanbase]> ALTER SYSTEM STOP SERVER '10.186.64.79:2882' ZONE='zone3';
# 或者隔离
ALTER SYSTEM ISOLATE SERVER '10.186.64.79:2882' ZONE='zone3';
执行 kill -9 $observer_pid ,等待 server_permanent_offline_time 的时间,该ob进入"永久下线”状态。判断ob是否已经永久下线,可以查询表 __all_rootservice_event_history,存在名为 "permanent_offline "的event记录,确认时间和ip都一致后,即可认为ob已经永久下线。
MySQL [oceanbase]> select * from __all_rootservice_event_history where event='permanent_offline' ;
+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+
| gmt_create | module | event | name1 | value1 | name2 | value2 | name3 | value3 | name4 | value4 | name5 | value5 | name6 | value6 | extra_info | rs_svr_ip | rs_svr_port |
+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+
| 2023-03-29 17:34:09.596035 | server | permanent_offline | server | "10.186.64.79:2882" | | | | | | | | | | | | 10.186.64.74 | 2882 |
+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+
zone2 下的75节点有一个从副本升级为leader角色,此时集群仍然可以继续对外服务。
MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta inner join __all_tenant tenant on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by tenant.tenant_name,zone, svr_ip,svr_port, 5 order by tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name | zone | svr_ip | svr_port | role | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | leader | 1 |
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | follower | 2 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | leader | 2 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | follower | 1 |
+-----------------+-------+--------------+----------+----------+---------------+
4 rows in set (0.005 sec)
拉起故障节点,触发数据自动重建
注:防止ob启动失败或存在其他问题,建议启动前将数据文件和事务日志均清空。
[root@localhost data]# rm -rf log1/clog/*
[root@localhost data]# rm -rf log1/ilog/*
[root@localhost data]# rm -rf log1/slog/*
[root@localhost data]# rm -rf 1/sstable/block_file
[root@localhost data]# cd 1/sstable/
[root@localhost sstable]# ll
total 0
[root@localhost sstable]# su admin
bash-4.2$ cd home/admin/ && ./bin/observer
./bin/observer
进程启动后,确认ob心跳恢复状态为active,然后查看分区正在不断补足中
MySQL [oceanbase]> select svr_ip,zone,with_rootserver,status,stop_time,start_service_time,build_version from __all_server;
+--------------+-------+-----------------+--------+------------------+--------------------+----------------------------------------------------------------------------------------+
| svr_ip | zone | with_rootserver | status | stop_time | start_service_time | build_version |
+--------------+-------+-----------------+--------+-----------+---------------------------+----------------------------------------------------------------------------------------+
| 10.186.64.74 | zone1 | 1 | active | 0 | 1679984798650860 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |
| 10.186.64.75 | zone2 | 0 | active | 0 | 1679984801289281 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |
| 10.186.64.79 | zone3 | 0 | active | 1680082329964975 | 1680082511964975 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |
+--------------+-------+-----------------+--------+------------------+--------------------+----------------------------------------------------------------------------------------+
3 rows in set (0.002 sec)
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone |
+----------+-------+
| 1322 | zone1 |
| 1322 | zone2 |
| 152 | zone3 |
+----------+-------+
3 rows in set (0.228 sec)
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone |
+----------+-------+
| 1322 | zone1 |
| 1322 | zone2 |
| 664 | zone3 |
+----------+-------+
3 rows in set (0.113 sec)
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone |
+----------+-------+
| 1322 | zone1 |
| 1322 | zone2 |
| 1179 | zone3 |
+----------+-------+
3 rows in set (0.112 sec)
MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;
+----------+-------+
| count(*) | zone |
+----------+-------+
| 1322 | zone1 |
| 1322 | zone2 |
| 1322 | zone3 |
+----------+-------+
3 rows in set (0.116 sec)
由于79节点处于隔离状态,所以还没有leader副本。
MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta inner join __all_tenant tenant on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by tenant.tenant_name,zone, svr_ip,svr_port, 5 order by tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name | zone | svr_ip | svr_port | role | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | leader | 1 |
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | follower | 2 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | leader | 2 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | follower | 1 |
| sysbench_tenant | zone3 | 10.186.64.79 | 2882 | follower | 3 |
+-----------------+-------+--------------+----------+----------+---------------+
6 rows in set (0.005 sec)
执行命令解除79节点的隔离状态。
ALTER SYSTEM START SERVER '10.186.64.79:2882' ZONE='zone3';
查询分区副本分布如下,leader角色已迁回79节点。
MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta inner join __all_tenant tenant on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by tenant.tenant_name,zone, svr_ip,svr_port, 5 order by tenant.tenant_name, zone, svr_ip, role desc;
+-----------------+-------+--------------+----------+----------+---------------+
| tenant_name | zone | svr_ip | svr_port | role | partition_cnt |
+-----------------+-------+--------------+----------+----------+---------------+
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | leader | 1 |
| sysbench_tenant | zone1 | 10.186.64.74 | 2882 | follower | 2 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | leader | 1 |
| sysbench_tenant | zone2 | 10.186.64.75 | 2882 | follower | 2 |
| sysbench_tenant | zone3 | 10.186.64.79 | 2882 | leader | 1 |
| sysbench_tenant | zone3 | 10.186.64.79 | 2882 | follower | 2 |
+-----------------+-------+--------------+----------+----------+---------------+
3. 把 server_permanent_offline_time 参数恢复为默认值3600s
MySQL [oceanbase]> alter system set server_permanent_offline_time='3600s';
Query OK, 0 rows affected (0.028 sec)
MySQL [oceanbase]> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone | svr_type | svr_ip | svr_port | name | data_type | value | info | section | scope | source | edit_level |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
| zone2 | observer | 10.186.64.75 | 2882 | server_permanent_offline_time | NULL | 3600s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞) | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone1 | observer | 10.186.64.74 | 2882 | server_permanent_offline_time | NULL | 3600s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞) | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| zone3 | observer | 10.186.64.79 | 2882 | server_permanent_offline_time | NULL | 3600s | the time interval between any two heartbeats beyond which a server is considered to be \'permanently\' offline. Range: [20s,+∞) | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+
3 rows in set (0.007 sec)
校验各ob节点数据量
sysbench 已运行结束,直连各 observer ,校验数据量是一致的。
[root@localhost ~]# obclient -h10.186.64.74 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench
Welcome to the OceanBase. Commands end with ; or \g.
Your MySQL connection id is 3221545401
Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [sysbenchdb]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 53195 |
+----------+
1 row in set (0.036 sec)
MySQL [sysbenchdb]> exit
Bye
[root@localhost ~]# obclient -h10.186.64.75 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench
Welcome to the OceanBase. Commands end with ; or \g.
Your MySQL connection id is 3221823448
Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [sysbenchdb]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 53195 |
+----------+
1 row in set (0.040 sec)
MySQL [sysbenchdb]> exit
Bye
[root@localhost ~]# obclient -h10.186.64.79 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench
Welcome to the OceanBase. Commands end with ; or \g.
Your MySQL connection id is 3222011907
Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [sysbenchdb]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 53195 |
+----------+
1 row in set (0.037 sec)
MySQL [sysbenchdb]>
总结
数据文件损坏或者丢失时,可通过调整参数 server_permanent_offline_time 来重建受影响的节点。
1. 设小 server_permanent_offline_time 阈值。
7. 把 server_permanent_offline_time 参数改回原来的值。
关于SQLE
爱可生开源社区的 SQLE 是一款面向数据库使用者和管理者,支持多场景审核,支持标准化上线流程,原生支持 MySQL 审核且数据库类型可扩展的 SQL 审核工具。
SQLE 获取
| 类型 | 地址 |
|---|---|
| 版本库 | https://github.com/actiontech/sqle |
| 文档 | https://actiontech.github.io/sqle-docs-cn/ |
| 发布信息 | https://github.com/actiontech/sqle/releases |
| 数据审核插件开发文档 | https://actiontech.github.io/sqle-docs-cn/3.modules/3.7_auditplugin/auditplugin_development.html |





