Oracle RAC Operations: Deleting and Adding a Node

DBA随笔记 2024-08-14

STEP1: Delete the instance

[grid@p19c01:/home/grid]$ olsnodes -s -t
p19c01  Active  Unpinned
p19c02  Active  Unpinned
[grid@p19c01:/home/grid]$ srvctl config database -d p19c0
Database unique name: p19c0
Database name: p19c0
Oracle home: /u01/app/oracle/product/19.3.0/db
Oracle user: oracle
Spfile: +DATA/P19C0/PARAMETERFILE/spfile.267.1101425147
Password file: +DATA/P19C0/PASSWORD/pwdp19c0.256.1101422929
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: oper
Database instances: p19c01,p19c02
Configured nodes: p19c01,p19c02
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services:
Database is administrator managed


Before changing the cluster, list the existing OCR backups
[root@p19c01:/root]$ ocrconfig -showbackup
Then take a manual OCR backup
[root@p19c01:/root]$ ocrconfig -manualbackup
p19c01     2022/04/09 21:42:52     +OCR:/p19c-cluster/OCRBACKUP/backup_20220409_214252.ocr.267.1101591773     3331580692
p19c02     2022/04/08 17:14:23     +OCR:/p19c-cluster/OCRBACKUP/backup_20220408_171423.ocr.263.1101489263     3331580692


1.1 Stop the instance (can be run from any node)
[root@p19c01:/root]$ srvctl stop instance -d p19c0 -n p19c02


1.2 As the oracle user on a surviving node, use dbca in silent mode to delete the database instance of the node being removed
dbca -silent -deleteInstance -nodeList p19c02 -gdbName p19c0 -instanceName p19c02 -sysDBAUserName sys -sysDBAPassword oracle


[root@p19c01:/root]$ su - oracle
Last login: Sat Apr  9 21:49:45 CST 2022
[oracle@p19c01:/home/oracle]$ dbca -silent -deleteInstance -nodeList p19c02 -gdbName p19c0 -instanceName p19c02 -sysDBAUserName sys -sysDBAPassword oracle
[WARNING] [DBT-19203] The Database Configuration Assistant will delete the Oracle instance and its associated OFA directory structure. All information about this instance will be deleted.


Prepare for db operation
40% complete
Deleting instance
48% complete
52% complete
56% complete
60% complete
64% complete
68% complete
72% complete
76% complete
80% complete
Completing instance management.
100% complete
Instance "p19c02" deleted successfully from node "p19c02".
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/p19c0/p19c01.log" for further details.




[oracle@p19c01:/home/oracle]$ srvctl config database -d p19c0
Database unique name: p19c0
Database name: p19c0
Oracle home: /u01/app/oracle/product/19.3.0/db
Oracle user: oracle
Spfile: +DATA/P19C0/PARAMETERFILE/spfile.267.1101425147
Password file: +DATA/P19C0/PASSWORD/pwdp19c0.256.1101422929
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: oper
Database instances: p19c01
Configured nodes: p19c01
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services:
Database is administrator managed
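
The comparison of the two outputs above can be scripted. The following is a minimal sketch, assuming the "Database instances:" line keeps the format shown in this article; `instance_listed` is a hypothetical helper name, and it reads the text on stdin so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given instance name appears in
# the "Database instances:" line of `srvctl config database` output on stdin.
instance_listed() {
  awk -F': *' -v inst="$1" '
    $1 == "Database instances" {
      n = split($2, names, ",")
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", names[i])   # trim stray blanks
        if (names[i] == inst) found = 1
      }
    }
    END { exit(found ? 0 : 1) }
  '
}

# Two lines copied from the output above (after the deletion).
sample_after='Database instances: p19c01
Configured nodes: p19c01'

if printf '%s\n' "$sample_after" | instance_listed p19c02; then
  echo "p19c02 is still registered"
else
  echo "p19c02 is no longer registered"
fi
```

On a real system the input would come from the command itself, for example `srvctl config database -d p19c0 | instance_listed p19c02`.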

STEP2: Remove the database software

2.1 Disable and stop the listener on the node being deleted
[grid]$ srvctl disable listener -listener LISTENER -node p19c02
[grid]$ srvctl stop listener -listener LISTENER -node p19c02


2.2 Update the inventory (run on the node being deleted)
[oracle@p19c02 bin]$ cd $ORACLE_HOME/oui/bin
[oracle@p19c02 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=p19c02" -local
Here p19c02 is the node being deleted.


2.3 Deinstall the Oracle home (run on the node being deleted); this removes the Oracle Database software
[oracle@p19c02 db_home]$ $ORACLE_HOME/deinstall/deinstall -local




2.4 Update the inventory (run on the surviving node)
[oracle@p19c01 bin]$ cd $ORACLE_HOME/oui/bin
[oracle@p19c01 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=p19c01" -local
CLUSTER_NODES is the list of nodes that remain in the cluster.

STEP3: Delete the node from Clusterware

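Run steps 3.1 and 3.2 on the node being deleted,
logged in to the operating system as the grid user. The later steps state where they run.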


3.1 Check the node status (the sample below comes from a cluster whose nodes are named rac1 and rac2)
[grid@rac2 bin]$ olsnodes -s -t
rac1    Active  Unpinned
rac2    Inactive        Unpinned


If the node is pinned, unpin it as root with the following command:
[root@p19c02 ~]# crsctl unpin css -n p19c02
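
Whether the unpin step is needed can be read from the third column of the `olsnodes -s -t` output. The following is a minimal sketch, assuming the three-column format shown above; `pin_state` is a hypothetical helper name, and the sample text is illustrative rather than captured output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pin state (third column) of one node from
# `olsnodes -s -t` output read on stdin; print nothing if the node is absent.
pin_state() {
  awk -v node="$1" '$1 == node { print $3 }'
}

# Illustrative sample in the same format as the output above.
sample='p19c01  Active  Unpinned
p19c02  Inactive        Unpinned'

state=$(printf '%s\n' "$sample" | pin_state p19c02)
if [ "$state" = "Pinned" ]; then
  echo "p19c02 is pinned: run 'crsctl unpin css -n p19c02' as root first"
else
  echo "p19c02 state: ${state:-not listed}"
fi
```

On a cluster node the same helper would be fed directly, for example `olsnodes -s -t | pin_state p19c02`.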




3.2 Remove the Grid home (run on the node being deleted)
[grid@p19c02:/u01/app/19.3.0/grid/deinstall]$ cd $ORACLE_HOME/deinstall
[grid@p19c02:/u01/app/19.3.0/grid/deinstall]$ ./deinstall -local
The tool prompts you to run the rootcrs.sh script as root:
[root@p19c02:/]$ /u01/app/19.3.0/grid/crs/install/rootcrs.sh -force  -deconfig -paramfile "/tmp/deinstall2022-04-09_11-10-16PM/response/deinstall_OraGI19Home1.rsp"


3.4 On the surviving node, run the following commands to update the inventory
cd $GRID_HOME/oui/bin
./runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES=p19c01" CRS=TRUE -silent
Here p19c01 is the node that remains.


Example:
As the grid user, update the inventory on every surviving node
[grid@p19c01:/home/grid]$ cd $ORACLE_HOME/oui/bin
[grid@p19c01:/home/grid]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/19.3.0/grid "CLUSTER_NODES={p19c01}" CRS=TRUE -silent -local




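3.5 At this point the directories /u01/app/19.3.0 and /u01/app/grid are still left in place.
    On one of the surviving nodes, run the following as root to delete the node from the cluster:
[root@p19c01 ~]# cd /u01/app/19.3.0/grid/bin/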
[root@p19c01 bin]# ./crsctl delete node -n p19c02
CRS-4661: Node p19c02 successfully deleted.


[root@p19c01 bin]# ./olsnodes -s -t
p19c01  Active  Unpinned


3.6 Run the following CVU command to verify that the specified node was removed from the cluster:
$ cluvfy stage -post nodedel -n node_list [-verbose]


[grid@p19c01 bin]$ cluvfy stage -post nodedel -n p19c02 -verbose
[grid@p19c01 bin]$ olsnodes -s -t




[grid@p19c01:/u01/app/19.3.0/grid/oui/bin]$ cluvfy stage -post nodedel -n p19c02 -verbose
This software is "360" days old. It is a best practice to update the CRS home by downloading and applying the latest release update. Refer to MOS note 2731675.1 for more details.


Verifying Node Removal ...
  Verifying CRS Integrity ...PASSED
  Verifying Clusterware Version Consistency ...PASSED
Verifying Node Removal ...PASSED


Post-check for node removal was successful.


CVU operation performed:      stage -post nodedel
Date:                         Apr 9, 2022 11:39:15 PM
CVU home:                     /u01/app/19.3.0/grid/
User:                         grid


[grid@p19c01:/u01/app/19.3.0/grid/oui/bin]$ olsnodes -s -t
p19c01  Active  Unpinned

STEP4: Add the node

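4.1 Prepare the environment

Reinstall the operating system on node 2, configure the Oracle environment, and configure the shared storage.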

4.2 Configure SSH user equivalence

Configure SSH user equivalence for the grid and oracle users.

[grid@p19c01:/home/grid]$ cd $ORACLE_HOME/oui/prov/resources/scripts
[grid@p19c01:/u01/app/19.3.0/grid/oui/prov/resources/scripts]$ ./sshUserSetup.sh -user grid -hosts "p19c01 p19c02" -advanced -noPromptPassphrase
[grid@p19c01:/u01/app/19.3.0/grid/oui/prov/resources/scripts]$ ./sshUserSetup.sh -user oracle -hosts "p19c01 p19c02" -advanced -noPromptPassphrase
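
Once the setup script has run, passwordless login can be confirmed before moving on. The following is a minimal sketch, not part of the original procedure: `probe` and `check_equivalence` are hypothetical helper names, and the only external call is a standard `ssh` invocation with `BatchMode=yes`, which makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Hypothetical probe: succeed if a non-interactive SSH login to the host works.
probe() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true </dev/null
}

# Probe every host given as an argument; return non-zero if any probe fails.
check_equivalence() {
  local failed=0 host
  for host in "$@"; do
    if probe "$host"; then
      echo "OK    $host"
    else
      echo "FAIL  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example call (run once as grid and once as oracle, on each node):
#   check_equivalence p19c01 p19c02
```

Keeping the probe in its own function makes it easy to swap in a different connectivity test without touching the loop.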

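4.3 Use CVU to verify that the node being added meets the requirements

As the grid user on an existing cluster node, run the following commands to verify that the new node meets the GI software requirements (a pre-installation check of the new node)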

[grid@p19c01 .ssh]$ cluvfy comp peer -refnode p19c01 -n p19c02 -verbose
[grid@p19c01 .ssh]$ cluvfy stage -pre nodeadd -n p19c02 -verbose -fixup

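4.4 Add Clusterware

Run the following commands to add the Clusterware software to the new node (as the grid user on an existing cluster node)

>> Run the GI add-node from node p19c01
[grid@p19c01 ~]$ cd /u01/app/19.3.0/grid/addnode/


If DNS is not configured, the DNS check fails; setting IGNORE_PREADDNODE_CHECKS lets addnode.sh skip its pre-add-node checks
[grid@p19c01 addnode]$ export IGNORE_PREADDNODE_CHECKS=Y
[grid@p19c01 addnode]$ ./addnode.sh -silent -ignorePrereq "CLUSTER_NEW_NODES={p19c02}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={p19c02-vip}" "CLUSTER_NEW_NODE_ROLES={hub}"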


Output:
Update Inventory in progress.
You can find the log of this install session at:
 /u01/app/oraInventory/logs/addNodeActions2022-04-10_11-27-03AM.log


Update Inventory successful.
..................................................   97% Done.


As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/19.3.0/grid/root.sh


Execute /u01/app/oraInventory/orainstRoot.sh on the following nodes:
[p19c02]
Execute /u01/app/19.3.0/grid/root.sh on the following nodes:
[p19c02]


The scripts can be executed in parallel on all the nodes.


Successfully Setup Software with warning(s).
..................................................   100% Done.

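After the previous step succeeds, run the following two scripts as root on the new node

# /u01/app/oraInventory/orainstRoot.sh
# /u01/app/19.3.0/grid/root.sh
When running root.sh, the script has succeeded only once the message "'UpdateNodeList' was successful." appears.
root.sh also starts the related services.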


[root@p19c02 ~]# /u01/app/oraInventory/orainstRoot.sh
[root@p19c02 ~]# /u01/app/19.3.0/grid/root.sh
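
The success marker quoted above can be checked mechanically if the root.sh output is saved to a file. The following is a minimal sketch under that assumption; `root_sh_succeeded` is a hypothetical helper name, and the demo file is a stand-in written by the script itself, not real root.sh output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the saved root.sh output (file path in $1)
# contains the success marker quoted in this article.
root_sh_succeeded() {
  grep -qF "'UpdateNodeList' was successful." "$1"
}

# Demo on a throwaway file that stands in for captured output.
log=$(mktemp)
printf '%s\n' "(stand-in for earlier output)" "'UpdateNodeList' was successful." > "$log"

if root_sh_succeeded "$log"; then
  echo "root.sh reported success"
else
  echo "success marker not found"
fi
rm -f "$log"
```

One way to capture the output is to pipe the script through `tee`, for example `/u01/app/19.3.0/grid/root.sh 2>&1 | tee /tmp/root_sh.out`, where the file name is an arbitrary choice.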

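Running root.sh again on top of a failed attempt produces errors, so before each re-run it is best to deconfigure the leftover installation first
[root@p19c02 ~]# /u01/app/19.3.0/grid/crs/install/rootcrs.sh -deconfig -force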

4.5 Verify

[grid@p19c01 ~]$ crsctl status res -t
[grid@p19c01 ~]$ crsctl status res -t -init
[grid@p19c01 ~]$ crsctl check cluster -all
[grid@p19c01 ~]$ olsnodes -n
[grid@p19c01 ~]$ srvctl status asm
[grid@p19c01 ~]$ srvctl status listener
4.6 Install the Oracle Database software on the new node

Add the database software to the new node (run as the oracle user on an existing cluster node)

[oracle]$ cd /u01/app/oracle/product/19.3.0/db/addnode/
[oracle@p19c01:/u01/app/oracle/product/19.3.0/db/addnode]$ ./addnode.sh -silent -ignorePrereq "CLUSTER_NEW_NODES={p19c02}"

When the previous step finishes, it names the script to run as root on the new node

Output:
Setup Oracle Base successful.
..................................................   96% Done.


As a root user, execute the following script(s):
        1. /u01/app/oracle/product/19.3.0/db/root.sh


Execute /u01/app/oracle/product/19.3.0/db/root.sh on the following nodes:
[p19c02]


On the new node p19c02, run the root script
[root@p19c02 ~]# /u01/app/oracle/product/19.3.0/db/root.sh

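On an existing cluster node or on the new node, run the following command as the grid and oracle users to verify that the Clusterware and database software were added correctly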

[grid]$ cluvfy stage -post nodeadd -n p19c02 -verbose

4.7 Add the DB instance

Log in on p19c01 and run the following query
SQL> select instance_name from gv$instance;


INSTANCE_NAME
----------------
p19c01
The whole cluster currently has only one instance.

Option 1:

Use dbca in silent mode to add the database instance for the new node (run as the oracle user on an existing cluster node)

[oracle@p19c01 ~]$ dbca -silent -addInstance -gdbName "p19c0" -nodeName "p19c02" -instanceName "p19c02" -sysDBAUserName "sys" -sysDBAPassword "oracle"

Option 2:

Run dbca interactively as the oracle user on an existing node

Oracle RAC database instance management -> Add an instance

4.8 Check that the cluster and the database are healthy

SQL> select instance_number,instance_name,status from gv$instance;
SQL> select thread#,status,instance from gv$thread;
[grid@p19c01 ~]$ crsctl status res -t
