Compared with the HA setup on RHEL 6/CentOS 6, RHEL 7 replaces the old ccs and luci tools with pcs. Combining pcs + pacemaker + corosync with Oracle, we can build an Oracle HA configuration and migrate the data-sharing platform database onto it. corosync acts as the cluster messaging layer (Message Layer), pacemaker as the cluster resource manager (Cluster Resource Management), and pcs as the management interface to the CRM.
Edit the hosts file:
pcs01: 10.228.110.126
pcs02: 10.228.110.128
VIP:   10.228.110.133
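A minimal /etc/hosts sketch for both nodes (the dsdb-vip name is made up; only the two node entries are strictly required):
10.228.110.126  pcs01
10.228.110.128  pcs02
10.228.110.133  dsdb-vip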
Set up SSH trust between the nodes; the Oracle sshUserSetup.sh script can be used, for example:
./sshUserSetup.sh -user root -hosts "hacluster01 hacluster02" -advanced -noPromptPassphrase
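If the sshUserSetup.sh script is not at hand, a manual sketch achieves the same trust (run on each node; assumes root password logins are still allowed):
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id root@pcs01
ssh-copy-id root@pcs02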
Disable the firewall
Check the firewall status:
systemctl status firewalld.service
Stop the firewall:
systemctl stop firewalld.service
Check whether it starts at boot:
systemctl is-enabled firewalld.service
Disable it at boot:
systemctl disable firewalld.service
Alternatively, keep firewalld running and just open the high-availability service:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --add-service=high-availability
firewall-cmd --reload
Disable SELinux
vi /etc/selinux/config
SELINUX=disabled
setenforce 0
Disable NetworkManager
systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl status NetworkManager
Mount the ISO
mount -o loop /home/patrol/rhel-server-7.4-x86_64-dvd.iso /mnt
Or load the ISO image directly through the iBMC management port, then:
mount /dev/sr0 /mnt
Configure yum
[root@pcs02 yum.repos.d]# cat ha.repo
[rhel-yum]
# keep the repo name consistent; baseurl points at the HA add-on packages on the ISO
name=ha2
baseurl=file:///mnt/addons/HighAvailability
enabled=1
gpgcheck=0
[ha]
name=ha2
baseurl=file:///mnt
enabled=1
gpgcheck=0
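A quick sanity check of the repo definition (sketch):
yum clean all
yum repolist    # both repo ids, rhel-yum and ha, should show a non-zero package count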
Install the software
1. With the yum repo in place, corosync, pacemaker and pcs can be installed directly:
yum install pcs pacemaker fence-agents-all
-- or: yum install /iso/addons/HighAvailability/* -y
yum install lvm2-cluster gfs2-utils
-- Note: I copied the gfs2-utils rpm onto the system and installed it with rpm -ivh gfs2-utils.rpm; the lvm2-cluster package was not found, so it was not installed.
2. Alternatively, when installing the Linux system, select the "High Availability Add-On" in the software selection screen; the cluster packages are then installed along with the OS, none of the steps above are needed, and the hacluster user is created automatically.
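To confirm the cluster packages landed (sketch; resource-agents is pulled in as a pacemaker dependency and provides the ocf:heartbeat agents used later):
rpm -q pcs pacemaker corosync fence-agents-all resource-agents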
Change the hacluster user's password
Change the hacluster password on both machines:
passwd hacluster
Use the same password on both machines.
Start the services
Start and enable the pcsd service on both nodes:
systemctl start pcsd.service
systemctl enable pcsd.service
Authenticate the hacluster user
pcs cluster auth node1 node2 ... nodeN    -- syntax (on newer pcs releases the equivalent command is pcs host auth, e.g. [root@z1 ~]# pcs host auth z1.example.com z2.example.com)
[root@pcs01 patrol]# pcs cluster auth pcs01 pcs02
Username: hacluster
Password:
pcs01: Authorized
pcs02: Authorized
[root@pcs01 patrol]#
[root@pcs02 Packages]# pcs cluster auth pcs01 pcs02
pcs01: Already authorized
pcs02: Already authorized
[root@pcs02 Packages]#
Create the cluster
Create a cluster named mycluster with both nodes as cluster members. This command creates and starts the cluster in one step. It only needs to be run from one node in the cluster, because pcs configuration commands take effect for the entire cluster.
On one node in the cluster, run the following command (the first example uses the newer pcs syntax from the Red Hat documentation; on RHEL 7 pcs the cluster name is passed with --name):
[root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com
# pcs cluster setup --name mycluster --start pcs01 pcs02
Web UI login
https://nodename:2224 (or replace nodename with the node's IP)
Add nodes and create the cluster
Add the two nodes in the web UI and join them into a single cluster,
or from the command line: pcs cluster setup --name mycluster pcs01 pcs02
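Optionally, have the cluster stack start at boot on both nodes and verify the result (sketch):
pcs cluster enable --all
pcs status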
Add the VIP resource
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s
Resource id: the resource name, chosen freely
ip: the floating IP address
nic: which NIC the floating IP binds to; use the default or specify one
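A sketch for this cluster's VIP, assuming a /24 netmask and the dsdb resource group shown later in pcs status (the NIC is left for the agent to pick):
pcs resource create VIP ocf:heartbeat:IPaddr2 ip=10.228.110.133 cidr_netmask=24 op monitor interval=30s --group dsdb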
Add the LVM resource
volgrpname: the VG name
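CLI sketch; the volume group name oravg is hypothetical (substitute the VG holding the Oracle filesystems), and exclusive activation keeps the VG from being activated on both nodes at once:
pcs resource create LVM ocf:heartbeat:LVM volgrpname=oravg exclusive=true --group dsdb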
Add the Filesystem resource
device: the block device backing the filesystem, e.g. /dev/ftpvg/lv_backup
directory: the mount point, e.g. /reportdata.backup
fstype: the filesystem type, e.g. ext4
Start/stop ordering dependencies between resources can be configured at the same time (see the sketch below).
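CLI sketch using the example values above, plus an order/colocation pair (members of a resource group already start in listed order, so explicit constraints are only needed for resources outside the group):
pcs resource create arch ocf:heartbeat:Filesystem device=/dev/ftpvg/lv_backup directory=/reportdata.backup fstype=ext4 --group dsdb
pcs constraint order LVM then arch
pcs constraint colocation add arch with LVM INFINITY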
Add the Oracle resources
The Oracle resources consist of two parts: the TNS listener and the RDBMS instance.
Add the listener resource
Add and test the listener resource as follows (via the web UI; a CLI sketch follows the field list):
type: select oralsnr
resource group: the group the resource is assigned to
resource_id: the resource name
sid: the alias in tnsnames.ora
home: the database ORACLE_HOME
user: the OS user that owns the database software
tns_admin: $ORACLE_HOME/network/admin
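CLI sketch of the same listener resource; the home and tns_admin paths come from this deployment's ORACLE_HOME, the listener name from the failure test below, and the sid value dsdb is an assumption:
pcs resource create lsnrctl ocf:heartbeat:oralsnr sid=dsdb home=/sjgxpt/oracle/product/11.2/db_1 user=oracle listener=listener_dsdb tns_admin=/sjgxpt/oracle/product/11.2/db_1/network/admin --group dsdb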
Example of the completed configuration:
Database resource
Add and test the database resource as follows (via the web UI; a CLI sketch follows the field list):
resource_id: the resource name
sid: the database instance name
home: the database ORACLE_HOME
user: the OS user that owns the database software
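CLI sketch of the database resource with the same deployment values (the monitor interval is an assumption):
pcs resource create database ocf:heartbeat:oracle sid=dsdb home=/sjgxpt/oracle/product/11.2/db_1 user=oracle op monitor interval=120s --group dsdb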
Resource information from the actual deployment:
Add the fence devices
Each node's IMM management port IP address, user name and password are needed.
A Red Hat High Availability cluster requires that you configure fencing for the cluster. The
reasons for this requirement are described in Fencing in a Red Hat High Availability Cluster. For
this introduction, however, which is intended to show only how to use the basic Pacemaker
commands, disable fencing by setting the stonith-enabled cluster option to false.
WARNING
The use of stonith-enabled=false is completely inappropriate for a
production cluster. It tells the cluster to simply pretend that failed nodes are
safely powered off.
# pcs property set stonith-enabled=false
port: the node name
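In this deployment fencing was in fact configured with fence_ipmilan against each node's management port rather than disabled; a creation sketch, where the address and credentials are the test values from the fence test further down and the pcmk_host_list value (which node this device fences) is an assumption:
pcs stonith create ibmca fence_ipmilan ipaddr=10.228.230.98 login=USERID passwd=PASSW0RD pcmk_host_list=pcs01 op monitor interval=60s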
Cluster software management
Resource checks
pcs status
pcs status --full
pcs status resources --full
pcs status resources arch
pcs status resources dsdb
pcs resource show
pcs resource show vip
Configure resource location preferences
pcs constraint location ora prefers pcs01=150
pcs constraint location ora prefers pcs02=50
pcs constraint location arch prefers pcs01=150
pcs constraint location arch prefers pcs02=50
pcs constraint location data prefers pcs01=150
pcs constraint location data prefers pcs02=50
pcs constraint location VIP prefers pcs01=150
pcs constraint location VIP prefers pcs02=50
pcs constraint location LVM prefers pcs01=150
pcs constraint location LVM prefers pcs02=50
pcs constraint location lsnrctl prefers pcs01=150
pcs constraint location lsnrctl prefers pcs02=50
pcs constraint location database prefers pcs01=150
pcs constraint location database prefers pcs02=50
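Optionally, a default resource stickiness keeps resources where they are after a failover instead of bouncing back as soon as the preferred node returns (the value 100 is just an example and has to be weighed against the location scores above):
pcs resource defaults resource-stickiness=100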
Disable and enable resources
pcs resource enable resource_id
pcs resource disable resource_id    -- stop the resource and keep the cluster from restarting it
Switch the resource group between nodes
pcs resource clear dsdb
pcs resource move dsdb
pcs resource move dsdb pcs01
pcs resource move dsdb pcs02
Other commands
1. pcs resource move <resource id|group id>    -- move the resource/group to another node
2. pcs resource clear <resource id|group id>    -- remove the location constraint created by the move; until it is cleared, the resource/group cannot fail back to that node, i.e. the node cannot accept the resource/group again
pcs resource cleanup <resource id|group id> (e.g. WebSite)    -- clear the resource's failure history and failed actions on the node
pcs status    -- overall cluster status
pcs status nodes ftp2    -- status of node ftp2
pcs status groups ftpgroup    -- status of the resource group
pcs config    -- show the full configuration
pcs resource show --full    -- show details of all resources
pcs resource show VIP    -- show one resource
fence_ipmilan -a 10.228.230.98 -l USERID -p PASSW0RD -o status    -- test a fence device
pcs cluster stop [--all] [node] [...]    -- stop the cluster
pcs cluster start    -- start the cluster
Put z1.example.com into standby mode:
# pcs cluster standby z1.example.com
Take z1 out of standby mode:
# pcs cluster unstandby z1.example.com
Cluster daemon management
Daemon status:
pcs cluster status
To make sure corosync, pacemaker and pcsd are all enabled at boot:
systemctl enable corosync pacemaker pcsd
Simulated failures
Listener failure
To simulate a listener failure, stop the listener by hand; after the resource's monitor interval elapses, the cluster software restarts the listener automatically.
[oracle@pcs01 trace]$ lsnrctl stop listener_dsdb    <-- simulate the listener failure
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 09-NOV-2018 17:53:44
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.228.110.133)(PORT=1521)))
The command completed successfully
[oracle@pcs01 trace]$ ps -ef|grep tns
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 4393 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 4468 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 4513 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 4516 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 4777 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns    <-- the listener has been restarted by the cluster
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 5110 1 0 17:53 ? 00:00:00 /sjgxpt/oracle/product/11.2/db_1/bin/tnslsnr listener_dsdb -inherit
oracle 5233 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root 183 2 0 11月06 ? 00:00:00 [netns]
oracle 5110 1 1 17:53 ? 00:00:00 /sjgxpt/oracle/product/11.2/db_1/bin/tnslsnr listener_dsdb -inherit
oracle 5376 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$
Database failure
Simulate a database crash with SHUTDOWN ABORT:
[oracle@pcs01 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Mon Nov 12 11:43:05 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown abort
ORACLE instance shut down.
SQL> exit
The database alert log shows:
Mon Nov 12 11:44:05 2018
Shutting down instance (abort)
License high water mark = 33
USER (ospid: 8659): terminating the instance
Instance terminated by USER, pid = 8659
Mon Nov 12 11:44:07 2018
Instance shutdown complete
Mon Nov 12 11:46:03 2018
Adjusting the default value of parameter parallel_max_servers
from 1280 to 970 due to the value of parameter processes (1000)
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 13 GB
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 20 GB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 10241 (page size 2048 KB, total size 20 GB) system wide to
get 100% of the System Global Area allocated with large pages
2. Large pages are automatically locked into physical memory.
Increase the per process memlock (soft) limit to at least 20 GB to lock
100% System Global Area's large pages into physical memory
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 32
Number of processor cores in the system is 16
Number of processor sockets in the system is 4
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =168
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NUMA system with 8 nodes detected
Oracle NUMA support not enabled
The parameter _enable_NUMA_support should be set to TRUE to enable Oracle NUMA support
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
ORACLE_HOME = /sjgxpt/oracle/product/11.2/db_1
System name: Linux
Node name: pcs01
Release: 3.10.0-693.el7.x86_64
Version: #1 SMP Thu Jul 6 19:56:57 EDT 2017
Machine: x86_64
Using parameter settings in server-side spfile /sjgxpt/oracle/product/11.2/db_1/dbs/spfiledsdb.ora
System parameters with non-default values:
processes = 1000
sessions = 1536
sga_target = 20G
control_files = "/sjgxpt/oracle/data/dsdb/control01.ctl"
control_files = "/sjgxpt/oracle/data/dsdb/control02.ctl"
db_block_size = 8192
compatible = "11.2.0.4.0"
log_archive_dest_1 = "LOCATION=/sjgxpt/oracle/arch"
log_archive_format = "%t_%s_%r.dbf"
undo_tablespace = "UNDOTBS1"
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=dsdbXDB)"
local_listener = "(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST =10.228.110.133)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)"
java_jit_enabled = TRUE
audit_file_dest = "/sjgxpt/oracle/product/admin/dsdb/adump"
audit_trail = "DB"
db_name = "dsdb"
open_cursors = 300
pga_aggregate_target = 12875M
diagnostic_dest = "/sjgxpt/oracle/product"
Mon Nov 12 11:46:03 2018
PMON started with pid=2, OS id=16298
Mon Nov 12 11:46:04 2018
PSP0 started with pid=3, OS id=16300
Mon Nov 12 11:46:05 2018
VKTM started with pid=4, OS id=16309 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Mon Nov 12 11:46:05 2018
GEN0 started with pid=5, OS id=16313
Mon Nov 12 11:46:05 2018
DIAG started with pid=6, OS id=16315
Mon Nov 12 11:46:05 2018
DBRM started with pid=7, OS id=16317
Mon Nov 12 11:46:05 2018
DIA0 started with pid=8, OS id=16319
Mon Nov 12 11:46:05 2018
MMAN started with pid=9, OS id=16321
Mon Nov 12 11:46:05 2018
DBW0 started with pid=10, OS id=16323
Mon Nov 12 11:46:05 2018
DBW1 started with pid=11, OS id=16325
Mon Nov 12 11:46:05 2018
DBW2 started with pid=12, OS id=16327
Mon Nov 12 11:46:05 2018
DBW3 started with pid=13, OS id=16329
Mon Nov 12 11:46:05 2018
LGWR started with pid=14, OS id=16331
Mon Nov 12 11:46:05 2018
CKPT started with pid=15, OS id=16333
Mon Nov 12 11:46:05 2018
SMON started with pid=16, OS id=16335
Mon Nov 12 11:46:05 2018
RECO started with pid=17, OS id=16337
Mon Nov 12 11:46:05 2018
MMON started with pid=18, OS id=16339
Mon Nov 12 11:46:05 2018
MMNL started with pid=19, OS id=16341
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /sjgxpt/oracle/product
Mon Nov 12 11:46:05 2018
ALTER DATABASE MOUNT
Successful mount of redo thread 1, with mount id 605261437
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Mon Nov 12 11:46:09 2018
alter database open
Beginning crash recovery of 1 threads
parallel recovery started with 31 processes
Started redo scan
Completed redo scan
read 47 KB redo, 30 data blocks need recovery
Started redo application at
Thread 1: logseq 250, block 12222
Recovery of Online Redo Log: Thread 1 Group 1 Seq 250 Reading mem 0
Mem# 0: /sjgxpt/oracle/data/dsdb/redo01.log
Completed redo application of 0.02MB
Completed crash recovery at
Thread 1: logseq 250, block 12317, scn 4277695
30 data blocks read, 30 data blocks written, 47 redo k-bytes read
LGWR: STARTING ARCH PROCESSES
Mon Nov 12 11:46:10 2018
ARC0 started with pid=54, OS id=16641
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 251 (thread open)
Mon Nov 12 11:46:10 2018
ARC1 started with pid=55, OS id=16667
Mon Nov 12 11:46:10 2018
ARC2 started with pid=56, OS id=16683
Mon Nov 12 11:46:10 2018
ARC3 started with pid=57, OS id=16697
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 251
Current log# 2 seq# 251 mem# 0: /sjgxpt/oracle/data/dsdb/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 244 added for thread 1 sequence 250 ID 0x23e8dee0 dest 1:
[16498] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:501291664 end:501291704 diff:40 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Mon Nov 12 11:46:10 2018
QMNC started with pid=58, OS id=16794
Completed: alter database open
Starting background process CJQ0
Mon Nov 12 11:46:11 2018
CJQ0 started with pid=59, OS id=16892
Host failure
Simulate it by rebooting the host:
[root@pcs01 ~]# reboot
Connection to 10.228.110.126 closed by remote host.
Connection to 10.228.110.126 closed.
Resource maintenance
Applying a PSU and the JVM patch
When the Oracle database needs maintenance that involves stopping the database or the listener, put the resources into unmanaged (maintenance) mode first so the cluster does not restart them or fail them over:
Enter maintenance mode: pcs resource unmanage resource1
Leave maintenance mode: pcs resource manage resource1
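A sketch of the full patching sequence around those two commands, assuming the whole dsdb group from this cluster is unmanaged at once (the Oracle steps are the ones shown below):
pcs resource unmanage dsdb    # cluster stops acting on the whole group
# as oracle: lsnrctl stop listener_dsdb, shut down the database, opatch apply, restart
pcs resource manage dsdb      # hand control back to the cluster
pcs resource cleanup dsdb     # clear any failures recorded while unmanaged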
Check the resource status with pcs status:
[root@pcs01 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: pcs02 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu May 30 10:07:00 2019
Last change: Wed Apr 3 15:40:08 2019 by hacluster via crmd on pcs01
2 nodes configured
9 resources configured
Online: [ pcs01 pcs02 ]
Full list of resources:
ibmca (stonith:fence_ipmilan): Stopped
ibmcb (stonith:fence_ipmilan): Started pcs01
Resource Group: dsdb
VIP (ocf::heartbeat:IPaddr): Started pcs01
LVM (ocf::heartbeat:LVM): Started pcs01
orahome (ocf::heartbeat:Filesystem): Started pcs01
arch (ocf::heartbeat:Filesystem): Started pcs01
data (ocf::heartbeat:Filesystem): Started pcs01
database (ocf::heartbeat:oracle): Started pcs01
lsnrctl (ocf::heartbeat:oralsnr): Started pcs01
Failed Actions:
* ibmca_start_0 on pcs02 'unknown error' (1): call=239, status=Timed Out, exitreason='none',
last-rc-change='Wed Apr 3 14:42:27 2019', queued=0ms, exec=20177ms
* ibmcb_monitor_60000 on pcs01 'unknown error' (1): call=182, status=Timed Out, exitreason='none',
last-rc-change='Wed May 29 19:30:59 2019', queued=0ms, exec=20136ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@pcs01 ~]#
Put the node's resources into maintenance mode, then stop the listener:
[oracle@pcs01 20190415psu_11.2.0.4]$ lsnrctl stop listener_dsdb
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 30-MAY-2019 10:10:11
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.228.110.133)(PORT=1521)))
The command completed successfully
[oracle@pcs01 20190415psu_11.2.0.4]$
[oracle@pcs01 29141056]$ opatch apply
For the JVM patch:
opatch apply
startup upgrade
Take the resources out of maintenance mode (pcs resource manage dsdb), then move the group back as needed:
pcs resource clear dsdb
pcs resource move dsdb pcs01
pcs resource move dsdb pcs02




