Oracle Autonomous Health Framework (AHF) includes Oracle ORAchk, Oracle EXAchk, and Oracle Trace File Analyzer (TFA).
AHF can be downloaded from the following My Oracle Support note:
Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk (Doc ID 2550798.1)
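Install AHF as root to get the full feature set. A non-root installation has reduced functionality.
If AHF is already installed, re-running the installer upgrades it in place at the existing location. In my tests, however, upgrading in place led to various errors when running commands afterwards, so I recommend uninstalling the existing installation before installing the new version.
Oracle Clusterware does not manage AHF, because AHF must remain available even when Oracle Clusterware stops working.
Installation
The installer provides the following options: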
[root@rac1 ~]# ./ahf_setup --help
-ahf_loc - Install into the directory supplied. (Default /opt/oracle.ahf) # install into the specified directory
-data_dir - AHF Data Directory where all the collections, metadata, etc. will be stored # specify the data directory
-nodes - Comma separated Remote Node List # comma-separated list of the other nodes in the cluster
-extract - Extract only files from Installer. (Default for non-root users) # extract only, e.g. to install just ORAchk
-nosymlink - Do not create symlinks on Exadata DOM0. (Default for non-root users)
-notfasetup - Do not Configure TFA when used with -extract # do not configure TFA
-local - Only install on the local node # install on the local node only
-silent - Do not ask any install questions # silent install
-tmp_loc - Temporary location directory for AHF to extract the install archive to (must exist) (Default /tmp)
-perlhome - Custom location of perl binaries
-force - Force AHF Install # force the install
-debug - Debug AHF Install Script
-level - AHF Instal Debug Level 1-6 (Default 4 with option -debug)
To install only Oracle ORAchk or Oracle EXAchk without running Oracle Trace File Analyzer, use the -extract and -notfasetup options:
[root@rac1 ~]# ./ahf_setup -extract -notfasetup
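For unattended installs, the flags listed above can be combined into a single command. The sketch below is a hypothetical helper (build_ahf_setup_cmd and its mode names are not part of AHF) that only prints a command line built from flags shown in the help output; whether a particular combination is accepted by your AHF version is an assumption to verify with ./ahf_setup --help.

```shell
# Hypothetical helper: print (not run) an ahf_setup command line built only
# from flags listed in ./ahf_setup --help.
#   $1 mode:     full | orachk-only   (hypothetical mode names)
#   $2 ahf_loc:  install directory, e.g. /opt/oracle.ahf
#   $3 data_dir: data directory, e.g. /u01/app/grid
#   $4 scope:    cluster (default) | local
build_ahf_setup_cmd() {
  local mode="$1" ahf_loc="$2" data_dir="$3" scope="${4:-cluster}"
  local cmd="./ahf_setup -ahf_loc $ahf_loc"
  if [ "$mode" = "orachk-only" ]; then
    # Extract the files only and skip TFA configuration
    cmd="$cmd -extract -notfasetup"
  else
    cmd="$cmd -data_dir $data_dir"
    if [ "$scope" = "local" ]; then
      cmd="$cmd -local"     # install on this node only
    fi
    cmd="$cmd -silent"      # do not ask install questions
  fi
  printf '%s\n' "$cmd"
}

# Example: unattended full install on all cluster nodes
build_ahf_setup_cmd full /opt/oracle.ahf /u01/app/grid
```

The helper deliberately prints instead of executing, so the command can be reviewed (or pasted) before anything is installed.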
The installation steps are as follows:
[root@rac1 opt]# ./ahf_setup
AHF Installer for Platform Linux Architecture x86_64
AHF Installation Log : /tmp/ahf_install_213000_13685_2021_12_10-16_17_25.log
Starting Autonomous Health Framework (AHF) Installation
AHF Version: 21.3.0 Build Date: 202110290347
Default AHF Location : /opt/oracle.ahf
Do you want to install AHF at [/opt/oracle.ahf] ? [Y]|N : # the default install location is /opt/oracle.ahf
AHF Location : /opt/oracle.ahf
AHF Data Directory stores diagnostic collections and metadata.
AHF Data Directory requires at least 5GB (Recommended 10GB) of free space.
Choose Data Directory from below options :
1. /u01/app/grid [Free Space : 50585 MB]
2. Enter a different Location
Choose Option [1 - 2] : 1 # choose the data directory
AHF Data Directory : /u01/app/grid/oracle.ahf/data
Do you want to add AHF Notification Email IDs ? [Y]|N : N # enter N and press Enter to continue
AHF will also be installed/upgraded on these Cluster Nodes :
1. rac2
The AHF Location and AHF Data Directory must exist on the above nodes
AHF Location : /opt/oracle.ahf
AHF Data Directory : /u01/app/grid/oracle.ahf/data
Do you want to install/upgrade AHF on Cluster Nodes ? [Y]|N : # press Enter to continue
Extracting AHF to /opt/oracle.ahf
Configuring TFA Services
Discovering Nodes and Oracle Resources
Not generating certificates as GI discovered
Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
.-------------------------------------------------------------------------.
| Host | Status of TFA | PID | Port | Version | Build ID |
+------+---------------+-------+------+------------+----------------------+
| rac1 | RUNNING | 15284 | 5000 | 21.3.0.0.0 | 21300020211029034711 |
'------+---------------+-------+------+------------+----------------------'
Running TFA Inventory...
Adding default users to TFA Access list...
.------------------------------------------------------------.
| Summary of AHF Configuration |
+-----------------+------------------------------------------+
| Parameter | Value |
+-----------------+------------------------------------------+
| AHF Location | /opt/oracle.ahf |
| TFA Location | /opt/oracle.ahf/tfa |
| Orachk Location | /opt/oracle.ahf/orachk |
| Data Directory | /u01/app/grid/oracle.ahf/data |
| Repository | /u01/app/grid/oracle.ahf/data/repository |
| Diag Directory | /u01/app/grid/oracle.ahf/data/rac1/diag |
'-----------------+------------------------------------------'
Starting orachk scheduler from AHF ...
AHF install completed on rac1
Installing AHF on Remote Nodes :
AHF will be installed on rac2, Please wait.
AHF will prompt twice to install/upgrade per Remote Node. So total 2 prompts
Do you want to continue Y|[N] : Y # install on the other nodes
AHF will continue with Installing on remote nodes
Installing AHF on rac2 :
[rac2] Copying AHF Installer
root@rac2's password: # enter the remote node's root password
[rac2] Running AHF Installer
root@rac2's password: # enter the remote node's root password
AHF binaries are available in /opt/oracle.ahf/bin
AHF is successfully installed
Do you want AHF to store your My Oracle Support Credentials for Automatic Upload ? Y|[N] : # press Enter to continue
Moving /tmp/ahf_install_213000_13685_2021_12_10-16_17_25.log to /u01/app/grid/oracle.ahf/data/rac1/diag/ahf/
If the following message appears, AHF must be installed separately on each node:
Login using root is disabled in sshd config. Installing AHF only on Local Node
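The message above indicates that root login is disabled in the sshd configuration. A minimal sketch for checking this before running the installer, assuming OpenSSH, where sshd keeps the first value it reads for a keyword. The function name is hypothetical, Match blocks and Include files are ignored, and how AHF itself reaches its decision is not something this check reproduces.

```shell
# Hypothetical helper: does sshd_config disable password logins for root?
# Only the first uncommented PermitRootLogin line counts (sshd keeps the
# first value it reads). Match blocks and Include files are NOT handled.
# Returns 0 = disabled, 1 = allowed, 2 = not set (compiled-in default applies).
root_password_login_disabled() {
  local cfg="${1:-/etc/ssh/sshd_config}" value
  value=$(awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$cfg")
  case "$value" in
    no|prohibit-password|without-password|forced-commands-only) return 0 ;;
    "") return 2 ;;
    *) return 1 ;;
  esac
}

# Example (run as root on each node):
#   root_password_login_disabled && echo "root password login is disabled here"
```

When Match blocks or Include files are in use, sshd -T (run as root) prints the effective configuration and is the more reliable source.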
By default, the AHF installer configures a systemd service (oracle-tfa.service) that starts automatically at boot.
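To confirm both the service and its boot-time autostart, systemctl can be queried directly. A minimal sketch, assuming systemd and the unit name oracle-tfa.service from the installer output; the function name check_tfa_service is hypothetical.

```shell
# Show whether the TFA unit is enabled at boot and currently running.
check_tfa_service() {
  local unit=oracle-tfa.service enabled active
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null)
  active=$(systemctl is-active "$unit" 2>/dev/null)
  printf 'enabled=%s active=%s\n' "${enabled:-unknown}" "${active:-unknown}"
  # Succeed only if the unit is both enabled and active
  [ "$enabled" = "enabled" ] && [ "$active" = "active" ]
}

# Usage: check_tfa_service || echo "TFA service is not enabled and running"
```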
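Upgrade
AHF is normally installed along with a RAC cluster deployment or an RU, so installing a newly downloaded version usually prompts for an upgrade. Personally, I do not recommend upgrading in place; uninstall first, then install.
The upgrade steps are as follows: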
[root@rac1 ~]# umask
0022 # expected: 22, 022, or 0022
[root@rac1 ~]# unzip AHF-LINUX_v21.3.0.zip
[root@rac1 ~]# ./ahf_setup
AHF Installer for Platform Linux Architecture x86_64
AHF Installation Log : /tmp/ahf_install_213000_24863_2021_11_26-19_46_09.log
Starting Autonomous Health Framework (AHF) Installation
AHF Version: 21.3.0 Build Date: 202110290347
AHF is already installed at /opt/oracle.ahf # an existing installation was detected
Installed AHF Version: 21.1.2 Build Date: 202105140934
Do you want to upgrade AHF [Y]|N : Y # asks whether to upgrade
AHF will also be installed/upgraded on these Cluster Nodes : # the other cluster nodes are upgraded as well
1. rac2
The AHF Location and AHF Data Directory must exist on the above nodes # make sure these directories exist on every node being upgraded
AHF Location : /opt/oracle.ahf
AHF Data Directory : /u01/app/grid/oracle.ahf/data
Do you want to install/upgrade AHF on Cluster Nodes ? [Y]|N : Y # asks whether to install or upgrade AHF on the cluster nodes
Upgrading /opt/oracle.ahf # upgrading the existing installation
Shutting down AHF Services
TFA-00201 Diagnostic directory not found.
Shutting down TFA
Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
Successfully shutdown TFA..
Starting AHF Services
Starting TFA..
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands
TFA-00201 Diagnostic directory not found.
AHF upgrade completed on rac1 # upgrade finished on the local node
Upgrading AHF on Remote Nodes :
AHF will be installed on rac2, Please wait.
AHF will prompt twice to install/upgrade per Remote Node. So total 2 prompts
Do you want to continue Y|[N] : Y # asks whether to continue with the remote nodes
AHF will continue with Upgrading on remote nodes
Upgrading AHF on rac2 :
[rac2] Copying AHF Installer
root@rac2's password: # enter the remote node's root password
[rac2] Running AHF Installer
root@rac2's password:
Do you want AHF to store your My Oracle Support Credentials for Automatic Upload ? Y|[N] :
AHF is successfully upgraded to latest version
.------------------------------------------------------------.
| Host | TFA Version | TFA Build ID | Upgrade Status |
+------+-------------+----------------------+----------------+
| rac1 | 21.3.0.0.0 | 21300020211029034711 | UPGRADED |
| rac2 | 21.3.0.0.0 | 21300020211029034711 | UPGRADED |
'------+-------------+----------------------+----------------'
Moving /tmp/ahf_install_213000_24863_2021_11_26-19_46_09.log to /u01/app/grid/oracle.ahf/data/rac1/diag/ahf/
[root@rac1 ~]#
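The umask check at the start of the upgrade can be scripted so that the installer is never unpacked with the wrong mask. A minimal sketch, assuming any POSIX shell; the accepted values are the ones listed in the transcript (22, 022, 0022), which all denote the same mask, and the function name is hypothetical.

```shell
# Refuse to continue unless the umask is 022 (shells usually print 0022).
require_umask_022() {
  case "$(umask)" in
    0022|022|22) return 0 ;;
    *)
      echo "umask is $(umask), expected 0022; run 'umask 0022' first" >&2
      return 1 ;;
  esac
}

# Usage before an upgrade:
#   require_umask_022 && unzip AHF-LINUX_v21.3.0.zip && ./ahf_setup
```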
After the upgrade, running commands produced the following error. I am not sure whether this was a one-off:
[root@rac1 ~]# tfactl toolstatus
TFA-00201 Diagnostic directory not found.
[oracle@rac1 ~]$ tfactl diagcollect -srdc DBASM
TFA-00201 Diagnostic directory not found.
The MOS note Tfactl Failed With TFA-00201 (Doc ID 2659786.1) recommends uninstalling and reinstalling AHF.
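When tfactl calls are scripted after an upgrade, this error can be detected so the reinstall advice is not missed. A minimal sketch: run_tfactl_checked is a hypothetical wrapper that only searches the command output for the TFA-00201 code shown above, and returning 2 for that case is an arbitrary choice.

```shell
# Run a command (typically tfactl ...), echo its output, and flag TFA-00201.
# Returns 2 if TFA-00201 appears, otherwise the command's own exit status.
run_tfactl_checked() {
  local out rc
  out=$("$@" 2>&1)
  rc=$?
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | grep -q 'TFA-00201'; then
    echo "TFA-00201 detected: see MOS Doc ID 2659786.1 (uninstall and reinstall AHF)" >&2
    return 2
  fi
  return "$rc"
}

# Usage: run_tfactl_checked tfactl toolstatus
```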
Uninstallation
[root@rac1 ~]# tfactl uninstall
Starting AHF Uninstall
NOTE : Uninstalling will delete the repository as well since Install type is GI
AHF will be uninstalled on:
rac1
rac2
# asks whether to continue with the uninstall
Do you want to continue with AHF uninstall ? [Y]|N : Y
... ...
Stopping and removing AHF in rac2...
# the uninstall runs remotely on the other node; enter that node's root password
root@rac2's password:
Commands
- Stop and start TFA (the installer configures the service by default)
[root@rac1 ~]# systemctl stop oracle-tfa.service
[root@rac1 ~]# systemctl start oracle-tfa.service
- Check the TFA run status
[root@rac1 ~]# tfactl print status
.-------------------------------------------------------------------------------------------.
| Host | Status of TFA | PID | Port | Version | Build ID | Inventory Status |
+------+---------------+------+------+------------+----------------------+------------------+
| rac1 | RUNNING | 2098 | 5000 | 21.3.0.0.0 | 21300020211029034711 | COMPLETE |
| rac2 | RUNNING | 2300 | 5000 | 21.3.0.0.0 | 21300020211029034711 | COMPLETE |
'------+---------------+------+------+------------+----------------------+------------------'
- Check the status of the TFA tools
[root@rac1 ~]# tfactl toolstatus
Running command tfactltoolstatus on rac2 ...
.------------------------------------------------------------------.
| TOOLS STATUS - HOST : rac2 |
+----------------------+--------------+--------------+-------------+
| Tool Type | Tool | Version | Status |
+----------------------+--------------+--------------+-------------+
| AHF Utilities | alertsummary | 21.3.0 | DEPLOYED |
| | calog | 21.3.0 | DEPLOYED |
| | dbglevel | 21.3.0 | DEPLOYED |
| | grep | 21.3.0 | DEPLOYED |
| | history | 21.3.0 | DEPLOYED |
| | ls | 21.3.0 | DEPLOYED |
| | managelogs | 21.3.0 | DEPLOYED |
| | menu | 21.3.0 | DEPLOYED |
| | orachk | 21.3.0 | DEPLOYED |
| | param | 21.3.0 | DEPLOYED |
| | ps | 21.3.0 | DEPLOYED |
| | pstack | 21.3.0 | DEPLOYED |
| | summary | 21.3.0 | DEPLOYED |
| | tail | 21.3.0 | DEPLOYED |
| | triage | 21.3.0 | DEPLOYED |
| | vi | 21.3.0 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Development Tools | oratop | 14.1.2 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED |
| | oswbb | 8.3.2 | RUNNING |
| | prw | 12.1.13.11.4 | NOT RUNNING |
'----------------------+--------------+--------------+-------------'
Note :-
DEPLOYED : Installed and Available - To be configured or run interactively.
NOT RUNNING : Configured and Available - Currently turned off interactively.
RUNNING : Configured and Available.
.------------------------------------------------------------------.
| TOOLS STATUS - HOST : rac1 |
+----------------------+--------------+--------------+-------------+
| Tool Type | Tool | Version | Status |
+----------------------+--------------+--------------+-------------+
| AHF Utilities | alertsummary | 21.3.0 | DEPLOYED |
| | calog | 21.3.0 | DEPLOYED |
| | dbglevel | 21.3.0 | DEPLOYED |
| | grep | 21.3.0 | DEPLOYED |
| | history | 21.3.0 | DEPLOYED |
| | ls | 21.3.0 | DEPLOYED |
| | managelogs | 21.3.0 | DEPLOYED |
| | menu | 21.3.0 | DEPLOYED |
| | orachk | 21.3.0 | DEPLOYED |
| | param | 21.3.0 | DEPLOYED |
| | ps | 21.3.0 | DEPLOYED |
| | pstack | 21.3.0 | DEPLOYED |
| | summary | 21.3.0 | DEPLOYED |
| | tail | 21.3.0 | DEPLOYED |
| | triage | 21.3.0 | DEPLOYED |
| | vi | 21.3.0 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Development Tools | oratop | 14.1.2 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED |
| | oswbb | 8.3.2 | RUNNING |
| | prw | 12.1.13.11.4 | NOT RUNNING |
'----------------------+--------------+--------------+-------------'
Note :-
DEPLOYED : Installed and Available - To be configured or run interactively.
NOT RUNNING : Configured and Available - Currently turned off interactively.
RUNNING : Configured and Available.
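The two tables above can be filtered in scripts, for example to spot a node whose TFA is down or to see where oswbb is running. A minimal sketch, assuming the '|'-delimited layout shown above (Host and status in the 2nd and 3rd columns of tfactl print status; Tool and Status in the 3rd and 5th columns of tfactl toolstatus). Both function names are hypothetical, and how a non-running node is rendered in print status is not shown in this article, so the first function simply reports any status other than RUNNING.

```shell
# From `tfactl print status` on stdin: print "host: status" for every host
# whose "Status of TFA" column is not RUNNING.
tfa_hosts_not_running() {
  awk -F'|' '
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    NF > 3 {
      host = trim($2); status = trim($3)
      if (host != "" && host != "Host" && status != "RUNNING") print host ": " status
    }'
}

# From `tfactl toolstatus` on stdin: print "host tool" for every tool whose
# Status column equals $1 (e.g. RUNNING, "NOT RUNNING", DEPLOYED).
tfa_tools_with_status() {
  awk -F'|' -v want="$1" '
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    /TOOLS STATUS - HOST :/ {
      split($0, a, "HOST :"); host = trim(a[2]); sub(/ *[|].*$/, "", host); next
    }
    NF >= 6 && trim($5) == want { print host, trim($3) }'
}

# Usage:
#   tfactl print status | tfa_hosts_not_running
#   tfactl toolstatus | tfa_tools_with_status RUNNING
```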
OSWatcher
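Starting with some 11g RAC PSU (I am not sure exactly which one), OSWatcher has been bundled with TFA.
By default, the OSWatcher bundled with TFA takes a snapshot every 30 seconds and keeps 48 hours of archives: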
[root@rac1 ~]# ps -ef |grep OSW
grid 27395 1 0 16:32 ? 00:00:00 /bin/sh ./OSWatcher.sh 30 48 /bin/gzip /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
grid 27537 27395 0 16:32 ? 00:00:00 /bin/sh ./OSWatcherFM.sh 48 /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
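The interval and retention can be read back from the OSWatcher.sh command line, whose first two arguments are the snapshot interval in seconds and the archive retention in hours (30 and 48 above). A minimal sketch; the function name is hypothetical. The '[O]SWatcher.sh' pattern in the usage line keeps grep from matching its own process, which is why a grep line shows up in the plain ps -ef | grep OSW output later in this section.

```shell
# Print interval and retention from an "OSWatcher.sh <interval> <hours> ..."
# process line read on stdin. OSWatcherFM.sh lines are ignored.
osw_settings() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /OSWatcher\.sh$/) {
        print "interval=" $(i+1) "s retention=" $(i+2) "h"
        exit
      }
  }'
}

# Usage: ps -ef | grep '[O]SWatcher.sh' | osw_settings
```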
Stopping TFA does not stop OSWatcher:
[root@rac1 ~]# systemctl stop oracle-tfa.service
[root@rac1 ~]# ps -ef |grep OSW
root 12964 17294 0 16:59 pts/0 00:00:00 grep --color=auto OSW
grid 27395 1 0 16:32 ? 00:00:00 /bin/sh ./OSWatcher.sh 30 48 /bin/gzip /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
grid 27537 27395 0 16:32 ? 00:00:00 /bin/sh ./OSWatcherFM.sh 48 /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
[root@rac1 ~]# tfactl toolstatus
TFA-00104 Cannot establish connection with TFA Server. Please check TFA Certificates
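Testing shows that TFA's OSWatcher behaves like a Clusterware-managed database: if you stop OSWatcher manually, it is not started automatically after the OS reboots; if you leave it running, it starts again automatically after the reboot.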
[root@rac1 ~]# tfactl stop oswbb
[root@rac1 ~]# reboot
[root@rac1 ~]# ps -ef |grep OSW
root 8717 4461 0 17:16 pts/0 00:00:00 grep --color=auto OSW
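Because a manually stopped OSWatcher stays stopped across reboots, a post-boot check can restart it. A minimal sketch, assuming pgrep is available and the check runs as root; the function name is hypothetical, and how it is triggered at boot is left open. As shown later in this section, tfactl start oswbb without arguments reuses the previously set interval and retention.

```shell
# Start OSWatcher through tfactl if no OSWatcher.sh process is found.
# The pattern does not match the OSWatcherFM.sh helper process.
ensure_oswbb_running() {
  if pgrep -f 'OSWatcher\.sh' >/dev/null; then
    echo "OSWatcher already running"
  else
    echo "OSWatcher not running; starting it via tfactl"
    tfactl start oswbb
  fi
}
```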
To change the OSWatcher snapshot interval to 15 seconds and the retention to 168 hours (7 days):
[root@rac1 ~]# tfactl stop oswbb
[root@rac1 ~]# tfactl start oswbb 15 168
# with gzip compression
[root@rac1 ~]# tfactl start oswbb 15 168 gzip
[root@rac1 ~]# ps -ef |grep osw
grid 31224 1 0 17:26 pts/0 00:00:00 /bin/sh ./OSWatcher.sh 15 168 /bin/gzip /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
grid 31358 31224 0 17:27 pts/0 00:00:00 /bin/sh ./OSWatcherFM.sh 168 /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
# the modified settings persist across an OS reboot
# they also persist when OSWatcher is stopped and started again manually
[root@rac1 ~]# tfactl stop oswbb
Stopped OSWatcher
[root@rac1 ~]# tfactl start oswbb
Starting OSWatcher
[root@rac1 ~]# ps -ef |grep osw
grid 13055 1 0 17:36 pts/0 00:00:00 /bin/sh ./OSWatcher.sh 15 168 /bin/gzip /u01/app/grid/oracle.ahf/data/repository/suptools/rac1/oswbb/grid/archive
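A rough way to size this change: each OSWatcher collector keeps about retention_hours * 3600 / interval_seconds snapshots. A minimal sketch of that arithmetic; it assumes archive size grows roughly in proportion to the snapshot count, which gzip compression and system activity will affect, and the function name is hypothetical.

```shell
# Approximate snapshots retained per collector for a given interval/retention.
osw_snapshots() {
  local interval="$1" hours="$2"
  echo $(( hours * 3600 / interval ))
}

default=$(osw_snapshots 30 48)    # 5760
custom=$(osw_snapshots 15 168)    # 40320
echo "default=$default custom=$custom ratio=$(( custom / default ))"
```

So 15 s / 168 h keeps about 7 times as many snapshots as the 30 s / 48 h default; plan free space in the AHF data directory accordingly.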
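Collecting diagnostic logs
Use tfactl diagcollect to have TFA collect the diagnostic logs you need. tfactl offers several collection modes; for example, collecting only the logs from a given time window reduces the amount of data collected.
The full list of options can be displayed with: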
[root@rac1 ~]# tfactl diagcollect -h
Collect logs from across nodes in cluster
Usage : /opt/oracle.ahf/tfa/bin/tfactl diagcollect [ [component_name1] [component_name2] ... [component_nameN] | [-srdc <srdc_profile>] | [-defips]] [-sr <SR#>] [-node <all|local|n1,n2,..>] [-tag <tagname>] [-z <filename>] [-acrlevel <system,database,userdata>] [-last <n><m|h|d>| -from <time> -to <time> | -for <time>] [-nocopy] [-notrim] [-silent] [-cores][-collectalldirs][-collectdir <dir1,dir2..>][-collectfiles <file1,..,fileN,dir1,..,dirN> [-onlycollectfiles]][-examples]
components:-ips|-database|-asm|-crsclient|-dbclient|-dbwlm|-tns|-rhp|-procinfo|-cvu|-afd|-crs|-cha|-wls|-emagenti|-emagent|-oms|-omsi|-ocm|-emplugins|-em|-acfs|-install|-cfgtools|-os|-ashhtml|-ashtext|-awrhtml|-awrtext|-sosreport|-qos|-ahf|-dataguard
-srdc Service Request Data Collection (SRDC).
-database Specify comma separated list of db unique names for collection
-defips Include in the default collection the IPS Packages for:
ASM, CRS and Databases
-sr Enter SR number to which the collection will be uploaded
-node Specify comma separated list of host names for collection
-tag <tagname> The files will be collected into tagname directory inside
repository
-z <zipname> The collection zip file will be given this name within the
TFA collection repository
-last <n><m|h|d> Files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
-since Same as -last. Kept for backward compatibility.
-from "Mon/dd/yyyy hh:mm:ss" From <time>
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-to "Mon/dd/yyyy hh:mm:ss" To <time>
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-for "Mon/dd/yyyy" For <date>.
or "yyyy-mm-dd"
-nocopy Does not copy back the zip files to initiating node from all nodes
-notrim Does not trim the files collected
-silent This option is used to submit the diagcollection as a background
process
-cores Collect Core files when it would normally not have been
collected
-collectalldirs Collect all files from a directory marked "Collect All"
flag to true
-collectdir Specify a comma separated list of directories and the collection will
include all files from these irrespective of type and time constraints
in addition to the components specified
-collectfiles Specify a comma separated list of files/directories and the collection will
include the files and directories in addition to the components specified.
if -onlycollectfiles is also used, then no other components will be collected.
-acrlevel <system,database,userdata> This option is used to specify the ACR level(s) for redaction
-examples Show diagcollect usage examples
For detailed help on each component use:
/opt/oracle.ahf/tfa/bin/tfactl diagcollect [component_name1] [component_name2] ... [component_nameN] -help
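One of the biggest difficulties for a DBA troubleshooting GI (RAC) problems is promptly collecting the relevant diagnostic logs from every node, especially when the logs span several nodes. The diagcollection.pl script used to serve this purpose, but it does not look at log contents: it collects every RAC log in full, from beginning to end, so its output is usually very large. It also has to be run as root separately on each node, which is inconvenient.
TFA largely overcomes these problems. It runs a Java virtual machine on every node, which decides when to collect and compress logs and which logs are needed to solve the problem. Because TFA runs outside GI and the RDBMS, it does not depend on the GI/RDBMS version or platform in use.
As a result, when handling Oracle GI and RAC problems, TFA can collect all the required logs with a single command while filtering out the ones that are not needed.
Some customers worry that TFA will affect their systems. Given what it does, it is only a log collection tool: it makes no changes to the system, and its load on the OS is light.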
[root@rac1 ~]# tfactl diagcollect -examples
Examples:
/opt/oracle.ahf/tfa/bin/tfactl diagcollect
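# Collect and compress all log files, plus chmos/osw data, updated within the last hour on all cluster nodes.
# Note: the collection may be large, but this is the simplest way to gather diagnostics for a problem that just occurred.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -last 8h
# Collect and compress all log files, plus chmos/osw data, updated within the last 8 hours on all cluster nodes.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -database hrdb,fdb -last 1d -z foo
# Collect and compress the diagnostic logs of the hrdb and fdb databases from the last day on all cluster nodes, and name the collection zip foo.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -crs -os -node node1,node2 -last 6h
# Collect and compress the crs logs, os logs, and chmos/osw data from node1 and node2 for the last 6 hours.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -asm -node node1 -from "Dec/09/2021" -to "Dec/09/2021 21:00:00"
# Collect and compress the ASM logs from node1 between the -from and -to times.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -for "Dec/09/2021"
# Collect and compress all log files, plus chmos/osw data, for the day given by -for.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -for "Dec/09/2021 21:00:00"
# Collect and compress all log files, plus chmos/osw data, from 12 hours before to 12 hours after the time given by -for.
/opt/oracle.ahf/tfa/bin/tfactl diagcollect -crs -collectdir /tmp_dir1,/tmp_dir2
# Collect and compress all crs files updated within the last hour, and also collect all files from /tmp_dir1 and /tmp_dir2 on the initiating node.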
For example, to collect all logs from 08:00 to 19:00 on 2022-01-14:
[root@rac1 ~]# tfactl diagcollect -from "2022-01-14 08:00:00" -to "2022-01-14 19:00:00"
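As noted in the examples, -for with a time covers 12 hours on each side of it. To center a window of a different width on an incident time, the -from and -to values can be computed instead. A minimal sketch, assuming GNU date (for -d and @epoch) and times in the local time zone; the function name tfa_window is hypothetical. It prints the start and end times on two lines, in the "yyyy-mm-dd hh:mm:ss" format that -from/-to accept.

```shell
# Print "from" and "to" times (one per line) for a window of
# $2 hours before and $3 hours after the time $1. Requires GNU date.
tfa_window() {
  local center="$1" before_h="$2" after_h="$3" epoch
  epoch=$(date -d "$center" +%s) || return 1
  # Work in epoch seconds to avoid date's relative-time parsing pitfalls
  date -d "@$(( epoch - before_h * 3600 ))" '+%Y-%m-%d %H:%M:%S'
  date -d "@$(( epoch + after_h * 3600 ))" '+%Y-%m-%d %H:%M:%S'
}

# Usage (6 hours either side of 13:30):
#   { read -r from; read -r to; } <<EOF
#   $(tfa_window '2022-01-14 13:30:00' 6 6)
#   EOF
#   tfactl diagcollect -from "$from" -to "$to"
```

Passing the two values as separate quoted arguments avoids eval and keeps the embedded space in each timestamp intact.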




