2021年4月15日,云和恩墨服务总监,Oracle ACE-A 张维照进行了为大家带来了《Oracle诊断性能问题时常用脚本工具》的精彩技术直播。
本次活动观众反响强烈,并且在直播间与讲师积极互动问答。小编为大家汇总了7个代表性的问题,请张维照老师为大家进行详细作答,同时分享本次的直播回放与课件,脚本资源,供大家“温故而知新”。
(视频回放限时免费播放,请大家及时观看学习)
问答汇总
Q1. 老师,请问,如何能够肯定的告诉人家数据库是OK的,数据库一定没问题,因为在问题未明确之前,所有关节就都是怀疑对象。
A1.确认应用端问题、时间、现象、影响范围、错误信息、主机、program、db user等信息
连接问题:
a,检查监听及服务正常,新连接正常,无白名单拒绝
b,网络traceroute正常
c, 用户状态
d, 归档空间
e, 最大连接数
f, db alert log
性能问题:
a, 监听连接
b, 确认对应session信息,交互的用户状态基本三种:idle、proceeding、waiting
c,确认session 是no-idle状态时的wait_event、waited_time、blocking
d,当前session运行sql的状态
e, db instance 负载、hang?
…
如果监听连接正常,数据库负载正常,应用会话与数据库无交互或当前的应用sql效率正常,执行量增加等等,基本上就可以排除数据库问题了。
如果把所有检查项做成一键检查或工具流程化,提供一系列数据检查项,那就可以明确回答“数据库正常!”。
Q2. 老师您好!请问前台业务系统某个功能模块,用户反馈点击很慢,应该如何去分析呢
A2.这类问题涉及客户端主机、网络、web server、操作系统、应用逻辑、DB等多个环节,建议先明确应用客户端对应的db上的session信息。
1、如果事件是进行时,根据v$session(ACTIVE)信息,去定位wait event, top sql等时间消耗。
2、如果数据库主机当前负载(CPU、IO、网络吞吐、memory)较高甚至枯竭,就需要先解决实例级(DB)问题。
Q3. 请问怎么加快imp的速度
A3.首先imp在目前主流的版本已经光荣退役了,有datapump替代数据的导出导入。
但是目前也不排除有些情况需要使用exp、imp。
比如:记录数少生成文件到客户端(非DB服务端),因为imp不像impdp和sqlldr一样有direct path加载方式 ,通常imp加速的一些手段有如下几个:
a.指定buffer, 如 commit=N buffer =512000
b.增加RECORDLENGTH, 如RECORDLENGTH=65535
c.延后维护索引,如imp index=n 或先删除表上索引,imp后手动并行创建
d.手动拆分imp任务,实现并行多任务导入
Q4. 老师,我想问一下,执行脚本会不会导致数据库崩溃,不然对于我这种小白,也不敢轻易执行脚本,谢谢解答。
A4.对数据库时刻保持敬畏之心,这个DBA特质很好👍。
对于不熟悉的脚本需要先阅读理解,是否还包含DDL?即使是select对于同样一些X$
的table,同样也是危险的。
如 X$KSMSP
, 如Tanel在其脚本ksmsp.sql
中明确
“ WARNING!!! This script will query X$KSMSP, which will cause heavy shared pool latch contention”,
对于v$
, DBA_
开头这类公开的系统对象是相对安全,但是也不排除一些资源消耗如v$lock
, 对于昨晚推荐的脚本基本是v$session v$sesstat v$session_event ,ASH,xxx_time_model, xxx_METRIC , DBA_hist
等,非极高频率的查询都是安全的。
Q5. 老师哪些场景要看diag0这些trace呢?平时只看alter日志,lgwr trace
A5.dia0进程是HM(Hang manager)的后台进程,从Oracle11G引入。
Hang Manager会自动解决挂起问题,并将解决方案连续记录在db alert log中, 同时也会将诊断记录在跟踪文件xxx_dia0_xx.trc。
以上的解决方案帮助你在“出现系统级性能问题或数据库重启时”参考。
如该日志会记录一些系统负载如IO、swap, cpu load, memory,active session等信息和当前hang session信息, 通常会在DB alert log中记录如ora-32701 类的错误提示,此类场景就可以关注diag trace.
同时:DB alert log可以做为入手点,如我在直播演示时,PDB无法打开的问题,从DB alert中也给出了解答,那就是:空间耗尽。
👇给一个示例:
Choose the home from which to view the alert log:
1: diag/rdbms/anbob19c/anbob19c
2: diag/rdbms/anbob20c/anbob20c
3: diag/clients/user_oracle/host_522798745_110
4: diag/tnslsnr/oel7db1/listener
Q: to quit
Please select option: 1
Output the results to file: /tmp/alert_2265_14023_anbob19c_1.ado
2020-03-20 05:41:09.498000 -04:00
Starting ORACLE instance (normal) (OS id: 3098)
****************************************************
2021-04-15 22:06:19.329000 -04:00
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
2021-04-15 22:06:28.568000 -04:00
TT00 (PID:1918): Stuck archiver condition cleared
2021-04-15 22:06:32.557000 -04:00
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_arc3_1928.trc:
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device
Additional information: 4
Additional information: 311296
Additional information: 4294967295
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_arc3_1928.trc:
ORA-19502: write error on file "/u01/app/oracle/product/19.2.0/db_1/dbs/arch/1_77_1035524606.dbf", block number 311296 (block size=512)
ORA-27072: File I/O error
Linux-x86_64 Error: 28: No space left on device 《《《《
Additional information: 4
Additional information: 311296
Additional information: 4294967295
ARC3 (PID:1928): Error 19502 Closing archive log file '/u01/app/oracle/product/19.2.0/db_1/dbs/arch/1_77_1035524606.dbf'
ARC3 (PID:1928): Stuck archiver: inactive mandatory LAD:1
ARC3 (PID:1928): Stuck archiver condition declared
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_arc3_1928.trc:
ORA-16038: log 2 sequence# 77 cannot be archived
ORA-19502: write error on file "", block number (block size=)
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/ANBOB19C/redo02.log'
ARC3 (PID:1928): Archival error occurred on a closed thread, archiver continuing
ORACLE Instance anbob19c, archival error, archiver continuing
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_mz00_2287.trc:
ORA-01110: data file 9: '/u01/app/oracle/oradata/ANBOB19C/pdb1/system01.dbf'
ORA-01208: data file is an old version - not accessing current version
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_mz00_2287.trc:
ORA-01110: data file 10: '/u01/app/oracle/oradata/ANBOB19C/pdb1/sysaux01.dbf'
ORA-01208: data file is an old version - not accessing current version
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_mz00_2287.trc:
ORA-01110: data file 11: '/u01/app/oracle/oradata/ANBOB19C/pdb1/undotbs01.dbf'
ORA-01208: data file is an old version - not accessing current version
2021-04-15 22:06:33.852000 -04:00
Errors in file /u01/app/oracle/diag/rdbms/anbob19c/anbob19c/trace/anbob19c_mz00_2287.trc:
ORA-01110: data file 12: '/u01/app/oracle/oradata/ANBOB19C/pdb1/users01.dbf'
[oracle@oel7db1 trace]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 1868268 0 1868268 0% /dev
tmpfs 1882692 0 1882692 0% /dev/shm
tmpfs 1882692 8716 1873976 1% /run
tmpfs 1882692 0 1882692 0% /sys/fs/cgroup
/dev/mapper/ol_oel7db1-root 39372000 39371968 32 100% / <<<<<<<< File system FULL
/dev/sda1 1038336 184940 853396 18% /boot
/dev/sdb1 12253320 1685136 9922708 15% /opensource
/dev/mapper/ol_oel7db1-home 19220480 1005748 18214732 6% /home
tmpfs 376540 0 376540 0% /run/user/0
另外像你提到的lgwr trace,还有dbw*, pmon*, lmhb* 分析系统级资源负载、性能故障时都可以参考。
Q6. 老师,问下使用dash_wait_chains.sql查看数据库阻塞链,发现最终阻塞源头是一个idle blocker,这个是什么意思?接下来怎么排查?
A6.这样我用个样例说明一下吧。
session 1,2,3,4 都执行 "update anbob.test set name='test' where id=1",制造一下更新相同行的tx等待,mode=6同样可以证明这一点. session 1更新完后uncommit(未提交), 我这里使用了sleep 所以blocker 一直是active ,并且event是IDLE class :PL/SQL lock timer,如果不提交的会话什么也不做event是"SQL*Net message from client"。
[oracle@oel7db1 ~]$ cat b.sh
. ~/.bash_profile
sqlplus anbob/anbob@cdb1pdb1 << EOF
update anbob.test set name='test' where id=1;
exec dbms_lock.sleep(100000);
EOF
[oracle@oel7db1 ~]$nohup sh b.sh &
[oracle@oel7db1 ~]$nohup sh b.sh &
[oracle@oel7db1 ~]$nohup sh b.sh &
[oracle@oel7db1 ~]$nohup sh b.sh &
SQL> @ase
USERNAME SID EVENT MACHINE MODULE STATUS LC_ET SQL_ID WAI_SECINW ROW_WAIT_OBJ# SQLTEXT BS CH# OSUSER HEX
---------- --- -------------------- ---------- ----------- -------- ---- --------------- ---------- ------------- ------------------------------ ------- ---- ---------- ---------
ANBOB 76 enq: TX - row lock c oel7db1 SQL*Plus ACTIVE 26 g48uufy4xd5fq 0:26 74251 update anbob.test set name='te 1:119 0 oracle 1000003
ANBOB 66 enq: TX - row lock c oel7db1 SQL*Plus ACTIVE 27 g48uufy4xd5fq 0:28 74251 update anbob.test set name='te 1:119 0 oracle 1000002
ANBOB 72 enq: TX - row lock c oel7db1 SQL*Plus ACTIVE 29 g48uufy4xd5fq 0:29 74251 update anbob.test set name='te 1:119 0 oracle 1000001
ANBOB 119 PL/SQL lock timer oel7db1 SQL*Plus ACTIVE 30 3bp59a816a12v 0:30 74251 BEGIN dbms_lock.sleep(100000); : 0 oracle 1000000
CURRENT_TIME
-------------------
2021-04-15 22:27:31
SQL> @snapper ash 5 1 all
Sampling SID all with interval 5 seconds, taking 1 snapshots...
-- Session Snapper v4.31 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------------------------------------------------------
ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
---------------------------------------------------------------------------------------------------------------
3.00 (300%) | 1 | g48uufy4xd5fq | 0 | enq: TX - row lock contention | Application
.03 (3%) | 1 | | | oracle thread bootstrap | Other
.03 (3%) | 1 | | 0 | oradebug request completion | Other
.03 (3%) | 1 | bmcj2k06ncg3y | 0 | ON CPU | ON CPU
-- End of ASH snap 1, end=2021-04-15 22:28:44, seconds=5, samples_taken=39, AAS=3.1
PL/SQL procedure successfully completed.
SQL> @ashtop wait_class,event,sql_id 1=1 &5min
Total Distinct
Seconds AAS %This WAIT_CLASS EVENT SQL_ID FIRST_SEEN LAST_SEEN Execs Seen
--------- ------- ------- --------------- ------------------------------------------ --------------- ------------------- ------------------- ----------
382 1.3 94% | Application enq: TX - row lock contention g48uufy4xd5fq 2021-04-15 22:27:01 2021-04-15 22:29:13 3
4 .0 1% | User I/O db file sequential read cx79k675ftmt0 2021-04-15 22:26:19 2021-04-15 22:26:23 1
2 .0 0% | User I/O db file parallel read at8hsv5ddjgfp 2021-04-15 22:26:01 2021-04-15 22:26:03 1
2 .0 0% | a01hp0psv0rrh 2021-04-15 22:26:07 2021-04-15 22:26:08 1
...
15 rows selected.
SQL> @ash/ash_wait_chains event2||program 1=1 sysdate-10/24/60 sysdate
-- Display ASH Wait Chain Signatures script v0.6 BETA by Tanel Poder ( http://blog.tanelpoder.com )
%This SECONDS AAS WAIT_CHAIN FIRST_SEEN LAST_SEEN
------ ---------- ------- --------------------------------------------------------------------------------------------------------------------------------------
96% 580 1.0 -> enq: TX - row lock contention [mode=6] sqlplus@oel7db1 (TNS V1-V3) -> [idle blocker 1,119,1391 (sqlplus@oel7db1 (TNS V1-V3))] 2021-04-15 22:27:01 2021-04-15 22:30:20
1% 9 .0 -> ON CPU sqlplus@oel7db1 (TNS V1-V3) 2021-04-15 22:23:41 2021-04-15 22:28:38
1% 5 .0 -> db file sequential read sqlplus@oel7db1 (TNS V1-V3) 2021-04-15 22:26:19 2021-04-15 22:28:39
0% 2 .0 -> db file parallel read sqlplus@oel7db1 (TNS V1-V3) 2021-04-15 22:26:01 2021-04-15 22:26:03
0% 1 .0 -> ON CPU oracle@oel7db1 (CJQ0) 2021-04-15 22:26:27 2021-04-15 22:26:27
0% 1 .0 -> db file sequential read oracle@oel7db1 (J001) 2021-04-15 22:26:30 2021-04-15 22:26:30
0% 1 .0 -> ON CPU oracle@oel7db1 (MZ00) 2021-04-15 22:26:29 2021-04-15 22:26:29
0% 1 .0 -> db file scattered read sqlplus@oel7db1 (TNS V1-V3) 2021-04-15 22:27:00 2021-04-15 22:27:00
0% 1 .0 -> sort segment request oracle@oel7db1 (J002) 2021-04-15 22:26:31 2021-04-15 22:26:31
0% 1 .0 -> enq: JG - queue lock [mode=6] oracle@oel7db1 (J001) -> sort segment request oracle@oel7db1 (J002) 2021-04-15 22:26:31 2021-04-15 22:26:31
0% 1 .0 -> DLM cross inst call completion sqlplus@oel7db1 (TNS V1-V3) -> [idle blocker 1,19,28331 (oracle@oel7db1 (DBW0))] 2021-04-15 22:26:06 2021-04-15 22:26:06
0% 1 .0 -> control file parallel write sqlplus@oel7db1 (TNS V1-V3) 2021-04-15 22:26:00 2021-04-15 22:26:00
0% 1 .0 -> db file sequential read oracle@oel7db1 (J002) 2021-04-15 22:26:30 2021-04-15 22:26:30
13 rows selected.
SQL> @sed timer
Show wait event descriptions matching %timer%..
EVENT# EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3 ENQUEUE_NAME REQ_REASON REQ_DESCRIPTION
------ ------------------------------------------------------- -------------------- ------------------------- ------------------------- ------------------------- ------------------------------ -------------------------------- ----------------------------------------------------------------------------------------------------
494 ASM background timer Idle
137 DBWR timer Idle
148 Data Guard: Timer Idle
97 LMS CR slave timer Idle type
459 PL/SQL lock timer Idle duration
...
18 rows selected.
SQL> @lock sid=119
SID TY LMODE MODE_HELD REQUEST MODE_REQUESTED LOCK_ID1 LOCK_ID2 CTIME BLOCK
------- -- ---------- --------------- ---------- --------------- ---------- ---------- ---------- ----------
119 TX 6 Exclusive 0 None 131076 1579 1391 1
119 TM 3 Row-X (SX) 0 None 74251 0 1391 0
119 AE 4 Share 0 None 134 2777603868 1392 0
SQL> @oid 74251
owner object_name object_type SUBOBJECT_NAME CREATED LAST_DDL_TIME status DATA_OBJECT_ID
------------------------- ------------------------------ ------------------ ------------------------------ ------------------- ------------------- --------- --------------
ANBOB TEST TABLE 2021-02-01 04:25:53 2021-02-01 04:28:34 VALID 74253
SQL> select
2 s.sid
3 , s.serial#
4 , s.username trans_username
5 , t.addr taddr
6 , s.saddr ses_addr
7 , t.used_ublk
8 , t.used_urec
9 -- , t.start_time
10 , to_char(t.flag, 'XXXXXXXX') "0xFLAG"
11 , t.status||CASE WHEN BITAND(t.flag,128) = 128 THEN ' ROLLING BACK' END status
12 , t.start_date
13 , XIDUSN
14 , XIDSLOT
15 , XIDSQN
16 , t.xid
17 , t.prv_xid
18 , t.ptx_xid
19 from
20 v$session s
21 , v$transaction t
22 where
23 s.saddr = t.ses_addr
24* and sid=119
SID SERIAL# USERNAME TADDR SES_ADDR USED_UBLK USED_UREC 0xFLAG STATUS START_DATE XIDUSN XIDSLOT XIDSQN XID PRV_XID PTX_XID
---------- ---------- ------------------------- ---------------- ---------------- ---------- ---------- --------- ----------------------------- ------------------- ---------- ---------- ---------- ---------------- ---------------- ----------------
119 1391 ANBOB 0000000074C065D8 00000000781A7518 1 1 E03 ACTIVE 2021-04-15 22:27:00 2 4 1579 020004002B060000 0000000000000000 0000000000000000
SQL> select SAMPLE_TIME,session_id,session_serial#,sql_id,sql_opname from v$active_session_history where xid=hextoraw('020004002B060000') ;
SAMPLE_TIME SESSION_ID SESSION_SERIAL# SQL_ID SQL_OPNAME
--------------------------- - ---------- --------------- --------------- --------------
15-APR-21 10.27.00.942 PM 119 1391 g48uufy4xd5fq UPDATE
SQL> @sqlid g48uufy4xd5fq %
Show SQL text, child cursors and execution stats for SQLID g48uufy4xd5fq child %
HASH_VALUE PLAN_HASH_VALUE CH# SQL_TEXT
---------- --------------- ----- --------------------------------------------
2312541654 839355234 0 update anbob.test set name='test' where id=1
SQL> select a.sid,a.serial#,c.object_name ,event, blocking_session, a.last_call_et
2 from V$session a,
3 V$locked_object b,
4 dba_objects c
5 where a.sid=b.session_id
6 and b.object_id=c.object_id
7* and c.owner = a.username
SID SERIAL# OBJECT_NAME EVENT BLOCKING_SESSION LAST_CALL_ET
---------- ---------- ------------------------------ -------------------- ---------------- ------------
72 19406 TEST enq: TX - row lock c 119 2666
76 7570 TEST enq: TX - row lock c 119 2663
66 31 TEST enq: TX - row lock c 119 2664
119 1391 TEST PL/SQL lock timer 2667
总结: 对于“blocker session做完事务以后没有提交,并且什么也没做或者执行了其它SQL, 同时又IDLE”的情况下,在检查block chains堵塞链时显示:blocker session就是IDLE。
这些表明确实有未提交的事务,如同上例可以从v$lock
或v$locked_objectd
确认,接下来就可以找到那个blocker通知提交,或者KILL。
对于定位blocker 曾经执行的事务SQL,可以尝试从v$session
的sql_id 、PREV_SQL_ID出发解决,也可以从:v$open_cursor
的sid,v$active_session_history
的xid, session_id 去查询SQL。
如果幸运的话是可以找到该SQL文本。
Q7. 老师 请问 这些脚本执行的结果 有第三方工具可以直接看吗?感觉执行命令比较繁琐。
A7.昨天推荐的脚本包就是为了简化使用,在输出上更佳有可读性体验,我平时数据库运维基本上都是命令行到sqlplus,然后执行预写的sql脚本,相信你用习惯了就会爱上它。
👇列几个最常用的:
@xi [sql_id] % 查看执行计划
@xwar
@xib
@sqlid 查看sql表细
@seg xx.xx 查看段大小
@tab
@ind
@o xx.xx
@oid
@seq
@privs [user ] 查看权限
@usid [sid] session信息
@uspid [spid]
@sw [sid]
@se v$session_event
@ses v$sesstat
@ase 活动会话
@ashtop wait_class,event,sql_id session_type='FOREGROUND' &5min 5分钟内TOP SQL
@t 当前trace file 文件名
@46on 12 10046 event level 12
@46off
@help % 会列出Tanel一会粉丝整理的帮助
直播资源
视频回放:Oracle诊断性能问题时常用脚本工具
学习课件:《Oracle诊断性能问题时常用脚本工具 - 张维照》
脚本分享
更多学习视频可进入《课程合集》观看