一位客户的Exadata数据库在大量数据处理期间经常出现部分SQL性能急剧下降的故障。正常执行只需要几分钟,但在故障时需要执行1小时至数小时。SQL异常会导致数据不能及时处理,严重影响了数据统计的及时性。
出现性能问题的SQL语句具体如下(由于SQL语句较长,仅列出了关键代码)。
create table TBAS.tmp_2_o_user_2_xxxx compress for query high nologging parallel as select /*+parallel (8) */ t1.ACC_NBR, …. case when traffic_wlan/1024 > 20480 then 32 when t42.traffic_fund_type_id is not null then t42.traffic_fund_type_id else (select traffic_fund_type_id from pu_meta_dim.d_traffic_fund_type t where state = 1 andtraffic_wlan/1024 < t.max_value and traffic_wlan/1024 > t.min_value) end MBL_wlan_FUND_TYPE_ID, …. from TBAS.tmp_1_o_user_2_xxx t1…
where关键字后面的代码通常很简单,甚至没有where条件。
分析SQL问题一般会从AWR报告、SQL的执行统计信息和SQL执行计划着手。
AWR报告
下列是检查故障时间段(比如:2014年10月15日11:00至12:00)的AWR报告。
2. Snap Id Snap Time Sessions Curs/Sess 3. --------- ------------------- -------- --------- 4. Begin Snap: 19332 05-Oct-14 11:00:18 257 1.8 5. End Snap: 19333 05-Oct-14 12:00:18 254 1.9 6. Elapsed: 59.99 (mins) 7. DB Time: 943.03 (mins) 8. 9. Cache Sizes Begin End 10. ~~~~~~~~~~~ ---------- ---------- 11. Buffer Cache: 19,456M 19,456M Std Block Size: 8K 12. Shared Pool Size: 11,264M 11,264M Log Buffer: 227,180K 13. 14. Load Profile Per Second Per Transaction Per Exec Per Call 15. ~~~~~~~~~~~~ --------------- --------------- ---------- ---------- 16. DB Time(s): 15.7 4.6 0.01 0.00 17. DB CPU(s): 6.6 1.9 0.00 0.00 18. Redo size: 441,825.6 127,927.9 19. Logical reads: 77,377.6 22,404.2 20. Block changes: 419.4 121.4 21. Physical reads: 9,898.7 2,866.1 22. Physical writes: 2,049.4 593.4 23. User calls: 9,804.6 2,838.9 24. Parses: 1,970.4 570.5 25. Hard parses: 0.7 0.2 26. W/A MB processed: 9.0 2.6 27. Logons: 1,959.8 567.4 28. Executes: 1,969.7 570.3 29. Rollbacks: 0.0 0.0 30. Transactions: 3.5 31.
从上面的数据可以看到,平均每秒Logons数达到1959次。这对于任何一个Oracle数据库来说,都是异常高的值,正常数据库的每秒Logon数都在30以下。出现这种情况的最可能的原因是空闲的并行进程加入到了会话当中,即数据库中频繁地发生大量的并发会话的启动和释放。
同时平均每秒1970次解析(parse)对数据仓库类应用来说也显得异常得高。
生成SQL报告
下列是生成故障SQL的SQL报告。
33. SQL ID: fym9dtwdyjpna DB/Inst: ORCLNEW/orclnew2 Snaps: 19321-19337 34. -> 1st Capture and Last Capture Snap IDs 35. refer to Snapshot IDs witin the snapshot range 36. -> create table TBAS.tmp_2_o_user_2_xxxx compress for query high nologgin... 37. 38. Plan Hash Total Elapsed 1st Capture Last Capture 39. # Value Time(ms) Executions Snap ID Snap ID 40. --- ---------------- ---------------- ------------- ------------- -------------- 41. 1 3675582523 12,699,298 0 19332 19337 42. ------------------------------------------------------------- 43. 44. Plan 1(PHV: 3675582523) 45. ----------------------- 46. 47. Plan Statistics DB/Inst: ORCLNEW/orclnew2 Snaps: 19321-19337 48. -> % Total DB Time is the Elapsed Time of the SQL statement divided 49. into the Total Database Time multiplied by 100 50. 51. Stat Name Statement Per Execution % Snap 52. ------------------------------- ---------- -------------- ------- 53. Elapsed Time (ms) 1.2699E+07 N/A 0.9 54. CPU Time (ms) 6,176,679 N/A 0.9 55. Executions 0 N/A N/A 56. Buffer Gets 1.6908E+07 N/A 0.2 57. Disk Reads 1,378,166 N/A 0.1 58. Parse Calls 9,875,656 N/A 23.8 59. Rows 0 N/A N/A 60. User I/O Wait Time (ms) 1,707,961 N/A N/A 61. Cluster Wait Time (ms) 55 N/A N/A 62. Application Wait Time (ms) 840,484 N/A N/A 63. Concurrency Wait Time (ms) 746,191 N/A N/A 64. Invalidations 1 N/A N/A 65. Version Count 57 N/A N/A 66. Sharable Mem(KB) 28,536 N/A N/A 67. ------------------------------------------------------------- 68. Execution Plan 69. -------------------------------------------------------------------------------- 70. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| 71. -------------------------------------------------------------------------------- 72. | 0 | CREATE TABLE STATEMENT | | | | 17915 (100)| | | 73. | 1 | PX COORDINATOR | | | | | | | 74. | 2 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 52 | 2 (0)| 00:00:01 | | 75. | 3 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 76. | 4 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_CHARGE_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 77. | 5 | PX COORDINATOR | | | | | | | 78. | 6 | PX SEND QC (RANDOM) | :TQ20000 | 1 | 52 | 2 (0)| 00:00:01 | | 79. | 7 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 80. | 8 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_CHARGE_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 81. | 9 | PX COORDINATOR | | | | | | | 82. | 10 | PX SEND QC (RANDOM) | :TQ30000 | 1 | 52 | 2 (0)| 00:00:01 | | 83. | 11 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 84. | 12 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_CHARGE_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 85. | 13 | PX COORDINATOR | | | | | | | 86. | 14 | PX SEND QC (RANDOM) | :TQ40000 | 1 | 52 | 2 (0)| 00:00:01 | | 87. | 15 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 88. | 16 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_TRAFFIC_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 89. | 17 | PX COORDINATOR | | | | | | | 90. | 18 | PX SEND QC (RANDOM) | :TQ50000 | 1 | 52 | 2 (0)| 00:00:01 | | 91. | 19 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 92. | 20 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_TRAFFIC_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 93. | 21 | PX COORDINATOR | | | | | | | 94. | 22 | PX SEND QC (RANDOM) | :TQ60000 | 1 | 52 | 2 (0)| 00:00:01 | | 95. | 23 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 96. | 24 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_TRAFFIC_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 97. | 25 | PX COORDINATOR | | | | | | | 98. | 26 | PX SEND QC (RANDOM) | :TQ70000 | 1 | 52 | 2 (0)| 00:00:01 | | 99. | 27 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 100. | 28 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_TRAFFIC_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 101. | 29 | PX COORDINATOR | | | | | | | 102. | 30 | PX SEND QC (RANDOM) | :TQ80000 | 1 | 52 | 2 (0)| 00:00:01 | | 103. | 31 | PX BLOCK ITERATOR | | 1 | 52 | 2 (0)| 00:00:01 | | 104. | 32 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_TRAFFIC_FUND_TYPE | 1 | 52 | 2 (0)| 00:00:01 | | 105. | 33 | PX COORDINATOR | | | | | | | 106. | 34 | PX SEND QC (RANDOM) | :TQ90009 | 2088K| 4519M| 2839 (2)| 00:00:35 | | 107. | 35 | LOAD AS SELECT | | | | | | | 108. | 36 | HASH JOIN RIGHT OUTER | | 2088K| 4519M| 2839 (2)| 00:00:35 | | 109. | 37 | PX RECEIVE | | 44400 | 1127K| 2 (0)| 00:00:01 | | 110. | 38 | PX SEND BROADCAST | :TQ90000 | 44400 | 1127K| 2 (0)| 00:00:01 | | 111. | 39 | PX BLOCK ITERATOR | | 44400 | 1127K| 2 (0)| 00:00:01 | 34 | 112. | 40 | TABLE ACCESS STORAGE FULL | WT_FINANCIAL_CARD | 44400 | 1127K| 2 (0)| 00:00:01 | 34 | 113. | 41 | HASH JOIN RIGHT OUTER | | 2088K| 4467M| 2835 (2)| 00:00:35 | | 114. | 42 | PX RECEIVE | | 25 | 975 | 2 (0)| 00:00:01 | | 115. | 43 | PX SEND BROADCAST | :TQ90001 | 25 | 975 | 2 (0)| 00:00:01 | | 116. | 44 | PX BLOCK ITERATOR | | 25 | 975 | 2 (0)| 00:00:01 | | 117. | 45 | TABLE ACCESS STORAGE FULL | D_CHARGE_FUND_TYPE | 25 | 975 | 2 (0)| 00:00:01 | | 118. | 46 | HASH JOIN RIGHT OUTER | | 2088K| 4389M| 2832 (2)| 00:00:34 | | 119. | 47 | PX RECEIVE | | 25 | 975 | 2 (0)| 00:00:01 | | 120. | 48 | PX SEND BROADCAST | :TQ90002 | 25 | 975 | 2 (0)| 00:00:01 | | 121. | 49 | PX BLOCK ITERATOR | | 25 | 975 | 2 (0)| 00:00:01 | | 122. | 50 | TABLE ACCESS STORAGE FULL | D_CHARGE_FUND_TYPE | 25 | 975 | 2 (0)| 00:00:01 | | 123. | 51 | HASH JOIN RIGHT OUTER | | 2088K| 4312M| 2829 (2)| 00:00:34 | | 124. | 52 | PX RECEIVE | | 25 | 975 | 2 (0)| 00:00:01 | | 125. | 53 | PX SEND BROADCAST | :TQ90003 | 25 | 975 | 2 (0)| 00:00:01 | | 126. | 54 | PX BLOCK ITERATOR | | 25 | 975 | 2 (0)| 00:00:01 | | 127. | 55 | TABLE ACCESS STORAGE FULL | D_CHARGE_FUND_TYPE | 25 | 975 | 2 (0)| 00:00:01 | | 128. | 56 | HASH JOIN RIGHT OUTER | | 2088K| 4234M| 2826 (2)| 00:00:34 | | 129. | 57 | PX RECEIVE | | 16 | 832 | 2 (0)| 00:00:01 | | 130. | 58 | PX SEND BROADCAST | :TQ90004 | 16 | 832 | 2 (0)| 00:00:01 | | 131. | 59 | PX BLOCK ITERATOR | | 16 | 832 | 2 (0)| 00:00:01 | | 132. | 60 | TABLE ACCESS STORAGE FULL | D_TRAFFIC_FUND_TYPE | 16 | 832 | 2 (0)| 00:00:01 | | 133. | 61 | HASH JOIN RIGHT OUTER | | 2088K| 4131M| 2822 (1)| 00:00:34 | | 134. | 62 | PX RECEIVE | | 16 | 832 | 2 (0)| 00:00:01 | | 135. | 63 | PX SEND BROADCAST | :TQ90005 | 16 | 832 | 2 (0)| 00:00:01 | | 136. | 64 | PX BLOCK ITERATOR | | 16 | 832 | 2 (0)| 00:00:01 | | 137. | 65 | TABLE ACCESS STORAGE FULL | D_TRAFFIC_FUND_TYPE | 16 | 832 | 2 (0)| 00:00:01 | | 138. | 66 | HASH JOIN RIGHT OUTER | | 2088K| 4027M| 2819 (1)| 00:00:34 | | 139. | 67 | PX RECEIVE | | 16 | 832 | 2 (0)| 00:00:01 | | 140. | 68 | PX SEND BROADCAST | :TQ90006 | 16 | 832 | 2 (0)| 00:00:01 | | 141. | 69 | PX BLOCK ITERATOR | | 16 | 832 | 2 (0)| 00:00:01 | | 142. | 70 | TABLE ACCESS STORAGE FULL | D_TRAFFIC_FUND_TYPE | 16 | 832 | 2 (0)| 00:00:01 | | 143. | 71 | HASH JOIN RIGHT OUTER | | 2088K| 3923M| 2816 (1)| 00:00:34 | | 144. | 72 | PX RECEIVE | | 16 | 832 | 2 (0)| 00:00:01 | | 145. | 73 | PX SEND BROADCAST | :TQ90007 | 16 | 832 | 2 (0)| 00:00:01 | | 146. | 74 | PX BLOCK ITERATOR | | 16 | 832 | 2 (0)| 00:00:01 | | 147. | 75 | TABLE ACCESS STORAGE FULL | D_TRAFFIC_FUND_TYPE | 16 | 832 | 2 (0)| 00:00:01 | | 148. | 76 | HASH JOIN RIGHT OUTER | | 2088K| 3820M| 2813 (1)| 00:00:34 | | 149. | 77 | PX RECEIVE | | 16 | 832 | 2 (0)| 00:00:01 | | 150. | 78 | PX SEND BROADCAST | :TQ90008 | 16 | 832 | 2 (0)| 00:00:01 | | 151. | 79 | PX BLOCK ITERATOR | | 16 | 832 | 2 (0)| 00:00:01 | | 152. | 80 | TABLE ACCESS STORAGE FULL| D_TRAFFIC_FUND_TYPE | 16 | 832 | 2 (0)| 00:00:01 | | 153. | 81 | PX BLOCK ITERATOR | | 2088K| 3716M| 2809 (1)| 00:00:34 | | 154. | 82 | TABLE ACCESS STORAGE FULL | TMP_1_O_USER_2_0817 | 2088K| 3716M| 2809 (1)| 00:00:34 | | 155. -------------------------------------------------------------------------------- 156. 157. Note 158. ----- 159. - dynamic sampling used for this statement (level=4) 160. - automatic DOP: Computed Degree of Parallelism is 8
可以看到,这条SQL语句执行了12699秒(即3个半小时)还没有执行完成,这条SQL语句产生的parse calls高达987万以上。
从SQL执行时间上看,CPU时间、IO时间、集群等待时间和并行争用等待时间的合计远小于SQL执行总时间。这种情况下通常是由于CPU利用率过高,执行SQL的进程不能得到足够的SQL,或者是由于执行SQL的进程在等待“空闲”事件,没有计入到等待时间中。
从执行计划上看,SQL是并行执行的执行计划,同时在SELECT字段列表部分有子查询存在,并且子查询同样是并行执行计划。
检查历史数据
检查SQL执行的话话历史数据的代码如下所示。
SYS@orclnew2> select sample_id,to_char(min(sample_time),'yyyy-mm-dd hh24:mi:ss') sample_time,instance_number,QC_SESSION_ID,event,count(*) 2 from dba_hist_active_sess_history a 3 where a.sample_time>=to_date('2014-10-05 09:00','yyyy-mm-dd hh24:mi:ss') 4 and SQL_id='fym9dtwdyjpna' 5 group by sample_id,event,instance_number,QC_SESSION_ID 6 order by 1 ; SAMPLE_ID SAMPLE_TIME INSTANCE_NUMBER QC_SESSION_ID EVENT COUNT(*) ----------- ----------------------- ------------- ---------------------- ----------- 69710880 2014-10-05 10:24:31 2 406 1 69710890 2014-10-05 10:24:41 2 406 1 69710910 2014-10-05 10:25:01 2 406 5 69710920 2014-10-05 10:25:11 2 406 2 69710930 2014-10-05 10:25:21 2 406 1 69710940 2014-10-05 10:25:31 2 406 PX Deq: Signal ACK RSG 1 69710940 2014-10-05 10:25:31 2 406 2 69710950 2014-10-05 10:25:41 2 406 6 69710960 2014-10-05 10:25:51 2 406 PX Deq: Signal ACK RSG 1 69710970 2014-10-05 10:26:01 2 406 PX Deq: Signal ACK RSG 1 69710970 2014-10-05 10:26:01 2 406 1 69710980 2014-10-05 10:26:11 2 406 PX Deq: Signal ACK RSG 1 69710980 2014-10-05 10:26:11 2 406 2 69710990 2014-10-05 10:26:21 2 406 PX Deq: Signal ACK RSG 1 69710990 2014-10-05 10:26:21 2 406 3 69711000 2014-10-05 10:26:31 2 406 PX Deq: Slave Session Stats 1 69711010 2014-10-05 10:26:41 2 406 PX Deq: Signal ACK RSG 1 69711010 2014-10-05 10:26:41 2 406 3 69711020 2014-10-05 10:26:51 2 406 PX Deq: Signal ACK RSG 1 69711020 2014-10-05 10:26:51 2 406 3 69711030 2014-10-05 10:27:01 2 406 1 69711040 2014-10-05 10:27:11 2 406 PX Deq: Signal ACK RSG 1 69711040 2014-10-05 10:27:11 2 406 1 69711050 2014-10-05 10:27:21 2 406 3 69711060 2014-10-05 10:27:31 2 406 PX Deq: Signal ACK EXT 1 69711080 2014-10-05 10:27:51 2 406 1 69711090 2014-10-05 10:28:01 2 406 enq: KO - fast object checkpoint 1 69711100 2014-10-05 10:28:11 2 406 cell smart table scan 1 69711110 2014-10-05 10:28:21 2 406 PX Deq: Slave Session Stats 1 69711120 2014-10-05 10:28:31 2 406 1 69711130 2014-10-05 10:28:41 2 406 1 69711140 2014-10-05 10:28:51 2 406 1 69711150 2014-10-05 10:29:01 2 406 PX Deq: Signal ACK RSG 1 69711150 2014-10-05 10:29:01 2 406 1 69711160 2014-10-05 10:29:11 2 406 PX Deq: Signal ACK RSG 1 69711160 2014-10-05 10:29:11 2 406 3 69711170 2014-10-05 10:29:21 2 406 cell smart table scan 1 69711180 2014-10-05 10:29:31 2 406 cell smart table scan 1 69711190 2014-10-05 10:29:41 2 406 1 69711200 2014-10-05 10:29:51 2 406 1 69711210 2014-10-05 10:30:01 2 406 cell smart table scan 1 69711220 2014-10-05 10:30:12 2 406 3 69711230 2014-10-05 10:30:22 2 406 enq: KO - fast object checkpoint 1 69711240 2014-10-05 10:30:32 2 406 1 69711250 2014-10-05 10:30:42 2 406 1 69711260 2014-10-05 10:30:52 2 406 cell smart table scan 1 69711270 2014-10-05 10:31:02 2 406 cell smart table scan 1 69711280 2014-10-05 10:31:12 2 406 1 69711290 2014-10-05 10:31:22 2 406 reliable message 1 69711300 2014-10-05 10:31:32 2 406 1 69711310 2014-10-05 10:31:42 2 406 PX Deq: Signal ACK RSG 1 69711310 2014-10-05 10:31:42 2 406 1 69711320 2014-10-05 10:31:52 2 406 1 69711330 2014-10-05 10:32:02 2 406 cell smart table scan 1 69711350 2014-10-05 10:32:22 2 406 reliable message 1 69711370 2014-10-05 10:32:42 2 406 cell smart table scan 1 69711380 2014-10-05 10:32:52 2 406 cell smart table scan 1 69711390 2014-10-05 10:33:02 2 406 PX Deq: Signal ACK RSG 1 69711390 2014-10-05 10:33:02 2 406 1
从会话的历史数据来看,SQL语句的确是并行执行,但是各个时刻并行进程的数量各不相同。
判断问题产生的流程
由于此SQL语句产生了大量的parse calls,并且在SELECT字段列表部分有子查询,结合SQL执行的会话历史数据,初步判断问题的产生是这样一个流程。
(1)SQL语句执行主体查询部分。
(2)根据主体查询部分返回的数据,执行SELECT字段列表中的标量子查询。
(3)每次执行子查询都进行并行查询。并行查询每次都要启动并行进程、查询数据、释放并行进程这样的过程,这样大量的动作序列使得SQL执行非常缓慢。
(4)新启并行进程会进行SQL的解析,所以产生大量的parse calls。
并行查询表时,执行计划中有提示TABLE ACCESS STORAGE FULL FIRST ROWS,实际执行时也存在cell smart table scan。这表明由存储节点进行数据扫描,在扫描数据之前,需要对表做object checkpoint。会话历史中的enq: KO - fast object checkpoint等待事件表示该表正在进行object checkpoint。
因此,针对此故障,初步判断是语句的SELECT字段列表部分有子查询,在子查询执行时出现存在问题。那么需要验证以下两点。
(1)所有出现此类故障的SQL语句是否都是这一类SQL,即SELECT字段列表部分有标量子查询的SQL。
(2)同一条SQL在故障期间执行时与正常执行时的差别。
12.6.5 查询历史数据
由于故障SQL的parse calls较高,因此可以通过下列代码来查询历史数据。
select snap_id,parse_calls_delta,SQL_id,elapsed_time_delta/1000000 ela,executions_delta exec from dba_hist_SQLstat where parse_calls_delta>=10000 and parse_calls_delta/decode(executions_delta,0,1,null,1,executions_delta)>10 and elapsed_time_delta/1000000>=500 order by 1;
通过这个SQL语句查询出来的结果,可以证实是同一类SQL出现了问题。
并列执行的序列过程
对异常SQL的会话进行trace,会重复出现下列的等待事件。
165. WAIT nam='PX Deq: Join ACK' ela= 643 sleeptime/senderid=268566613 passes=2 p3=36959538792 obj#=13123575 166. WAIT nam='PX Deq: Join ACK' ela= 512 sleeptime/senderid=268566619 passes=5 p3=35004876592 obj#=13123575 167. WAIT nam='PX Deq: Join ACK' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 tim=1412491416551083 168. WAIT nam='PX Deq: Join ACK' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 tim=1412491416551095 169. WAIT nam='PX Deq: Join ACK' ela= 0 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 tim=1412491416551105 170. WAIT nam='PX Deq: Join ACK' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 tim=1412491416551116 171. WAIT nam='PX Deq: Join ACK' ela= 0 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 tim=1412491416551126 172. WAIT nam='PX Deq: Join ACK' ela= 0 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 tim=1412491416551136 173. WAIT nam='PX Deq: Parse Reply' ela= 3801 sleeptime/senderid=200 passes=1 p3=0 obj#=13123575 174. WAIT nam='PX Deq: Parse Reply' ela= 2329 sleeptime/senderid=200 passes=1 p3=0 obj#=13123575 175. WAIT nam='PX Deq: Parse Reply' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 176. WAIT nam='PX Deq: Parse Reply' ela= 514 sleeptime/senderid=200 passes=2 p3=0 obj#=13123575 177. WAIT nam='PX Deq: Parse Reply' ela= 196 sleeptime/senderid=200 passes=1 p3=0 obj#=13123575 178. WAIT nam='PX Deq: Parse Reply' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 179. WAIT nam='PX Deq: Parse Reply' ela= 472 sleeptime/senderid=200 passes=2 p3=0 obj#=13123575 180. WAIT nam='PX Deq: Parse Reply' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 181. WAIT nam='PX Deq: Execute Reply' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 182. WAIT nam='PX Deq: Execute Reply' ela= 36 sleeptime/senderid=200 passes=3 p3=0 obj#=13123575 183. WAIT nam='PX Deq: Signal ACK RSG' ela= 24 sleeptime/senderid=10 passes=2 p3=0 obj#=13123575 184. WAIT nam='PX Deq: Signal ACK RSG' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 185. WAIT nam='PX Deq: Signal ACK RSG' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 186. WAIT nam='PX Deq: Signal ACK RSG' ela= 0 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 187. WAIT nam='PX Deq: Signal ACK RSG' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 188. WAIT nam='PX Deq: Signal ACK RSG' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123575 189. WAIT nam='PX Deq: Signal ACK RSG' ela= 203 sleeptime/senderid=10 passes=3 p3=0 obj#=13123575 190. WAIT nam='PX Deq: Signal ACK RSG' ela= 196 sleeptime/senderid=10 passes=2 p3=0 obj#=13123575 191. WAIT nam='reliable message' ela= 365 channel context=38950850616 channel handle=38549009072 broadcast message=39084394200 obj#=13123575 192. WAIT nam='enq: KO - fast object checkpoint' ela= 95 name|mode=1263468550 2=131327 0=1 obj#=13123575 193. WAIT nam='enq: KO - fast object checkpoint' ela= 102 name|mode=1263468545 2=131327 0=2 obj#=13123575 194. WAIT nam='cell smart table scan' ela= 447 cellhash#=88802347 p2=0 p3=0 obj#=13123562 tim=1412491416561265 195. WAIT nam='cell smart table scan' ela= 498 cellhash#=88802347 p2=0 p3=0 obj#=13123562 tim=1412491416561821 196. WAIT nam='reliable message' ela= 503 channel context=38950850616 channel handle=38549009072 broadcast message=31940845352 obj#=13123562 197. WAIT nam='enq: KO - fast object checkpoint' ela= 174 name|mode=1263468550 2=131327 0=1 obj#=13123562 198. WAIT nam='enq: KO - fast object checkpoint' ela= 100 name|mode=1263468545 2=131327 0=2 obj#=13123562 199. WAIT nam='cell smart table scan' ela= 360 cellhash#=88802347 p2=0 p3=0 obj#=13123562 tim=1412491416563438 200. WAIT nam='cell smart table scan' ela= 612 cellhash#=88802347 p2=0 p3=0 obj#=13123562 tim=1412491416564115 201. WAIT nam='PX Deq: Signal ACK EXT' ela= 247 sleeptime/senderid=200 passes=1 p3=0 obj#=13123562 202. WAIT nam='PX Deq: Signal ACK EXT' ela= 256 sleeptime/senderid=200 passes=2 p3=0 obj#=13123562 203. WAIT nam='PX Deq: Signal ACK EXT' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 204. WAIT nam='PX Deq: Slave Session Stats' ela= 3 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 205. WAIT nam='PX Deq: Slave Session Stats' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 206. WAIT nam='PX Deq: Slave Session Stats' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 207. WAIT nam='PX Deq: Slave Session Stats' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 208. WAIT nam='PX Deq: Slave Session Stats' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 209. WAIT nam='PX Deq: Slave Session Stats' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 210. WAIT nam='PX Deq: Slave Session Stats' ela= 1 sleeptime/senderid=0 passes=0 p3=0 obj#=13123562 211. WAIT nam='PX Deq: Slave Session Stats' ela= 139 sleeptime/senderid=200 passes=3 p3=0 obj#=13123562
一个并行执行的序列过程如下。
(1)启动并行进程。
(2)QC进程(执行SQL的发起进程)向所有并行进程发送JOIN消息,使并行进程加入到Server Group(并行执行同一条SQL的所有并行查询进程加入到Server Group中)。这个时候QC进程会出现PX Deq:Join ACK等待,表示QC进程在等待并行查询进程加入到Server Group。
(3)QC进程向每个查询进程发送SQL语句,让每个并行查询进程解析SQL。这时QC进程出现PX Deq: Parse Reply,等待并行进程确认解析完毕。
(4)QC进程发送要执行的任务给每个并行进程,此时QC进程出现PX Deq: Execute Reply,表示QC进程在待待并行进程执行返回结果。
(5)执行完成后到最后并行进程退出Server Group,并收集并行查询会话的统计信息。此时QC进程出现的等待事件是PX Deq: Sinal ACK EXT和PX Deq: Slave Session Stats。
查询并行涉及的对象如下所示。
SQL> select owner,object_name,object_type from dba_objects where object_id=13123575; OWNER OBJECT_NAME OBJECT_TYPE -------------------- ------------------------------ ------------------- PU_META_DIM D_CHARGE_FUND_TYPE TABLE SQL> select owner,object_name,object_type from dba_objects where object_id=13123562; OWNER OBJECT_NAME OBJECT_TYPE -------------------- ------------------------------ ------------------- PU_META_DIM D_TRAFFIC_FUND_TYPE TABLE
可以看出这些表正是SELECT字段列表中的标量子查询中的表。
检查call stack
检查SQL在故障期间执行时QC进程的call stack(通过oradeBug):
SYS@orclnew2> oradeBug short_stack ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-sendmsg()+16<-sskgxp_sndmsg()+444<-skgxpipost()+979<-skgxppost()+144<-ksxppst_real()+289<-ksxppst_one()+34<-kslpsprns()+521<-kskpthr()+112<-kslpspr()+9<-kxfprienq()+1926<-kxfpqrenq()+235<-kxfxpf()+145<-kxfxcp1()+4146<-kxfxcp()+491<-qerpxSendParse()+1059<-kxfpValidateSlaveGroup()+192<-kxfpgsg()+4707<-kxfrAllocSlaves()+483<-kxfrialo()+2589<-kxfralo()+318<-qerpx_rowsrc_start()+1379<-qerpxStart()+578<-subex1()+138<-subsr3()+183<-evaopn3()+2537<-expepr()+576<-evaand()+59<-expepr()+47<-evacssr()+128<-qerlt_snv()+85<-qerltRop()+320<-kdstf11001010011km()+3284<-kdsttgr()+133176<-qertbFetch()+2525<-qergiFetch()+317<-rwsfcd()+103<-qerltFetch()+599<-qertqoFetch()+952<-qerpxFetch()+11817<-ctcdrv()+13941<-opiexe()+21743<-opiosq0()+3932<-opipls()+11479<-opiodr()+916<-rpidrus()+211<-skgmstack()+148<-rpiswu2()+638<-rpidrv()+1384<-psddr0()+473<-psdnal()+457<-pevm_EXIM()+308<-pfrinstr_EXIM()+53<-pfrrun_no_tool()+63<-pfrrun()+627<-plSQL_run()+649<-peicnt()+301<-kkxexe()+525<-opiexe()+17667<-kpoal8()+2124<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
call stack中,函数的执行顺序是从后往前看。plSQL_run表明SQL是PL/SQL存储过程中的SQL语句,qerpxFetch表明是并行执行(此处要注意的是,Oracle根据执行计划以并行方式开始执行,但是可能会发生执行时没有足够的并行进程,结果降级为串行方式执行的情况)subex1表示在执行子查询,qerpx_rowsrc_start表示这个是真正的并行开始,kxfrAllocSlaves表示分配并行进程,qerpxSendParse表示发送SQL解析请求给并行进程。
call stack跟之前的现象是相符合的,的确是在子查询部分新启动了并行。如果是事先就分配好的并行进程,则不会出现kxfrAllocSlaves这样的函数。反复多次查看QC进程的call stack,都能确认是相同的结果。
检查并行会话
检查SQL异常执行时的并行会话具体如下。
SYS@orclnew2> select sid from v$px_session where qcsid=1044; SID ----------- 851 918 982 280 1044 SYS@orclnew2> / SID ----------- 851 923 982 280 1044 SYS@orclnew2> / SID ----------- 867 919 982 280 1044
可以看到,同一个QC会话对应的并行进程其会话SID会不停地发生变化,这表明并行进程不停地启用和释放,这跟之前的现象、描述以及结论是一致的。这也和AWR报告中的Logons非常高是一致的。
确认了子查询在不停地重启并行进程的问题后,还需要确认同一条SQL语句为什么大多数时候能正常? 并且在故障后中止SQL运行再重新执行又能正常运行?
在故障期间,运行下面的测试代码。
alter session set events 'trace[SQL_Compiler.*] disk=highest'; alter session set events 'trace[SQL_Execution.*] disk=highest'; create table tu_tbas.WT_P_OFFER_2014101802812_XJ compress for query high nologging parallel 64 as SELECT SERV_ID….
检查生成的trace文件,在SQL开始尝试并行执行时,信息如下所示。
2014-10-22 04:58:23.701176*:PX_Messaging:kxfp.c@3325:kxfpqsod(): Query end, buffer cache support for numa enabled: YES 2014-10-22 04:58:23.701176*:PX_Messaging:kxfp.c@3329:kxfpqsod(end): serial - no slave alloc'd for 1st server set 2014-10-22 04:58:23.701176*:PX_Messaging:kxfp.c@10797:kxfpgsg(end): slave group q=(nil) kxfrAllocSlaves [ 30/ 10] actual num slaves alloc'd = 0 (kxfpqcthr) kxfrialo [ 30/ 0] Finish: allocated actual 0 slaves for non-GV query
可以看到,在分配第一组并行进程(1st server set)时,没有并行进程,只能以串行方式运行。
qerpxStart [ 30/ 0] rwsrid:11 pxid:3 qbas:2339749:err:0 pgakid:1 pgadep:0 *** 2014-10-22 04:58:23.702 START no parallel resources ksdftm:958862290 qertqoStart [ 30/ 0]
虽然没有分配到并行进程,但还是要以并行模式来执行SQL。因为SQL执行的入口已经是并行模式,只是在实际执行过程中没有并行进程,只能串行执行了。
在SQL执行过程中,会出现大量的类似下面的trace信息。
2014-10-22 04:58:23.772279*:PX_Granule:kxfr.c@2291:kxfrialo(): Start: building object table (1) Start: building obj table for obj# 0 kxfrExtInfo [ 100/ 0] prefetch part:0 objn:13123568 objd:13123568 seghdr tsn:11 rdba:0x0a80d47 a kxfrExtInfo [ 100/ 0] kxfrFileAdd [ 100/ 0] filenum:42 #nodes:0 arch:255 test hard affinity:0 dumping object objn:13123568 PU_META_DIM.D_STATE so:0x838788d70 mo:0x2b5257602150 flg:4201 (KXFRO_PSC/KXFRO_RID/KXFRO_HINT/) nbparts:1 nfiles:1 nnodes:0 ninst:0 ecnt:1 size(blocks):1 mtfl:0 files for object 1: kfil:42 size:1 nnodes:0 nodes:(nil) naff:65535 partitions of object 1: part:0 (abs:-1 pnum:-1 spnum:-1 objn:-1 objd:13123568 tsn:11) ecnt:1 size:1 nfiles:1 naff:65535 flags: BIT Files in partition 0: kfil:42 size:1 Extent map of partition 0: afn:42 blk:54395 size:1 coalescable:1 No nodes information. kxfrialo [ 100/ 0] DOP trace -- requested thread from best ref obj = 16 (from kxfrIsBestR ef()) … kxfrialo [ 100/ 0] best object 0x838788d70 hgt:0 blks:1 acp:0 nds:1 thr:16 kxfrialo [ 100/ 0] threads requested = 16 (from kxfrComputeThread()) kxfrialo [ 100/ 0] adjusted no. threads = 16 (from kxfrAdjustDOP()) kxfrAllocSlaves [ 100/ 0] DOP trace -- call kxfpgsg to get 16 slaves 2014-10-22 04:58:23.773172*:PX_Messaging:kxfp.c@9868:kxfpgsg(begin): reqthreads=16 height=0 lsize=0 alloc_flg=0x230 2014-10-22 04:58:23.773172*:PX_Messaging:kxfp.c@9941:kxfpgsg(): reqthreads=16 KXFPLDBL/KXFPADPT/ load balancing=on adaptive=off Getting instance info for open group ….. 2014-10-22 04:58:24.436235*:PX_Messaging:kxfp.c@11465:kxfpg1sg(): got 4 servers (sync), errors=0x0 returning Acquired 4 slaves on 1 instances avg height=4 #set=1 qser=2045742594 P011 inst 2 spid 16249 P016 inst 2 spid 16265 P017 inst 2 spid 16267 P018 inst 2 spid 12448 2014-10-22 04:58:24.436235*:PX_Messaging:kxfp.c@10671:kxfpgsg(): Instance(servers): inst=2 #slvs=4
上述trace信息表明,在执行子查询时,在不停地启用并行进程并释放并行进程。并且随着系统负载的不同,每次获得的并行进程数不一样,这跟会话历史信息中检查发现的不同时刻并行进程数不同的现象是一致的。
至此,故障原因已经明确,是由于在SELECT字段列表有子查询的SQL语句,在主体查询以串行方式执行时,子查询SQL以并行方式查询所引起的。
在此,我们小结一下这个问题,这个故障的出现需要同时满足以下两个条件。
(1)SQL语句的SELECT字段列表部分有子查询。类似于:
create table TBAS.tmp_2_o_user_2_xxxx compress for query high nologging parallel as select /*+parallel (8) */ t1.ACC_NBR, …. case when traffic_wlan/1024 > 20480 then 32 when t42.traffic_fund_type_id is not null then t42.traffic_fund_type_id else (select traffic_fund_type_id from pu_meta_dim.d_traffic_fund_type t where state = 1 andtraffic_wlan/1024 < t.max_value and traffic_wlan/1024 > t.min_value) end MBL_wlan_FUND_TYPE_ID, …. from TBAS.tmp_1_o_user_2_xxxx t1…
实例上的并行进程用完,即所有的并行进程全部处于活动状态。
当SELECT字段列表部分带有子查询的SQL执行时,正常情况下,Oracle会发起两组并行进程(parallel slave sets)。一组进程为数据的生产者(比如从表读数据),一组进程为数据的消费者(比如对“生产者”进程传过来的表数据进行汇总排序等)。在有并行进程时,SQL语句运行正常。
在实例的并行进程用完后,由于没有并行进程,因此SQL语句的主体部分就会降级到串行方式运行。但是在提取SELECT字段列表中的子查询数据时,子查询的表需要使用并行进程。这个时候Oracle会尝试以并行方式运行子查询。Oracle对主查询中返回的每一行数据,都要经过“启动并行进程,查询子查询数据,释放并行进程”的过程。这个过程效率非常低,导致SQL语句执行需要非常长的时间。
下面是另一条出现故障的SQL语句的执行计划(为了便于排版,对执行计划做了剪裁)。
------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | ------------------------------------------------------------------------------------ | 0 | CREATE TABLE STATEMENT | | | | | 1 | PX COORDINATOR | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 19 | | 3 | PX BLOCK ITERATOR | | 1 | 19 | |* 4 | TABLE ACCESS STORAGE FULL FIRST ROWS | D_STATE | 1 | 19 | |* 5 | COUNT STOPKEY | | | | | 6 | PX COORDINATOR | | | | | 7 | PX SEND QC (RANDOM) | :TQ20000 | 11 | 209 | |* 8 | COUNT STOPKEY | | | | | 9 | PX BLOCK ITERATOR | | 11 | 209 | |* 10 | TABLE ACCESS STORAGE FULL FIRST ROWS| D_STATE | 11 | 209 | | 11 | PX COORDINATOR | | | | | 12 | PX SEND QC (RANDOM) | :TQ30000 | 14M| 3644M| | 13 | LOAD AS SELECT | | | | | 14 | PX BLOCK ITERATOR | | 14M| 3644M| |* 15 | TABLE ACCESS STORAGE FULL | F_2_OFFER_SERV_D | 14M| 3644M| ------------------------------------------------------------------------------------
上面的执行计划中,第11~15步是SQL主体查询部分,第2~4步是SELECT字段列表中的子查询执行部分。正常情况下,是由两组并行进程负责,一组负责主体查询,然后将结果发送到另一组进程来进行子查询。但是在主体查询部分以串行方式进行时,SQL语句的主进程即QC进程,对F_2_OFFER_SERV_ID返回的每一行数据都要以并行方式执行一次子查询。这种方式的效率极为低下。
针对这样的问题,解决方案有如下几种。
(1)避免SELECT字段列表带有子查询的SQL以串行方式执行。在数据仓库应用中,串行执行的SQL语句,不管何种情况性能都比较差,因此要避免此种情况的产生。具体方法是在并行进程用光时,对并行执行的SQL进行排队。有以下几种方法。
1)将PARALLEL_DEGREE_POLICY参数设置为AUTO,开启自动并行。在系统级设置此参数在没有经过充分测试的情况下有比较大的风险,但是可以考虑在会话级设置此参数。在调度具有故障类SQL语句的脚本中设置会话级参数。
2)在故障类SQL中加上Hint “parallel(auto)”。
3)在故障类SQL中加上Hint “STATEMENT_QUEUING”。
(2)避免SELECT字段列表中使用子查询,尽量改为表关联。或者是针对此类子查询使用Hint no_parallel(表别名)”,但是需要测试此种更改对性能的影响。
在数据仓库应用中,需要控制并行度来避免单条SQL占用过多的并行进程。控制SQL并行度的方法有以下两种(以下两个方法都要同时采用)。
(1)设置自动并行时的并行度限制,这是通过设置参数PARALLEL_DEGREE_LIMIT来实现,比如将此参数设置为8。
(2)通过资源管理限制来限制并行度。
exec dbms_resource_manager.clear_pending_area(); exec dbms_resource_manager.create_pending_area(); exec dbms_resource_manager.create_plan( plan =>'LIMIT_DOP', comment => 'Limit Degree of Parallelism'); exec dbms_resource_manager.create_plan_directive(plan=> 'LIMIT_DOP', group_or_subplan => 'OTHER_GROUPS' , comment => 'limits the parallelism', parallel_degree_limit_p1=> 8); exec dbms_resource_manager.validate_pending_area(); exec dbms_resource_manager.submit_pending_area();
调整前建议进行充分测试,调整后需要密切关注系统的运行。
此外,此类的SQL应该比较多,我们可以通过下列代码查询出现过此类故障的SQL。
select snap_id,parse_calls_delta,SQL_id,elapsed_time_delta/1000000 ela,executions_delta exec from dba_hist_SQLstat where parse_calls_delta>=10000 and parse_calls_delta/decode(executions_delta,0,1,null,1,executions_delta)>10 and elapsed_time_delta/1000000>=500 order by 1;