暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

latch: cache buffers chains event tuning

张维照 2019-05-31
814

问题描述

对oracle数据库(single instance)进行了迁移升级从10.2.0.4 升到11.2.0.3,有一个项目迁完后第二天,cpu负载升到了130更高(16cpus). 向用户询问后使用上没有改变,平时就几个人使用,而该用户活动会话就有100多个。最在等待CBC latch. 怀疑是执行计划发生了改变。

[root@dbserver40 ~]# free
             total       used       free     shared    buffers     cached
Mem:      16429016   16363308      65708          0     188084    5901364
-/+ buffers/cache:   10273860    6155156
Swap:     16378228      18660   16359568
[root@dbserver40 ~]# top
top - 14:52:02 up 7 days, 21:08,  1 user,  load average: 141.69, 131.61, 127.43
Tasks: 990 total, 187 running, 803 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.9%us,  0.6%sy,  0.0%ni,  0.4%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  16429016k total, 16360432k used,    68584k free,   188140k buffers
Swap: 16378228k total,    18660k used, 16359568k free,  5901532k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
20533 oracle    19   0 7416m  59m  56m R 32.0  0.4 179:24.70 oracle             
11667 oracle    19   0 7416m  57m  54m R 31.6  0.4  43:20.90 oracle             
 2376 oracle    19   0 7402m  42m  39m R 29.7  0.3  66:22.29 oracle             
 3165 oracle    19   0 7416m  59m  55m R 29.7  0.4  56:28.18 oracle 
...

[root@dbserver40 ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
177  0  18660  65432 188156 5901540    0    0    16    20    0    4 18  0 82  0  0
179  0  18660  65236 188172 5901524    0    0     0   268 1302 1723 100  0  0  0  0
88  0  18660  61580 188172 5901576    0    0     0    44 1924 2357 99  1  0  0  0
89  0  18660  61332 188172 5901576    0    0     0    64 3122 3488 98  1  1  0  0
92  0  18660  62464 188172 5901584    0    0     0    24 2165 2566 99  1  1  0  0
181  0  18660  53432 188200 5901596    0    0     0    36 1082 1434 100  0  0  0  0
182  0  18660  73676 188200 5901596    0    0     0   112 3667 4159 99  1  1  0  0
183  0  18660 152648 188208 5901588    0    0     0   188 2908 3600 98  1  1  0  0

AWR 部分信息

Top 5 Timed Foreground Events

                            Event Waits Time(s)  Avg wait (ms) % DB time Wait Class 
latch: cache buffers chains    26,429    102,652      3884   21.09      Concurrency 
DB CPU                         57,500   11.81   
buffer busy waits              7,910    1,631       206      0.34       Concurrency 
log file sync                  14,725   160         11       0.03       Commit 
library cache: mutex X         83       65          779      0.01       Concurrency 

Host CPU (CPUs: 16 Cores: 8 Sockets: 2) 

Load Average Begin Load Average End %User %System %WIO %Idle 
103.12 110.21 99.5 0.3 0.0 0.2 

Instance CPU 

%Total CPU %Busy CPU %DB time waiting for CPU (Resource Manager) 
99.5 99.7 0.0


专家解答

开始排查latch: cache buffers chains,通常是hot block issue

cms@PORA40>select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

system@PORA40>show parameter optimizer

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_plan_baselines boolean     FALSE
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string      11.2.0.3optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
optimizer_mode                       string      ALL_ROWS
optimizer_secure_view_merging        boolean     TRUE
optimizer_use_invisible_indexes      boolean     FALSE
optimizer_use_pending_statistics     boolean     FALSE
optimizer_use_sql_plan_baselines     boolean     TRUE

sys@PORA40>select addr,gets,misses,sleeps 
    from v$latch_children 
where name='cache buffers chains' 
    and misses > 100000 order by 3 desc;
	
ADDR                             GETS               MISSES               SLEEPS
---------------- -------------------- -------------------- --------------------
0000000214573E08            125350308              8576425                48949
00000002166A4B88             51526750              2257318                  239
00000002142FBCC0             30946314              2121193                  125
0000000214362AD8             31811886              2098570                 4171
0000000218F10E58             30812696              2065458                  134
...

sys@PORA40>select file#,dbablk,state,class from x$bh where hladdr='0000000214573E08';

               FILE#               DBABLK                STATE                CLASS
-------------------- -------------------- -------------------- --------------------
                  18                  529                    1                    4
                  18                  529                    3                    4
                  18                  529                    3                    4
                  18                  529                    3                    4
                  18                  529                    3                    4
                  18                  529                    3                    4
                  18                  529                    3                    4
                  18                  529                    3                    4
                   7                 7375                    1                    1
                   4               211350                    1                    1
                   9                31539                    1                    1
                   1               100892                    1                    1
                   2               112974                    1                    1

13 rows selected.

Tip:
x$bh.STATE NUMBER
KCBBHFREE 0 buffer free
KCBBHEXLCUR 1 buffer current (and if DFS locked X)
KCBBHSHRCUR 2 buffer current (and if DFS locked S)
KCBBHCR 3 buffer consistant read
KCBBHREADING 4 Being read
KCBBHMRECOVERY 5 media recovery (current & special)
KCBBHIRECOVERY 6 Instance recovery (somewhat special)

x$bh.CLASS NUMBER See Note 33434.1
1,’data block’,
2,’sort block’,
3,’save undo block’,
4,’segment header’,
5,’save undo header’,
6,’free list’,
7,’extent map’,
8,’1st level bmb’,
9,’2nd level bmb’,
10,’3rd level bmb’,
11,’bitmap block’,
12,’bitmap index block’,
13,’file header block’,
14,’unused’,
15,’system undo header’,
16,’system undo block’,
17,’undo header’,
18,’undo block’

sys@PORA40>select owner,segment_name 
  2  from dba_extents 
where file_id=&p1 
and &p2 between 
  5  block_id and block_id + blocks -1;
Enter value for p1: 18
old   3: where file_id=&p1
new   3: where file_id=18
Enter value for p2: 529
old   4: and &p2 between
new   4: and 529 between

OWNER                          SEGMENT_NAME
------------------------------ --------------------------------
CMS                            CMS_ENTITY

确认了在对象CMS.CMS_ENTITY上的一致读非常高。看TOP SQL中也与些对象相关,每次执行估算是60000 s。查看该SQL当前环境的执行计划。

cms@PORA40>explain plan for SELECT *
  2    FROM (  SELECT /*+FIRST_ROWS*/
  3                  e.entity_desc, e.entity_url, n.news_content
  4              FROM cms_entity e, cms_news n
  5             WHERE     n.entity_id = e.entity_id
  6                   AND e.ENTITY_DESC LIKE '%%'
  7                   AND e.ENTITY_CATEGORY LIKE '%310003617%'
  8          ORDER BY e.entity_id DESC)
  9   WHERE ROWNUM <= 100;
Explained.

cms@PORA40>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 3482088815
----------------------------------------------------------------------------------------------
| Id  | Operation                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |   100 |   254K|    47M  (1)|157:40:34 |
|*  1 |  COUNT STOPKEY                 |             |       |       |            |          |
|   2 |   VIEW                         |             |   195 |   496K|    47M  (1)|157:40:34 |
|   3 |    NESTED LOOPS                |             |   195 | 42900 |    47M  (1)|157:40:34 |
|   4 |     TABLE ACCESS BY INDEX ROWID| CMS_NEWS    | 69599 |  6388K| 49866   (1)| 00:09:59 |
|   5 |      INDEX FULL SCAN DESCENDING| PK_CMS_NEWS | 69599 |       |   166   (1)| 00:00:02 |
|*  6 |     TABLE ACCESS FULL          | CMS_ENTITY  |     1 |   126 |   679   (1)| 00:00:09 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=100)
   6 - filter("E"."ENTITY_DESC" LIKE '%%' AND "E"."ENTITY_CATEGORY" LIKE
              '%310003617%' AND "E"."ENTITY_DESC" IS NOT NULL AND "E"."ENTITY_CATEGORY" IS NOT NULL
              AND "N"."ENTITY_ID"="E"."ENTITY_ID")

把CBO optimizer 的参数改回升级前版本

cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
Plan hash value: 3142591826
-----------------------------------------------------------------------------------------------
| Id  | Operation                       | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |   100 |   254K|   826   (1)| 00:00:10 |
|*  1 |  COUNT STOPKEY                  |             |       |       |            |          |
|   2 |   VIEW                          |             |   195 |   496K|   826   (1)| 00:00:10 |
|*  3 |    SORT ORDER BY STOPKEY        |             |   195 | 42900 |   826   (1)| 00:00:10 |
|   4 |     NESTED LOOPS                |             |   195 | 42900 |   825   (1)| 00:00:10 |
|*  5 |      TABLE ACCESS FULL          | CMS_ENTITY  |   188 | 23688 |   680   (1)| 00:00:09 |
|   6 |      TABLE ACCESS BY INDEX ROWID| CMS_NEWS    |     1 |    94 |     1   (0)| 00:00:01 |
|*  7 |       INDEX UNIQUE SCAN         | PK_CMS_NEWS |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=100)
   3 - filter(ROWNUM<=100)
   5 - filter("E"."ENTITY_DESC" LIKE '%%' AND "E"."ENTITY_CATEGORY" LIKE '%310003617%')
   7 - access("N"."ENTITY_ID"="E"."ENTITY_ID")

果然执行计划发生了改变。在NL join 中增加了SORT ORDER BY STOPKEY ,并发现两次关连CMS_ENTITY表都是FTS(full table scan),些表有8W+记录,居然没任何索引。

cms@PORA40>select * from user_ind_columns where table_name='CMS_ENTITY';
none any index
cms@PORA40>alter table CMS_ENTITY add constraint pk_CMS_ENTITY primary key (entity_id);
Table altered.

建完索引后看两个版本的参数的执行计划

alter session set optimizer_features_enable='10.2.0.4';
cms@PORA40>explain plan for ...
Explained.

cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 819915907

------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |               |   100 |   254K| 24889   (1)| 00:04:59 |
|*  1 |  COUNT STOPKEY                 |               |       |       |            |          |
|   2 |   VIEW                         |               |   195 |   496K| 24889   (1)| 00:04:59 |
|   3 |    NESTED LOOPS                |               |   195 | 42900 | 24889   (1)| 00:04:59 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| CMS_ENTITY    |   188 | 23688 | 24743   (1)| 00:04:57 |
|   5 |      INDEX FULL SCAN DESCENDING| PK_CMS_ENTITY | 88523 |       |   212   (1)| 00:00:03 |
|   6 |     TABLE ACCESS BY INDEX ROWID| CMS_NEWS      |     1 |    94 |     1   (0)| 00:00:01 |
|*  7 |      INDEX UNIQUE SCAN         | PK_CMS_NEWS   |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------


cms@PORA40>alter session set optimizer_features_enable='11.2.0.3';
Session altered.
cms@PORA40>explain plan for ...
cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 1963794189
-------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |   100 |   254K| 24889   (1)| 00:04:59 |
|*  1 |  COUNT STOPKEY                  |               |       |       |            |          |
|   2 |   VIEW                          |               |   195 |   496K| 24889   (1)| 00:04:59 |
|   3 |    NESTED LOOPS                 |               |       |       |            |          |
|   4 |     NESTED LOOPS                |               |   195 | 42900 | 24889   (1)| 00:04:59 |
|*  5 |      TABLE ACCESS BY INDEX ROWID| CMS_ENTITY    |   188 | 23688 | 24743   (1)| 00:04:57 |
|   6 |       INDEX FULL SCAN DESCENDING| PK_CMS_ENTITY | 88523 |       |   212   (1)| 00:00:03 |
|*  7 |      INDEX UNIQUE SCAN          | PK_CMS_NEWS   |     1 |       |     0   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID | CMS_NEWS      |     1 |    94 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

显然又都有了新改变。最主要的是11g现在变的可以接受了。10G 的变的没以前好了。因为这台机器上有一堆的小项目,调optimizer_features_enable是不想要的。那能不能让11g也用上10g刚才的执行计划呢? 去掉hint 试试

cms@PORA40>SELECT *
  FROM (  SELECT 
                e.entity_desc, e.entity_url, n.news_content
            FROM cms_entity e, cms_news n
           WHERE     n.entity_id = e.entity_id
                 AND e.ENTITY_DESC LIKE '%%'
                 AND e.ENTITY_CATEGORY LIKE '%310003617%'
        ORDER BY e.entity_id DESC)
 WHERE ROWNUM <= 100
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2559076494

-----------------------------------------------------------------------------------------------
| Id  | Operation                       | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |   100 |   254K|   826   (1)| 00:00:10 |
|*  1 |  COUNT STOPKEY                  |             |       |       |            |          |
|   2 |   VIEW                          |             |   195 |   496K|   826   (1)| 00:00:10 |
|*  3 |    SORT ORDER BY STOPKEY        |             |   195 | 42900 |   826   (1)| 00:00:10 |
|   4 |     NESTED LOOPS                |             |       |       |            |          |
|   5 |      NESTED LOOPS               |             |   195 | 42900 |   825   (1)| 00:00:10 |
|*  6 |       TABLE ACCESS FULL         | CMS_ENTITY  |   188 | 23688 |   680   (1)| 00:00:09 |
|*  7 |       INDEX UNIQUE SCAN         | PK_CMS_NEWS |     1 |       |     0   (0)| 00:00:01 |
|   8 |      TABLE ACCESS BY INDEX ROWID| CMS_NEWS    |     1 |    94 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

可以看到这个执行计划相对刚才的更好些,随然sql profile,outlines, baselines可以固定执行计划,可惜的是这些sql全是字面常量每次也都是hard parse, 也不希望改cursor_sharing ,看到负载降到了2,最主要的是应用程序早就N年没人接了,所以程序不能动,也就先调到上一步。如果有个参数可以忽略指定的hint就好了。有个隐藏参数_optimizer_ignore_hints 在解析时会忽略所有hint(不含递归SQL),可以在instance,session,sql 级指定。

cms@PORA40>alter session set "_optimizer_ignore_hints"=true;
Session altered.

cms@PORA40>explain plan for SELECT *
  2     FROM (  SELECT /*+FIRST_ROWS*/
  3                   e.entity_desc, e.entity_url, n.news_content
  4               FROM cms_entity e, cms_news n
  5              WHERE     n.entity_id = e.entity_id
  6                    AND e.ENTITY_DESC LIKE '%%'
  7                    AND e.ENTITY_CATEGORY LIKE '%310003617%'
  8           ORDER BY e.entity_id DESC)
  9    WHERE ROWNUM <= 100;

Explained.

cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2559076494

-----------------------------------------------------------------------------------------------
| Id  | Operation                       | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |   100 |   254K|   826   (1)| 00:00:10 |
|*  1 |  COUNT STOPKEY                  |             |       |       |            |          |
|   2 |   VIEW                          |             |   195 |   496K|   826   (1)| 00:00:10 |
|*  3 |    SORT ORDER BY STOPKEY        |             |   195 | 42900 |   826   (1)| 00:00:10 |
|   4 |     NESTED LOOPS                |             |       |       |            |          |
|   5 |      NESTED LOOPS               |             |   195 | 42900 |   825   (1)| 00:00:10 |
|*  6 |       TABLE ACCESS FULL         | CMS_ENTITY  |   188 | 23688 |   680   (1)| 00:00:09 |
|*  7 |       INDEX UNIQUE SCAN         | PK_CMS_NEWS |     1 |       |     0   (0)| 00:00:01 |
|   8 |      TABLE ACCESS BY INDEX ROWID| CMS_NEWS    |     1 |    94 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Summary:

当出现了CBC latch,通常是因为热块引起,可以参考hot block tunning的文章,检查对象上是否产生了无用的一致读。
升级前有必要先测试
程序代码中不是迫不得已不要使用hint
_optimizer_ignore_hints 隐藏参数可以忽略sql hint,但一定要注意测试,会有可能导致sql profile,baselines,outlines 失效,比如awr ,或系统view sql中的hint 被ignore, 降低查询。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论