oracle 索引集群因子的学习

原创小张同学 2023-07-14

302

集群因子 cluster_factor:用于判断索引回表所需要消耗的物理IO次数。

 select owner,index_name,clustering_factor from dba_indexes where owner='ZWY' and index_name='IND_ID';

OWNER                                                        INDEX_NAME                                                   CLUSTERING_FACTOR
------------------------------------------------------------ ------------------------------------------------------------ -----------------
ZWY                                                          IND_ID                                                                    1361

select * from (select object_id,rowid from zwy_0714 where object_id is not null order by object_id) where rownum<=10;
 OBJECT_ID ROWID
---------- ------------------
         2 AAAWqYAAEAAAHEbAAw
         3 AAAWqYAAEAAAHEbAAF
         4 AAAWqYAAEAAAHEbAAx
         5 AAAWqYAAEAAAHEbAAa
         6 AAAWqYAAEAAAHEbAAV
         7 AAAWqYAAEAAAHEbAAR
         8 AAAWqYAAEAAAHEbAAk
         9 AAAWqYAAEAAAHEbAAN
        10 AAAWqYAAEAAAHEbAAl
        11 AAAWqYAAEAAAHEbAA4

对于数据中的每一行，rowid伪列返回行的地址。rowid值主要包含以下信息：

对象的数据对象编号
该行所在的数据文件中的数据块
该行中数据块的位置（第一行是0）
数据行所在的数据文件（第一个文件是1）。该文件编号是相对于表空间。

AAAWqY       AAE     AAAHEb       AAw
数据对象编号   数据块   数据块的位置   数据文件编号

索引的叶子块中有序地存储了索引键值和键值所在的rowid；
clustering_factor的算法:
比较索引键如object_id所在rowid是否在同一个数据块，如果在，则clsutering_factor不增加，反之则自增1，可以看到以上11行都在AAE的同一个数据块中，所以clustering_factor不增加，依次一直比较到最后索引键最后一个键值。

1.索引的集群因子介于表的数据块block以及表的行数之间
2.索引的集群因子越小，回表所需要访问的数据块越少，反之越多

假设根据索引扫描返回了1000行数据，根据rowid来进行回表访问
第一种情况：1000行数据都在同一个block中，对应集群因子小，那么回表消耗仅为1个IO，因为仅需读取一个block
第二种情况：1000行数据都各自分布在不同的block中，对应集群因子大，那么回表需要1000个IO，因为需要读取1000个block

索引扫描类型：
1.索引范围扫描（index range scan） – 受集群因子影响，因为需要回表
2.索引全扫描（index full scan） – 受集群因子影响，因为需要回表
3.索引唯一扫描 (index unique scan) – 不受集群因子影响，因为只需要回表1次
4.索引快扫 (index fast full scan) – 不受集群因子影响，因为不需要回表

解决方案：
1、重建索引（无效，没有改变索引键和表数据对应的顺序关系）
2、根据索引列排序来重建表 create table new_table as select * from old_table order by index_column_name（不现实，生产上大表往往不止一个索引，小表通常不需要建立索引）
3、减少回表次数，如创建组合索引避免或者减少回表

案例：

SQL>  select owner,index_name,clustering_factor from dba_indexes where owner='ZWY' and index_name='IND_ID';

OWNER                                                        INDEX_NAME                                                   CLUSTERING_FACTOR
------------------------------------------------------------ ------------------------------------------------------------ -----------------
ZWY                                                          IND_ID                                                                    1361


 SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) blocks from zwy.zwy_0714;

    BLOCKS
----------
      1249

可以看到以上ind_id索引的clustering_factor为1361与表的block数很接近

SQL> set arraysize 500
SQL> set arraysize 5000
SQL> set autot trace
SQL> select * from zwy_0714 where object_id<1000;

997 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1726413974

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   938 | 91924 |    19   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ZWY_0714 |   938 | 91924 |    19   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID   |   938 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<1000)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         21  consistent gets 
          0  physical reads
          0  redo size
      92748  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        997  rows processed

SQL> create table zwy_0714_2 as select * from zwy_0714 order by dbms_random.value;

Table created.

SQL> create index ind_id2 on zwy_0714_2(object_id);

Index created.

select owner,index_name,clustering_factor from dba_indexes where owner='ZWY' and index_name='IND_ID2';
SQL> select owner,index_name,clustering_factor from dba_indexes where owner='ZWY' and index_name='IND_ID2';

OWNER                                                        INDEX_NAME                                                   CLUSTERING_FACTOR
------------------------------------------------------------ ------------------------------------------------------------ -----------------
ZWY                                                          IND_ID2                                                                  87220
 select count(distinct dbms_rowid.rowid_block_number(rowid)) blocks from zwy.zwy_0714_2;
    BLOCKS
----------
      1249

 SQL> select count(*) from zwy_0714_2;

  COUNT(*)
----------
     87276

可以看到以上ind_id索引的clustering_factor为1361与表的行数很接近

select /*+index(zwy_0714_2) */ * from zwy_0714_2 where object_id<1000; --因为集群因子很大，cbo由此计算走索引的成本很高，所以需要指定hint
Execution Plan
----------------------------------------------------------
Plan hash value: 2953394467

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   997 |   201K|   961   (0)| 00:00:12 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ZWY_0714_2 |   997 |   201K|   961   (0)| 00:00:12 |
|*  2 |   INDEX RANGE SCAN          | IND_ID2    |   997 |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<1000)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1002  consistent gets
          0  physical reads
          0  redo size
      92748  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        997  rows processed

再看看索引快扫以及唯一扫描的情况下，集群因子有没有影响

SQL> select count(object_id) from zwy_0714_2;


Execution Plan
----------------------------------------------------------
Plan hash value: 1266430892

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |    13 |    58   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |         |     1 |    13 |            |          |
|   2 |   INDEX FAST FULL SCAN| IND_ID2 | 90870 |  1153K|    58   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        201  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

可以看到，只访问1次表或者不回表的情况下，集群因子对执行计划的选择无影响，执行计划选择走索引。

墨力计划

最后修改时间：2023-07-14 18:03:59

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

oracle 索引集群因子的学习

评论