1 优化场景介绍
要优化的SQL的各种变体在业务系统中经常可以看到,这里使用的是Oracle 官网的SH数据库,统计的是某个产品在某个渠道的销售次数,语句如下
SQL> select count(*) from sh.sales_no_par s inner join sh.channel_dup c on s.CHANNEL_ID=c.CHANNEL_ID where s.PROD_ID='27';
COUNT(*)
----------
12163
这种形式的语句通常要创建索引加以优化,但是在实际场景中,有不少情况下,或者没有创建索引或者创建的索引不能用,检查语句涉及的表上的索引使用下面的SQL语句
SQL> select INDEX_NAME from dba_indexes where table_name in ('SALES_NO_PAR','CHANNEL_DUP');
no rows selected
两个表上没有任何索引,这时语句的执行计划是
explain plan for select count(*) from sh.sales_no_par s inner join sh.channel_dup c on s.CHANNEL_ID=c.CHANNEL_ID where s.PROD_ID='27';--解释语句的执行计划
SQL> select * from table(dbms_xplan.display()); --显示刚才解释的执行计划,为了节省篇幅,后面的命令输出中将略去这两条命令
这时,语句的执行计划如下:
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1319805298
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 1231 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 12762 | 124K| 1231 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| CHANNEL_DUP | 5 | 15 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| SALES_NO_PAR | 12762 | 89334 | 1228 (1)| 00:00:01 |
------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
4 - filter("S"."PROD_ID"=27)
Oracle选择了hash join,小表CHANNEL_DUP是内表,在sh模式中,原表channel中CHANNEL_ID是主键,我们在CHANNEL_ID上创建一个唯一索引在看看语句的执行计划
create unique index sh.idx_cd_id on sh.CHANNEL_DUP(CHANNEL_ID);
这时的执行计划是
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 1228 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | NESTED LOOPS | | 12762 | 124K| 1228 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| SALES_NO_PAR | 12762 | 89334 | 1228 (1)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN| IDX_CD_ID | 1 | 3 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("S"."PROD_ID"=27)
4 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
操作2变成了嵌套连接,成本看起来略有较低,这里内表是SALES_NO_PAR,Oracle优化这么选择的原因是CHANNEL_DUP上执行的是唯一索引扫描,并且无需回表,成本极低,几乎是0。这条语句的执行计划中,成本最高的SALES_NO_PAR的全表扫描操作,优化的主要方向应该也是这个全表扫描操作。对于全表扫描操作,如果不能消除,则Oracle 12C以上版本的in-memory特性是很好的选择。
2 使用in-memory优化表的全表扫描
2.1 激活in-memory特性
默认in-memory是关闭的,开启需要重启数据库,执行以下步骤
SQL> alter system set inmemory_size=400m scope=spfile;
SQL> shutdow immediate;
SQL> startup;
ORACLE instance started.
Total System Global Area 1603287928 bytes
Fixed Size 4922232 bytes
Variable Size 352321536 bytes
Database Buffers 822083584 bytes
Redo Buffers 4530176 bytes
In-Memory Area 419430400 bytes
Database mounted.
Database opened.
只要inmemory_size设置为大于0的值,inmemory_size特性就会被激活,激活之后可以增加inmemory_size的值,调整会立即生效,不用重启,但是在线不能减小,如减小必须调整spfile后重启生效。特性激活后需要设置表使用这个特性
2.2 设置表使用inmemory特性
SQL> alter table sh.sales_no_par inmemory; --激活表的inmemory特性
SQL> EXEC DBMS_INMEMORY.POPULATE('SH', 'SALES_NO_PAR');--装载表至INMEMORY缓冲区
2.3 执行计划及分析
Plan hash value: 3895422689
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 56 (18)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | NESTED LOOPS | | 12163 | 118K| 56 (18)| 00:00:01 |
|* 3 | TABLE ACCESS INMEMORY FULL| SALES_NO_PAR | 12163 | 85141 | 56 (18)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | IDX_CD_ID | 1 | 3 | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - inmemory("S"."PROD_ID"=27)
filter("S"."PROD_ID"=27)
4 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
使用INMEMORY特性后,全表扫描的成本降至56,不到原来的20分之一,在执行计划基本不变的情况下,语句的执行效率提高了20多倍。这个特性对hash 连接依然有效,
SQL> drop index sh.IDX_CD_ID;
--语句的执行计划
Plan hash value: 1319805298
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 59 (17)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 12163 | 118K| 59 (17)| 00:00:01 |
| 3 | TABLE ACCESS FULL | CHANNEL_DUP | 5 | 15 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS INMEMORY FULL| SALES_NO_PAR | 12163 | 85141 | 56 (18)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
4 - inmemory("S"."PROD_ID"=27)
filter("S"."PROD_ID"=27)
2.4 inmemory的优化-排除特定列
有时查询只用到部分列,可以只将查询用到了列装载到inmemory缓冲区,将其余的用不到的列排除,使用下面的命令
alter table sh.SALES_NO_PAR inmemory (PROD_ID,CHANNEL_ID) no inmemory(TIME_ID,CUST_ID);
SQL> l
1* SELECT table_name, segment_column_id seg_col_id, column_name, inmemory_compression FROM v$im_column_level where table_name='SALES_NO_PAR'
SQL> /
TABLE_NAME SEG_COL_ID COLUMN_NAME INMEMORY_COMPRESSION
-------------------------------- ---------- -------------------------------- --------------------------
SALES_NO_PAR 1 PROD_ID DEFAULT
SALES_NO_PAR 2 CUST_ID NO INMEMORY
SALES_NO_PAR 3 TIME_ID NO INMEMORY
SALES_NO_PAR 4 CHANNEL_ID DEFAULT
SALES_NO_PAR 5 PROMO_ID DEFAULT
SALES_NO_PAR 6 QUANTITY_SOLD DEFAULT
SALES_NO_PAR 7 AMOUNT_SOLD DEFAULT
3 利用索引消除全表扫描
3.1 单列索引
对于SALES_NO_PAR的全表扫描操作,优化器评估其返回的行数是12163,而SALES_NO_PAR的行数是918843行,这种情况下创建索引,更改访问路径为索引范围扫描是DBA经常选择的优化方式,根据执行计划的操作4,语句的过滤条件应用在列PROD_ID上,我们在这个列上创建索引,再看执行计划
SQL> create index sh.idx_s_d_p on sh.SALES_NO_PAR(PROD_ID);
Plan hash value: 221229178
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 196 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 12762 | 124K| 196 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | CHANNEL_DUP | 5 | 15 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| SALES_NO_PAR | 12762 | 89334 | 193 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IDX_S_D_P | 12762 | | 28 (0)| 00:00:01 |
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
5 - access("S"."PROD_ID"=27)
Oracle优化器将SALES_NO_PAR上的全表扫描改为了索引范围扫描后回表,成本降为193。这个执行计划的成本主要是在第4步回表的操作,在从第2步的访问条件来看,第4步回表操作的主要目的是从表里取得CHANNEL_ID列,因此,进一步优化的方式就是在这两个列上创建联合索引,由于CHANNEL_ID是连接的列,将其放在左边执行计划如下
3.2 联合索引
SQL> create index sh.s_p_c on sh.SALES_NO_PAR(CHANNEL_ID,PROD_ID);
Plan hash value: 746340682
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 45 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 12762 | 124K| 45 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| CHANNEL_DUP | 5 | 15 | 3 (0)| 00:00:01 |
|* 4 | INDEX SKIP SCAN | S_P_C | 12762 | 89334 | 42 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
4 - access("S"."PROD_ID"=27)
filter("S"."PROD_ID"=27)
创建了联合索引后,消除了回表,语句的执行成本降至45,甚至低于使用inmemory特性时。如果注意到执行计划的操作4执行了INDEX SKIP SCAN,执行计划还可以进一步优化,删除先前的索引,创建下面的索引
3.3 联合索引的优化
SQL> create index sh.s_p_c on sh.SALES_NO_PAR(PROD_ID,CHANNEL_ID);
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 35 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 12163 | 118K| 35 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| CHANNEL_DUP | 5 | 15 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | S_P_C | 12163 | 85141 | 32 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
4 - access("S"."PROD_ID"=27)
索引跳跃扫描改为索引范围扫描,执行成本进一步降低,执行效率较最初相比提高了接近40倍。
4 注意事项
本文所列执行计划皆为explain解释生成,和语句实际执行时的执行计划有时会有差异,在实际优化时应该同时检查语句执行时的执行计划。




