Optimizer Transformations: View Merging Part 2

赵勇 (Zhao Yong), 2022-10-07


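This is the second of two posts on view merging. The first post, View Merging Part 1, covers the basics of view merging: how simple select-project-join views and views appearing in outer joins are merged, and why some of these views cannot be merged. In this post we cover complex view merging.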

Complex View Merging

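We use the term "complex view merging" to describe the merging of group by and distinct views. Like simple view merging, it lets the optimizer consider additional join orders and access paths. It also lets the group-by/distinct operation be evaluated after the joins instead of before them.

Delaying the group-by can make performance better or worse, depending on the characteristics of the data. If the joins are filtering, delaying the group-by until after the joins shrinks the data set the group-by has to process. Grouping early has its own advantages: it can reduce the amount of data that later joins have to process, and it avoids the case where the joins explode the amount of data that must then be grouped. The same is true for distinct operations.

Because merging such a view is not always better, the decision to use this transformation is cost-based. The optimizer costs both options, with and without view merging, and merges the view only if doing so is cheaper.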
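A toy model makes the trade-off concrete. The Python sketch below is not Oracle's cost model. It makes three illustrative assumptions:

* every operator costs one unit per input row;
* the filtering join to PRODUCTS is modeled as a membership test on prod_id;
* the sales rows are made up.

The sketch evaluates both alternatives, group first (view not merged) and join first (view merged), and keeps the cheaper one.

```python
from collections import defaultdict

# Toy sales rows: (cust_id, prod_id, quantity_sold). All data is hypothetical.
SALES = [(c, p, 1) for c in range(1, 51) for p in range(1, 21)]  # 1,000 rows


def aggregate(rows):
    """SUM(quantity_sold) grouped by (cust_id, prod_id)."""
    totals = defaultdict(int)
    for cust, prod, qty in rows:
        totals[(cust, prod)] += qty
    return dict(totals)


def unmerged_plan(sales, wanted_prods):
    """View not merged: group all of SALES first, then apply the filtering join."""
    totals = aggregate(sales)
    result = {key: qty for key, qty in totals.items() if key[1] in wanted_prods}
    return result, {"group_by_in": len(sales), "join_in": len(totals)}


def merged_plan(sales, wanted_prods):
    """View merged: apply the filtering join first, then group the surviving rows."""
    joined = [row for row in sales if row[1] in wanted_prods]
    return aggregate(joined), {"group_by_in": len(joined), "join_in": len(sales)}


def choose(sales, wanted_prods):
    """Cost both alternatives (made-up model: one unit per input row) and pick the cheaper."""
    plans = {"unmerged": unmerged_plan(sales, wanted_prods),
             "merged": merged_plan(sales, wanted_prods)}
    costs = {name: sum(stats.values()) for name, (_, stats) in plans.items()}
    return min(costs, key=costs.get), costs


# Filtering join (only product 7 survives): joining first is cheaper.
print(choose(SALES, {7}))               # ('merged', {'unmerged': 2000, 'merged': 1050})
# Join filters nothing and 1,000 rows collapse into one group: grouping first is cheaper.
print(choose([(1, 1, 1)] * 1000, {1}))  # ('unmerged', {'unmerged': 1001, 'merged': 2000})
```

Both plans return the same totals; only the work differs. That is why the choice is made on cost and not by a fixed rule.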

Consider the following group by view and a query that references it:

create view cust_prod_totals_v as
select sum(s.quantity_sold) total, s.cust_id, s.prod_id
from sales s
group by s.cust_id, s.prod_id;

select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, cust_prod_totals_v
where c.country_id = 'US'
and c.cust_id = cust_prod_totals_v.cust_id
and cust_prod_totals_v.total > 100
and cust_prod_totals_v.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';

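This query finds all customers from the US who have bought more than 100 units of a particular product. The view is eligible for complex view merging. After merging, the query looks like this: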

select c.cust_id, cust_first_name, cust_last_name, cust_email
from customers c, products p, sales s
where c.country_id = 'US'
and c.cust_id = s.cust_id
and s.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
group by s.cust_id, s.prod_id, p.rowid, c.rowid,
c.cust_email, c.cust_last_name, c.cust_first_name, c.cust_id
having sum(s.quantity_sold) > 100;
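You can check that this rewrite preserves the query's results by running both forms side by side. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for Oracle. The tables model only the columns the queries use, and every row is hypothetical. It checks result equivalence only, not plans or costs.

```python
import sqlite3

# SQLite stands in for Oracle; only the columns the queries use are modeled,
# and every row is hypothetical toy data.
SETUP = """
create table customers (cust_id int, cust_first_name text, cust_last_name text,
                        cust_email text, country_id text);
create table products (prod_id int, prod_name text);
create table sales (cust_id int, prod_id int, quantity_sold int);
create view cust_prod_totals_v as
  select sum(s.quantity_sold) total, s.cust_id, s.prod_id
  from sales s
  group by s.cust_id, s.prod_id;
insert into customers values (1, 'Ann', 'Lee', 'ann@example.com', 'US'),
                             (2, 'Bo', 'Kim', 'bo@example.com', 'US'),
                             (3, 'Cy', 'Ng', 'cy@example.com', 'FR');
insert into products values (10, 'T3 Faux Fur-Trimmed Sweater'), (20, 'Other');
insert into sales values (1, 10, 60), (1, 10, 50),   -- Ann: 110 sweaters
                         (2, 10, 90), (2, 20, 500),  -- Bo: only 90 sweaters
                         (3, 10, 300);               -- Cy: not a US customer
"""

# The query as written against the group by view.
ORIGINAL = """
select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, cust_prod_totals_v
where c.country_id = 'US'
  and c.cust_id = cust_prod_totals_v.cust_id
  and cust_prod_totals_v.total > 100
  and cust_prod_totals_v.prod_id = p.prod_id
  and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
"""

# The merged form: group by moved to the outer block, total > 100 moved to HAVING.
MERGED = """
select c.cust_id, cust_first_name, cust_last_name, cust_email
from customers c, products p, sales s
where c.country_id = 'US'
  and c.cust_id = s.cust_id
  and s.prod_id = p.prod_id
  and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
group by s.cust_id, s.prod_id, p.rowid, c.rowid,
         c.cust_email, c.cust_last_name, c.cust_first_name, c.cust_id
having sum(s.quantity_sold) > 100
"""


def rows(conn, sql):
    """Run a query and sort its rows so that row order does not affect comparison."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(rows(conn, ORIGINAL))                        # [(1, 'Ann', 'Lee', 'ann@example.com')]
print(rows(conn, ORIGINAL) == rows(conn, MERGED))  # True
```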

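The transformed query is cheaper than the untransformed query, so the optimizer chooses to merge the view. Why is it cheaper?

* In the untransformed query, the group by operation is applied to the entire SALES table in the view.
* In the transformed query, the joins to PRODUCTS and CUSTOMERS (especially PRODUCTS) filter out most of the rows of SALES, so the group by operation is much cheaper.

The join does become more expensive, because SALES is no longer reduced before it is joined. It is not much more expensive, though, since in the original query the group by does not reduce the data very much. If any of these data characteristics changed, merging the view might no longer be cheaper, which is why the decision has to be cost-based. The final plan is as follows: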

--------------------------------------------------------
| Id  | Operation             | Name      | Cost (%CPU)|
--------------------------------------------------------
|   0 | SELECT STATEMENT      |           |  2101  (18)|
|*  1 |  FILTER               |           |            |
|   2 |   HASH GROUP BY       |           |  2101  (18)|
|*  3 |    HASH JOIN          |           |  2099  (18)|
|*  4 |     HASH JOIN         |           |  1801  (19)|
|*  5 |      TABLE ACCESS FULL| PRODUCTS  |    96   (5)|
|   6 |      TABLE ACCESS FULL| SALES     |  1620  (15)|
|*  7 |     TABLE ACCESS FULL | CUSTOMERS |   296  (11)|
--------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("QUANTITY_SOLD")>100)
3 - access("C"."CUST_ID"="CUST_ID")
4 - access("PROD_ID"="P"."PROD_ID")
5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
7 - filter("C"."COUNTRY_ID"='US')

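There is no view in the plan above, which is what you would expect after the view has been merged. In some cases, however, a view still appears in the plan after view merging, with a name like VM_NWVW_1. We will discuss why in a moment. First, let's look at an example, which also gives us a chance to see distinct view merging. Consider this query, which finds customers in the US who bought a particular product: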

select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p,
(select distinct s.cust_id, s.prod_id
from sales s) cust_prod_v
where c.country_id = 'US'
and c.cust_id = cust_prod_v.cust_id
and cust_prod_v.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';

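The view can be merged, but the decision is cost-based. Keeping the view unmerged has its own advantage: the data reduction from the distinct might make the join cheaper. In this case, however, merging the view is cheaper, so we get this equivalent query: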

select nwvw.cust_id, nwvw.cust_first_name, nwvw.cust_last_name, nwvw.cust_email
from (select distinct c.rowid c_rowid, p.rowid p_rowid, s.prod_id, s.cust_id s_cust_id, c.cust_id,
c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, sales s
where c.country_id = 'US'
and c.cust_id = s.cust_id
and s.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater') nwvw;

Its execution plan is:

-------------------------------------------
| Id  | Operation             | Name      |
-------------------------------------------
|   0 | SELECT STATEMENT      |           |
|   1 |  VIEW                 | VM_NWVW_1 |
|   2 |   HASH UNIQUE         |           |
|*  3 |    HASH JOIN          |           |
|*  4 |     HASH JOIN         |           |
|*  5 |      TABLE ACCESS FULL| PRODUCTS  |
|   6 |      TABLE ACCESS FULL| SALES     |
|*  7 |     TABLE ACCESS FULL | CUSTOMERS |
-------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  3 - access("C"."CUST_ID"="S"."CUST_ID")
  4 - access("S"."PROD_ID"="P"."PROD_ID")
  5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
  7 - filter("C"."COUNTRY_ID"='US')

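So why do we still have a view after we have supposedly merged it? The new view is what we call a "projection view".

When we merge the view, we move the distinct to the outer query block. To keep the result semantically equivalent to the original query, the moved distinct must also cover several additional columns: here, the rowids of CUSTOMERS and PRODUCTS and the SALES join columns. So we put all of that into a new view, from which we select just the columns we want in the outer query block's select list.

We still get all the benefits promised by view merging:

* all of the tables are in one query block;
* the optimizer is free to permute them as it wishes in the final join order;
* the distinct operation is delayed until all of the joins are complete.

Projection views appear in two situations: in queries where a distinct view has been merged, and where a group by view is merged into an outer query block that itself contains group by, having, and/or aggregates. In the latter case, the projection view contains the group by, having, and aggregates from the original outer query block.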
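You can check the need for the extra columns on toy data. This sketch again uses Python's sqlite3 module, with SQLite standing in for Oracle and hypothetical rows. Two products share the name the query filters on, and the customer bought both, so the original query returns the customer once per matching product.

The projection-view rewrite preserves that result. A naive rewrite that applies distinct directly to the four output columns does not. The aliases c_rowid, p_rowid, and s_cust_id are illustrative additions that keep the view's column names unique.

```python
import sqlite3

# SQLite stands in for Oracle; all rows are hypothetical. Two different
# products share the name the query filters on, and Ann bought both.
SETUP = """
create table customers (cust_id int, cust_first_name text, cust_last_name text,
                        cust_email text, country_id text);
create table products (prod_id int, prod_name text);
create table sales (cust_id int, prod_id int, quantity_sold int);
insert into customers values (1, 'Ann', 'Lee', 'ann@example.com', 'US');
insert into products values (10, 'T3 Faux Fur-Trimmed Sweater'),
                            (11, 'T3 Faux Fur-Trimmed Sweater');
insert into sales values (1, 10, 1), (1, 10, 1), (1, 11, 1);
"""

# Original query with the distinct view.
ORIGINAL = """
select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p,
     (select distinct s.cust_id, s.prod_id from sales s) cust_prod_v
where c.country_id = 'US'
  and c.cust_id = cust_prod_v.cust_id
  and cust_prod_v.prod_id = p.prod_id
  and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
"""

# Merged form: distinct over the extra columns, wrapped in a projection view.
PROJECTION_VIEW = """
select nwvw.cust_id, nwvw.cust_first_name, nwvw.cust_last_name, nwvw.cust_email
from (select distinct c.rowid c_rowid, p.rowid p_rowid, s.prod_id, s.cust_id s_cust_id,
             c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
      from customers c, products p, sales s
      where c.country_id = 'US'
        and c.cust_id = s.cust_id
        and s.prod_id = p.prod_id
        and p.prod_name = 'T3 Faux Fur-Trimmed Sweater') nwvw
"""

# Naive (incorrect) merge: distinct applied only to the four output columns.
NAIVE = """
select distinct c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, sales s
where c.country_id = 'US'
  and c.cust_id = s.cust_id
  and s.prod_id = p.prod_id
  and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
"""


def rows(conn, sql):
    """Run a query and sort its rows so that row order does not affect comparison."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(len(rows(conn, ORIGINAL)))                            # 2: one row per matching product
print(rows(conn, PROJECTION_VIEW) == rows(conn, ORIGINAL))  # True
print(len(rows(conn, NAIVE)))                               # 1: a row has been lost
```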
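Now that this great mystery has been solved, let's look at some of the reasons a group by or distinct view might not be merged. Apart from cost, there are several other reasons, including: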

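The outer query tables do not have a rowid or a unique column
The view appears in a CONNECT BY query block
The view contains GROUPING SETS, ROLLUP, or PIVOT
The view or the outer query block contains a spreadsheet (MODEL) clause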
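The first restriction ties back to the merged group by query shown earlier, which groups by p.rowid and c.rowid so that each row of the outer tables keeps its own group. The sketch below shows what goes wrong without that row identity. It uses Python's sqlite3 module, with SQLite standing in for Oracle and hypothetical rows.

The setup is as follows:

* CUSTOMERS has no key and stores the same customer row twice;
* the customer's sweater total is only 60, so the original query should return nothing.

The merged form with the rowids agrees with the original. Dropping the rowids merges both copies of the join into one group, which sums to 120 and wrongly returns the customer. SQLite gives every ordinary table a rowid, so this sketch can only show the effect of removing the row identity, not a table that genuinely lacks one.

```python
import sqlite3

# SQLite stands in for Oracle; all rows are hypothetical. CUSTOMERS has no key
# and holds the same row twice; Ann's sweater total is only 30 + 30 = 60.
SETUP = """
create table customers (cust_id int, cust_first_name text, cust_last_name text,
                        cust_email text, country_id text);
create table products (prod_id int, prod_name text);
create table sales (cust_id int, prod_id int, quantity_sold int);
create view cust_prod_totals_v as
  select sum(s.quantity_sold) total, s.cust_id, s.prod_id
  from sales s
  group by s.cust_id, s.prod_id;
insert into customers values (1, 'Ann', 'Lee', 'ann@example.com', 'US'),
                             (1, 'Ann', 'Lee', 'ann@example.com', 'US');
insert into products values (10, 'T3 Faux Fur-Trimmed Sweater');
insert into sales values (1, 10, 30), (1, 10, 30);
"""

ORIGINAL = """
select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, cust_prod_totals_v
where c.country_id = 'US'
  and c.cust_id = cust_prod_totals_v.cust_id
  and cust_prod_totals_v.total > 100
  and cust_prod_totals_v.prod_id = p.prod_id
  and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
"""

# Merged form; {row_ids} is filled with the outer tables' rowids or left empty.
MERGED_TEMPLATE = """
select c.cust_id, cust_first_name, cust_last_name, cust_email
from customers c, products p, sales s
where c.country_id = 'US'
  and c.cust_id = s.cust_id
  and s.prod_id = p.prod_id
  and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
group by s.cust_id, s.prod_id, {row_ids}
         c.cust_email, c.cust_last_name, c.cust_first_name, c.cust_id
having sum(s.quantity_sold) > 100
"""
WITH_ROWIDS = MERGED_TEMPLATE.format(row_ids="p.rowid, c.rowid,")
WITHOUT_ROWIDS = MERGED_TEMPLATE.format(row_ids="")


def rows(conn, sql):
    """Run a query and sort its rows so that row order does not affect comparison."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(rows(conn, ORIGINAL))        # []
print(rows(conn, WITH_ROWIDS))     # []
print(rows(conn, WITHOUT_ROWIDS))  # [(1, 'Ann', 'Lee', 'ann@example.com')]
```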

Summary

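View merging can improve plans by allowing additional join orders, access methods, and other transformations to be considered. Where view merging always leads to a better plan, Oracle merges the view automatically; in other cases, the decision is based on cost. There are many reasons a view may not be merged, including cost and validity restrictions. Note that view merging rejected on the basis of cost or heuristics can be overridden with hints, but view merging rejected for validity reasons cannot.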

Original article: https://blogs.oracle.com/optimizer/post/optimizer-transformations-view-merging-part-2
Optimizer Transformations: View Merging part 2
January 1, 2020
Maria Colgan
Distinguished Product Manager