Index Explosion 4 （索引激增4）

原创 Jonathan Lewis 2020-04-25

1154

在描述了如何最有效地处理（仅在Oracle技术支持人员认可后），因不必要的索引ITL条目浪费空间的问题之后，我给你留下了一个“可支持”的解决问题选项的简短列表。在这个贴子中，我将概述每种选择的优缺点。清单如下：

必要时使用coalesce-可能会经常定期做
将索引创建为反向键索引
将索引创建为全局哈希分区索引
去掉（或改变）索引

Coalesce（合并）

“alter index XXX coalesce”会沿着索引的顺序扫描索引叶子块，以尝试使用initrans和pctfree的当前设置，将数据从两个或更多的相邻叶子块中压缩到更少的块上，从而得到每个块中的可用空间。但这不会合并不同分支块下的叶子块，因此，这不会减少索引的高度。因合并操作而被清空的块会从索引结构中移除并放到freelist上。
coalesce命令作为一系列独立的小事务运行（因此，理论上可能会增加Oracle错误ORA-01555：快照太旧的风险–尽管在实践中这是不太可能的）。coalesce命令还会跳过包含未提交事务的块。
从积极的方面看，coalesce命令可能不需要做太多工作，而且它是一个在线命令。从消极的一面来说，它确实需要读取整个索引，如果并发问题影响了整个索引中分散的块，那么它可以生成大量的UNDO和REDO（在这种情况下，您可能决定不对浪费的空间做任何事情，或者可能每几周或几个月只处理一次）。

coalesce命令的另一种替代方法是重建索引，当您第一次意识到某个索引遇到这种并发问题时，您可能无论如何都希望这样做。但是，除非使用online选项，否则rebuild命令将在重建期间锁定表，在这种情况下，重建必须执行表扫描和排序（以及维护、然后应用更改日志，并在处理的开始和结束时短暂锁定表）。有一些方法可以最小化重建中使用的资源，但通常情况下，您可能不希望为这个问题而安排定期重建索引。

2010年4月27日更新：Timur Akhmadeev写了一篇很好的文章（点此链接），讨论了使用coalesce解决ITL爆炸问题的局限性。

Reverse Key（反向键）

在反向键索引中，索引中的每一列的字节序都是反向的，因此原来相邻（倾向于）的键值会随机地分散在索引中。作为一个副作用，您可能会在索引中更随机地进行插入，这意味着每个块可能看到更少的并发操作，而ITL问题“意外”地作为副作用消失。

但是，这里有反向键索引的几个不良副作用：
由于键值的插入点是完全随机的，所以，以前表现为基于时间的聚簇，突然消失了，甚至可能是对优化器而言不可接受的。除非你使用DBMS_STATS包，通过提供一个好的聚簇因子，来调整索引的统计信息。

如果您以前在索引的“顶部”（高值）端插入数据，然后再使用这些数据，那么您可以在索引的这一部分获得非常好的缓冲效果。通过使用反向键重新创建索引，您现在正在整个索引中随机插入键，因此您可能需要在数据库缓存中留出更多空间，以使最近使用的索引块保持缓冲。反转索引可能会导致磁盘I/O增加，即使数据使用和执行计划没有改变。

最后，优化器无法将基于范围的谓词应用于反向键索引。（注意–-这与对索引使用索引范围扫描不同：如果您有一个已反转的多列索引，优化器仍然可以对前导列上的相等谓词（如“COL1=constant”）执行范围扫描）。因此，您的一些执行计划可能会发生巨大的变化。

Global Hash Partitioned （全局HASH分区）

这通常看起来是简便的处理方案。创建索引为全局HASH分区索引，在索引的首列上分区。分区的数量应该考虑到必须处理的并发程度，并且在极端情况下很容易在16到128之间。（由于Oracle用于散列分区的策略(点此链接)，这个数字应该总是2的幂）。
这样做是因为在索引中，为看起来类似的数据项引入了更多的插入点。当索引的一个块被（比如）20个并发插入击中时，这些插入可能分布在N个分区上的不同块上。
当然，也有缺点。首先，分区选项只适用于企业版，它是一个许可的附加选项，而且价格不便宜。在技术上，要注意它有类似我们曾在介绍反向键索引问题时的一个问题。HASH分区索引对于“COLX=constant”这样的谓词很好——优化器可以选择一个HASH分区，运行时也可以很有效率；但是如果开始在HASH分区键列上使用基于范围的谓词，那么查询在运行查询时将必须遍历每个索引分区，如果创建的索引具有很多分区，对于轻量级（少量行）查询，则额外的索引探测开销可能会很可观。

Get rid of the index（去掉索引）

当你看到问题的时候，这可能不是第一件让你想到的事情，但这是一种可以考虑的可能性。索引是“无意义的键”索引（可能需要保留和反转）和“时间戳”索引是最有可能遇到ITL激增问题，我看到过后一种索引只是浪费空间的情况。当检查问题索引时，一定要记住Cary Milsapp说过的：“最快的方法是什么都不做”。有时修复问题索引的最佳方法是删除它。

最后:
在本系列的前一篇中，我没有回答的另一个问题是：“为什么我没有更频繁地注意到空间浪费现象，尽管我对可能发生这种现象的索引保持了关注。” 这个问题有好几个答案，比如：

我接到电话是为了解决更重要的问题，而一个比它本应占用的空间，大了两三倍的索引的副作用，与其他性能问题相比根本排不上号。
dba可能一直在进行定期（通常是无指向的）索引重建，因此当我在现场时，这个问题是不可见。
（这是我非常喜欢的一下）我已经看到了这个问题，但是我把这些症状归因于ITL问题的根本原因——一个尾部被缓慢删除的基于时间的索引（也被称为FIFO索引，或者用Tom Kyte的术语来说，是“sweeper”）。一个问题的解决方案也解决了另一个问题，所以我可能有过一石二鸟而不自知的时候。如果您想要一个这样的例子，请再看一看这篇关于索引分析的文章，问问您自己，这篇文章描述的空间浪费有多少是由于从索引尾部删除造成的，有多少是由于用户插入数据时的ITL并发问题造成的。在我看之前，我就知道该索引将是一个灾难区域，所以我没有仔细观察，或者我可能既看到了ITL问题，又看到了FIFO的问题。

索引与（堆）表有本质上的不同，并且由于不同，它们可能会发生一些有趣的事情。有时，在理解如何最好地利用它们以及如何避免它们可能带来的性能问题之前，您需要非常仔细地观察它们。

注：如果您没有通过本系列前面的文章来阅读本文，为了完整起见，我应该提到过，在Oracle的早期版本中，您可以通过设置maxtrans来避免ITL问题。只有在10g中，Oracle开始忽略maxtrans时，才需要解决这个问题。

原文链接和内容附后：
https://jonathanlewis.wordpress.com/2009/09/15/index-explosion-4/

Index Explosion – 4

Filed under: Index Explosion,Indexing,Infrastructure,Performance,Troubleshooting — Jonathan Lewis @ 7:30 pm BST Sep 15,2009
After describing how to deal most effectively – but only after approval from Oracle Support – with the problem of indexes wasting space on unnecessary ITL entries, I left you with a short list of “supportable” options for addressing the problem. In this note I’m going to outline a few pros and cons of each of those options. The list was as follows:

Use a coalesce when necessary – possibly on a very regular basis
Create the index as a reverse-key index
Create the index as global hash partitioned index
Get rid of (or change) the index

Coalesce

The command “alter index XXX coalesce” will make Oracle walk through the index leaf blocks in index order attempting to pack the data from two or more adjacent leaf blocks into a smaller number of leaf blocks using the current settings of initrans and pctfree to work out the space to use in each block. It won’t coalesce leaf blocks under different branch blocks, though, so can’t reduce the height of an index. The blocks that are emptied out by the coalesce are removed from the index structure and put on the freelist.

The coalesce command operates as a series of small isolated transactions (so could, in theory, increase the risk of Oracle error ORA-01555: snapshot too old – although in practise this is a little unlikely). The coalesce command also skips blocks holding uncommitted transactions.

On the plus side the coalesce command may not have to do much work, and it is an online command. On the minus side it does have to read the entire index and can generate quite a lot of undo and redo if the concurrency issue has affected blocks scattered all the way through the index (in this case, though, you may decide not to do anything about the waste space – or perhaps only fix it once every few weeks or months).

An alternative to the coalesce command is to rebuild the index – and you may want to do this anyway the first time you realise that you have an index suffering from this concurrency problem. But the rebuild command will lock the table for the duration of the rebuild unless you use the online option – in which case the rebuild has to do a tablescan and sort (as well as maintain, then apply, a journal of changes, and lock the table briefly at the start and end of processing). There are ways to minimise the resources used in a rebuild – but in general you probably don’t want to schedule a regular rebuild for this problem.

Update 27th April 2010: Timur Akhmadeev has written a nice article discussing the limitations of using the coalesce to address the problem of an exploding ITL.

Reverse Key

In a reverse key index each column in the index has its bytes reversed, so key values that were originally adjacent (tend to) become randomly scattered throughout the index. As a side effect, you are likely to spread the pattern of inserts more randomly through the index, which means each block probably sees less concurrent action, and the ITL issue “accidentally” disappears as a side effect.

There are several undesirable side effects to reverse key indexes, though.

You’ve effectively randomised the key insertion point, so an index which used to show a nice time-dependent clustering pattern suddenly loses that pattern and may appear undesirable to the optimizer unless you use calls to the dbms_stats package to adjust the index statistics by supplying a nice clustering_factor.

If you were previously inserting, and subsequently using, data at the “top” (high-values) end of the index you could have been getting a very good buffering effect on that part of the index. By recreating the index with reversed keys you are now inserting keys randomly throughout the index, so you may have to allow more space in your db cache to keep the recently used index blocks buffered. Reversing an index could cause an increase in disc I/O, even if the data usage and execution plans don’t change.

Finally, the optimizer cannot apply range-based predicates to reverse key indexes. (Note – this is not the same as using index range scans on the index: if you have a multi-column index that has been reversed the optimizer can still do a range scan for an equality predicate such as “COL1 = constant” on the leading columns). As a result of this, some of your execution plans might change dramatically.

Global Hash Partitioned

This often looks like an easy winner. Create the index as a globally hash partitioned index, partitioned on the first column of the index. The number of partitions should take account of the degree of concurrency you have to deal with, and could easily be somewhere between 16 and 128 in extreme cases. (The number should always be a power of two because of the strategy that Oracle uses for hash partitioning).

This works because you introduce far more insertion points in the index for data items that looks similar. Where a single block of the index was being hit by (say) 20 concurrent inserts, those inserts are likely to be spread over many different blocks scattered across the N partitions.

There are downsides, of course. For a start the partitioning option is only available for Enterprise Edition, and it’s a licensed extra option, and it’s not cheap. On a technical note you’ve also introduced a problem similar to one of the problems with reverse key indexes. The hash-partitioned index is fine for predicates like “COLX = constant” – where the optimizer can pick a single hash partition and the run-time task can be efficient; but if you start using range-based predicates on the hash column then your query will have to work through every single index partition as it runs the query, and if you’ve created the index with a large number of partitions the overhead of the extra index probes may be significant if the query is supposed to be a light-weight (small number of rows) query.

Get rid of the index

This may not be the first thing that crosses your mind when you see the problem – but it is a possibility to be considered. The indexes that are most likely to run into the ITL explosion are “meaningless key” indexes (which you may need to keep and reverse) and the “timestamp” index, and I have seen cases where the latter type of index is simply a waste of space. When examining problem indexes always remember Cary Milsapp’s comment: “the fastest way to do something is to not do it at all”. Sometimes the best way to fix a troublesome index is to drop it.

And finally:
The other question I left unanswered in the previous note in this series was: “Why haven’t I noticed the space-wasting phenomenon more often despite seeing indexes where it could have been happening.” There are several answers to this question, for example:

I’ve been called in to address more significant problems, and the side effects of an index that was two or three times larger than it should have been simply didn’t show up in comparison to the other performance issue.
The DBAs may have been doing regular (and usually pointless) index rebuilds so the issue is never visible when I’m on site
(This is one I really like). I’ve seen the problem but attributed the symptoms to something that is often the underlying cause of the ITL issue – the time-based index with a slowly deleted tail (also known as the FIFO index or, in Tom Kyte’s terminology, the “sweeper”). The solution for one problem also fixes the other – so there have probably been times when I’ve killed two birds with one stone without realising it. If you want an example of this, take another look at this article on index analysis and ask yourself how much of the space wastage the article describes was due to deleting from the tail of the index, and how much of it was due to the ITL concurrency problem as users were inserting the data. I knew that the index was going to be a disaster area before I looked at it, so I didn’t look at it closely enough or I would probably have seen the ITL problem as well as the FIFO problem.

Indexes are fundamentally different from (heap) tables, and there are several interesting things that can happen to them because of the difference. Sometimes you need to look at them very closely before you understand how to make best use of them and how to avoid the performance problems that they can introduce.

Footnote: Just in case you haven’t come to this article by way of the earlier articles in the series I should mention for completeness that in earlier versions of Oracle you could avoid the ITL problem by setting maxtrans. The need for workarounds to the problem only become necessary in 10g where Oracle started to ignore maxtrans.

index itl oracle

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者