This note is part four of a four-part series, and covers index fragmentation. The whole series is as follows:
1. Introduction – with links to parts 2 – 4
2. Disk and Tablespace Fragmentation
3. Table Fragmentation
4. Index Fragmentation – this part

4. Index “fragmentation”
The multiple extent and ASSM “fragmentation” that I described in the previous article about table fragmentation applies equally well to indexes, of course, and matters in just the same way – i.e. hardly ever.
When people speak of “index fragmentation” they are usually thinking of the problem of “sparsely populated blocks” – which is also a phenomenon I described in the notes about table fragmentation, although there are differences between tables and indexes that we shall consider shortly. There is another interpretation of “index fragmentation”, though, which can be worth considering – the side effects of leaf block splitting, which result in “logically adjacent” leaf blocks being widely physically scattered; we shall also consider that in the following notes.
We’ll start with bulk deletion and review the same representative cases that we considered with tables (i.e. all the rows deleted from 20% of the blocks, and 20% of the rows deleted from all the blocks) – and when we do this we have to remember that there is a difference between index deletions and table deletions that makes the subsequent behaviour different. When a transaction deletes a row from a table it can reduce the row to a stub of just a few bytes before the commit takes place and immediately make use of the space this reduction frees up in the table block; when a transaction deletes a row from an index it has to leave the entire index entry in place and flag it as deleted – it cannot immediately reuse the space, it has to wait until after the commit. (I posted an example demonstrating this difference a few months ago: https://jonathanlewis.wordpress.com/2010/03/25/index-too-big/)
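If you want to watch this flagging behaviour for yourself, a minimal sketch along the following lines will do. The table and index names are hypothetical, and analyze ... validate structure locks the object while it runs, so keep this to a test system:

```sql
-- Hypothetical demo: after a bulk delete the index entries are still
-- physically present, merely flagged as deleted.
create table t1 (id number, padding varchar2(100));
create index t1_i1 on t1(id);

insert into t1
select rownum, rpad('x', 100)
from   dual
connect by level <= 10000;
commit;

delete from t1 where id <= 2000;
commit;

-- validate structure populates the single-row view index_stats
analyze index t1_i1 validate structure;

select lf_rows,        -- index entries, current plus deleted
       del_lf_rows,    -- entries still flagged as deleted
       round(100 * del_lf_rows / lf_rows, 1) pct_deleted
from   index_stats;
```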
Another critical difference between tables and indexes is that an index entry has to go in the right place; so when an index block has some free space but is not completely empty, the only rows that can go into that block are rows that belong in exactly that part of the index. Moreover, when a leaf block becomes empty it stays linked into the same position in the index even though it has also been linked into the freelist. (My guess is that this probably makes it easier to deal with problems of read-consistency, but it may also relate to problems of rolling back and the cost of changing the three links in the index structure.) So when we look at an index which has had a very large delete at the leading edge, a query for the “lowest value” may have to do a range scan through a very large number of empty leaf blocks before it finds the leaf block that contains any current data – which is why we should always consider the option of a regular coalesce on an index where we regularly delete a lot of data from the low end or, in the more generic though less common case, where we delete a large number of consecutive values anywhere in the index.
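Where you do have an index that suffers a regular purge from its low end, the coalesce itself is a single statement (the index name is the hypothetical one from the sketch above); unlike a rebuild it tidies adjacent leaf blocks within the existing segment and needs no extra space:

```sql
-- Merge the contents of adjacent, sparsely populated leaf blocks so
-- that the emptied blocks can be unlinked and reused within the index.
alter index t1_i1 coalesce;
```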
In the more general case of a bulk delete we could leave quite a lot of empty space in every leaf block which, unlike table free space, we cannot make Oracle reuse by picking a cunning value for pctused, because that storage parameter isn’t relevant to indexes. So the particular question we have to consider for indexes is this: how much impact might this empty space have on the application?
The usual considerations of “more space to back up” and “more blocks to keep in the buffer” apply, of course, but we ought to ask whether there could be a more significant direct impact on performance from a large number of sparsely populated leaf blocks. The answer depends on the application, of course, but we typically use indexes to collate key values and pack them into a small space – and with this thought in mind we can note that the bulk of the work done by many queries comes from visiting the table after collecting a number of key values from an index; consequently the extra work introduced by an index with a significant amount of empty space per leaf block may be a small fraction of the total work of a query, and we may therefore decide that we don’t want to invest resources in rebuilding an index unless it has become very sparsely populated. (A typical “random arrival” B-tree index will run at an average of about 70% utilisation, 30% free space, per leaf block; I don’t get particularly worried about index-related performance until an index gets down to about 50% utilisation, unless I have direct evidence that the index is contributing a significant amount of time to a critical set of queries.)
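If you want to put a number on that utilisation before making the decision, the same index_stats view gives one (with the same caveat that validate structure locks the index while it runs):

```sql
analyze index t1_i1 validate structure;

-- pct_used: percentage of the allocated b-tree space holding current data.
-- By the rule of thumb above, ~70% is a normal steady state for a
-- "random arrival" index; ~50% is where a rebuild becomes worth discussing.
select used_space, btree_space, pct_used
from   index_stats;
```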
There are two further issues of “fragmentation” in indexes, though, that do not apply to tables. The first is that you do not update index entries – you delete the entry for the old value and insert an entry for the new value. If the changes are effectively random then this shouldn’t cause any of the problems that you associate with bulk deletions; but if you can see some sort of “time-based” pattern to the updates – for example if you have an index on a last_updated column – then you could end up suffering the worst possible effects of a sparsely populated index. In this type of case you would be (slowly) deleting entries from blocks towards the low end of the index and inserting them into blocks at the high end of the index; the empty space produced by the deletions would not be subject to reuse, because rows cannot be updated back into the past; moreover, if you keep moving rows from the past to the future you keep visiting leaf blocks which are sparsely populated, and if this is an OLTP system where users update one or two rows at a time, the seek and update to the index blocks can become a significant fraction of the work done by each update statement. At the very least you need to be aware of this pattern of activity so that you can decide how to measure the impact on performance and create a strategy for dealing with the threat.
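To make the pattern concrete, here is a hypothetical shape for it: with an index on a last_updated column, every update deletes an index entry from an ageing leaf block at the low end and inserts one at the high end, and the freed space is never refilled because no row is ever updated back into the past:

```sql
create table orders (
    order_id     number primary key,
    status       varchar2(10),
    last_updated date
);

create index orders_upd_i on orders(last_updated);

-- A typical OLTP update: the entry for the old last_updated value is
-- deleted from a leaf block near the low end of orders_upd_i and a new
-- entry is inserted at the high end; the old blocks grow ever sparser.
update orders
set    status = 'SHIPPED',
       last_updated = sysdate
where  order_id = 12345;
```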
The other fragmentation issue with indexes is one where the word “fragmentation” really feels more appropriate. It relates to leaf block splits. If you want to insert an entry into an index leaf block that is full, Oracle has to find a new empty block somewhere, copy roughly half the data from the current leaf block into it, and link the new block into the existing index structure in the right place. As a consequence, blocks which are “logically adjacent” in the index are not necessarily “physically adjacent”, which means that when you do a large index range scan (or index full scan) you could actually end up doing lots of single-block random reads.
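If you want to see how many leaf blocks a given range of keys is actually spread across, one technique that circulates in the Oracle community uses the undocumented sys_op_lbid() function; treat this as an unsupported sketch (the index object_id must be looked up and is supplied here as a SQL*Plus substitution variable, and the table is the hypothetical one above):

```sql
-- sys_op_lbid(<index object_id>, 'L', <table rowid>) returns a value
-- identifying the index leaf block that holds the entry for that row,
-- so the distinct count below is the number of leaf blocks spanned by
-- the last day's worth of key values.
select count(distinct sys_op_lbid(&index_object_id, 'L', t.rowid)) leaf_blocks
from   orders t
where  t.last_updated >= sysdate - 1;
```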
This is where SQL Server (and probably Sybase, and possibly DB2) becomes significant. The way that SQL Server deals with free space management for heap tables is not very good (http://www.simple-talk.com/sql/learn-sql-server/oracle-to-sql-server,-crossing-the-great-divide,-part-3/), so it is almost an article of faith (dogma, even) in SQL Server that all tables should be built using “clustered indexes” – which (in Oracle terms) means that all tables are effectively index organized tables (IOTs). If you’ve tried to cluster your data, and thought about it carefully and deliberately, then leaf block splits are destroying your efforts to keep related data items together. So it’s not really surprising that DBAs with a background in SQL Server (and Sybase and DB2) are so keen on the idea of rebuilding indexes all the time; if you’re rebuilding a clustered index you’re putting the row (i.e. table) data back where you wanted it. (Conveniently this doesn’t require all the other indexes on the table to be rebuilt because, like secondary indexes on Oracle’s IOTs, the other indexes in SQL Server (et al.) use the unique (or “uniquified”) key as the row identifier.)
From an Oracle perspective this type of fragmentation generally doesn’t matter – provided you’re thinking about standard B-tree indexes – because, as noted above, most queries will do most of their work visiting the table. But the SQL Server paradigm gives you a clue about when you ought to consider the effects of “fragmentation” and the requirement for index rebuilds more carefully. If, as an Oracle DBA, you have created a table as an index organized table (IOT) then you probably had a good reason for making that choice, and that reason was probably to ensure that data arriving in one order was stored in a different order so that related data items were stored together.
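For reference, that choice looks like this in Oracle. The table is hypothetical, but organization index is the real clause; the rows are stored in the primary-key index itself, in key order rather than arrival order:

```sql
-- A hypothetical IOT: all the lines for one order end up packed
-- together in the primary-key index even though they arrive
-- interleaved with lines for thousands of other orders.
create table order_lines (
    order_id   number,
    line_no    number,
    product_id number,
    qty        number,
    constraint order_lines_pk primary key (order_id, line_no)
)
organization index;
```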
If you’ve created an IOT to keep the data together, then leaf block splits will cause the data to become a little scattered. Before you worry too much about this, think carefully about the importance you are going to attach to this scattering, and the (possibly marginal) benefit of doing something about it. For purposes of illustration, imagine that a typical query against an important IOT collects 200 rows of 200 bytes. As a heap table this might have required you to read 200 randomly scattered separate blocks, so you decided to implement the table as an IOT. Assuming the worst level of leaf block splits (50/50 with no back-fill appearing), the 200 rows will go into the IOT at about 20 rows per block, for a total of 10 leaf blocks. Due to the timing of leaf block splits it is safe to assume that these 10 blocks will end up scattered fairly randomly throughout the index segment. If you rebuild the index you will be able to pack the data into just five blocks, and those five blocks will (usually) be adjacent blocks in the segment rather than being scattered – and this adjacency probably means you get a little extra performance benefit if the index range scan has to go to disc. (Note – SQL Server operates with extent sizes of 8 blocks of 8KB, and the database software is able to co-operate with the operating system to negotiate a readahead of an entire extent in situations like this: this combination of details makes index rebuilds in these circumstances rather more beneficial to SQL Server than to Oracle.)
Once you’ve spent a little time thinking about scenarios like this it is much easier to understand how to address the question of how much benefit you might get from rebuilding an IOT. What is the reduction in the total number of block visits you will have to make; how many of those block visits would have been separate disc reads; and how much benefit might you get from Oracle, the operating system, or the hardware drivers being able to implement some form of readahead to reduce the time it takes to fetch those blocks? (Bear in mind, by the way, an important comment I made in the article on disk fragmentation – just because two blocks appear to be adjacent as far as the Oracle data file is concerned, attempts to introduce striping and load balancing at multiple layers may result in the two blocks being on different discs.)
Further Reading:
It is difficult to create algorithms to deal with free space management in all combinations of circumstances, especially when you are trying to support complex code for handling read-consistency at the same time. Consequently there are cases where unexpected, and expensive, side effects appear in the Oracle code path. See this posting for one such case (or two, if you follow the link in the article): https://jonathanlewis.wordpress.com/2009/11/10/index-freelists/
I have written several notes about a problem with leaf block splits (https://jonathanlewis.wordpress.com/2009/09/19/index-itls/) that used to appear occasionally in systems that used sequence-generated synthetic keys in an environment with a large number of concurrent processes. This problem could be bypassed very easily in earlier versions of Oracle, but in 10g it seems to appear more frequently (perhaps because of the increase in the number of very busy systems, possibly because of other efficiencies in Oracle’s code path) and the historic correction to the problem is no longer viable. The problem results in sequence-based indexes running at roughly 25% space efficiency – and until Oracle reinstates the maxtrans parameter for indexes, there isn’t much you can do about it that will be approved by Oracle support.
Original post: “Fragmentation 4” by Jonathan Lewis, 7:00 pm BST, 22 Jul 2010:
https://jonathanlewis.wordpress.com/2010/07/22/fragmentation-4/