© Orainternals, Riyaj Shamsudeen - 1 - Blog entry
In RAC, every data block is mastered by an instance. Mastering a block simply means that the master
instance keeps track of the state of the block until the next reconfiguration event (due to an instance
restart or otherwise).
Hash to the master
These data blocks are mastered in block ranges. For example, the range of blocks from file 10,
block 1 through block 128 may be mastered by instance 1, blocks from file 10, block 129 through
block 256 by instance 2, and so on. Of course, there are differences between versions (10g,
11g, etc.), but the idea is that block ranges are mastered uniformly across the instances so that
global cache grants are evenly distributed among them. Interestingly, the length of the block
range is 128 from 10g onwards (Julian Dyke mentioned that it is 1089 in 9i, but I have not personally
tested it). Of course, Support recommends leaving db_file_multiblock_read_count unset, in which case it
is auto-adjusted to 128, which means a full block range can be read with fewer GC messages, I
suppose. I digress.
Further, Michael Möller (“M2”) pointed out that this hash algorithm is further optimized: the hash
algorithm used when initially computing the master node from the DBA results in a "virtual master",
which is then translated to a real (online and open) master by a lookup table (whose length is the
maximum number of possible nodes, 128?). This means that when a node goes off- or online, RAC
does NOT have to recalculate the hash for all blocks, but only distribute the new hash-to-node
table. (One can later visualize dynamic remastering as an additional lookup table between the hash
value and the node. This table also needs redistributing on node changes.)
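The scheme above can be sketched in a few lines of Python. This is a toy illustration, not Oracle's actual algorithm: the hash function, the range length of 128 and the table size of 128 slots are assumptions taken from the discussion above.

```python
# Toy sketch of hash-to-master with a virtual-master lookup table.
# NOT Oracle's real algorithm; constants follow the blog text's assumptions.

BLOCK_RANGE = 128   # block-range length from 10g onwards (per the text)
MAX_NODES = 128     # assumed size of the virtual-master lookup table

def virtual_master(file_id: int, block: int) -> int:
    """Hash a data block address to a virtual master slot.
    All blocks in the same 128-block range share one slot."""
    range_id = block // BLOCK_RANGE
    return hash((file_id, range_id)) % MAX_NODES

def build_lookup(online_nodes):
    """Map every virtual slot to a real (online & open) instance."""
    return [online_nodes[slot % len(online_nodes)] for slot in range(MAX_NODES)]

online = [1, 2, 3]
lookup = build_lookup(online)

def real_master(file_id: int, block: int) -> int:
    return lookup[virtual_master(file_id, block)]

# When a node leaves, only the small lookup table is rebuilt and
# redistributed; the per-block hash values never change.
lookup_after_node2_dies = build_lookup([1, 3])
```

The point of the indirection is visible in the last line: a reconfiguration touches only the 128-entry table, not the hash of every block in the database.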
The following SQL is helpful in showing the masters and owners of blocks. It joins the fixed
tables x$kjbl and x$le to retrieve the resource name. If you are familiar with Oracle's locking strategy,
you will probably recognize the format of these cache fusion (aka old PCM) locks. The lock type in this
case is BL, id1 is the block# and id2 is the file_id. Column kjblname2 provides the lock resource in
decimal format.
Please observe the output below:
1. Block range: file 1, blocks 6360-6376 are mastered by node 3 and also owned by node 3.
2. Block range: file 1, blocks up to 10709 are mastered by instance 0 and owned by instance 3.
3. The next block range, from 10770, is mastered by instance 2 and owned by instance 3.
Note that this output is coming from a database with no remastering done yet.
REM In kjblname2, the first entry before ',' is the block# and the second entry is file_id*65536 for BL locks
select kj.*, le.le_Addr from (
select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp,
substr ( kjblname2, instr(kjblname2,',')+1, instr(kjblname2,',',1,2)-instr(kjblname2,',',1,1)-1)/65536 fl,
substr ( kjblname2, 1, instr(kjblname2,',')-1) blk
from x$kjbl
) kj, x$le le
where le.le_kjbl = kj.kjbllockp
order by le.le_addr
KJBLNAME               KJBLNAME2      KJBLOWNER  KJBLMASTER FL  BLK   LE_ADDR
---------------------- -------------- ---------- ---------- --- ----- ----------------
[0x18d8][0x10000],[BL] 6360,65536,BL 3 3 1 6360 000000038DF9AB08
[0x18e7][0x10000],[BL] 6375,65536,BL 3 3 1 6375 00000038DFBF818 #2
[0x18e8][0x10000],[BL] 6376,65536,BL 3 3 1 6376 000000038DFD3BA0
[0x29d1][0x10000],[BL] 10705,65536,BL 3 0 1 10705 00000005D6FE9230
[0x29d5][0x10000],[BL] 10709,65536,BL 3 0 1 10709 000000038EFBB990
[0x2a12][0x10000],[BL] 10770,65536,BL 3 2 1 10770 000000075FFE3C18
[0x2a18][0x10000],[BL] 10776,65536,BL 2 2 1 10776 000000038EFA8488 #1
[0x2a19][0x10000],[BL] 10777,65536,BL 3 2 1 10777 000000038EFB7F90
[0x2a1a][0x10000],[BL] 10778,65536,BL 1 2 1 10778 000000038EFCC318
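The arithmetic the query applies to kjblname2 can be shown in isolation. Here is a small Python sketch of the same decoding (the function name is mine; the format, block# first and file_id*65536 second, is as described in the REM comment above):

```python
# Decode the decimal BL lock resource name from x$kjbl.kjblname2:
# "<block#>,<file_id*65536>,BL" -> (file_id, block#, lock_type)

def decode_kjblname2(name2: str):
    blk, fid_scaled, lock_type = name2.split(',')
    return int(fid_scaled) // 65536, int(blk), lock_type

print(decode_kjblname2('10776,65536,BL'))   # -> (1, 10776, 'BL'): file 1, block 10776
```

This matches row #1 in the output above: resource 10776,65536,BL is file 1, block 10776.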
Let’s consider three simple cases of a SELECT statement running on instance 3:
1. A session is trying to access block file 1, block 10776, but that block is mastered by
instance 2 and also owned by instance 2 (meaning it is in instance 2's cache). So,
instance 3 will send a PR (Protected Read) mode BL lock request for that block to instance 2.
Ignoring any additional complexities, instance 2 will grant a PR mode lock to instance 3 and
transfer the block to instance 3. Obviously, this involves multiple GC messages, grants and a
block transfer. The statistic ‘gc remote grants’ is incremented too.
2. Let’s consider a session trying to access another block: file 1, block 6375. That block is
mastered by instance 3 and also owned by instance 3. In this case, no additional
GCS/GES processing is needed; the session pins that buffer and continues its work.
3. Let’s consider a third case. A session is trying to access file 1, block 6374. That block is not in
any buffer cache, but instance 3 is the master of the block, so local affinity locks are acquired
with minimal GC messages and waits. The block is read from disk into the buffer cache.
In cases #2 and #3 above, the requesting instance is also the master node of the block or block range.
In these cases, the statistic ‘gc local grants’ is incremented and cheaper local affinity locks are
acquired on those block ranges, avoiding many global cache messages.
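The three cases boil down to one decision: is the requester the master, and does anyone hold the block in cache? A toy Python sketch (function and return values are mine, purely illustrative):

```python
# Illustration of the three SELECT cases above. A remote grant is needed
# only when another instance masters the block; otherwise the grant is
# local, with a disk read if no instance caches the block (owner is None).

def grant_path(requesting: int, master: int, owner):
    if master == requesting:
        # cases #2 and #3: 'gc local grants', cheap local affinity locks
        return 'local grant' if owner is not None else 'local grant + disk read'
    # case #1: PR-mode BL request sent to the master, possible block transfer
    return 'remote grant'

print(grant_path(3, 2, 2))      # case #1 -> 'remote grant'
print(grant_path(3, 3, 3))      # case #2 -> 'local grant'
print(grant_path(3, 3, None))   # case #3 -> 'local grant + disk read'
```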
So far so good, but what if, say, instance 1 is reading one object (table, index, etc.) aggressively, while
the other instances are not reading that object at all (through some sort of application node partitioning,
or just plain workload)? Does it make sense for the instance accessing that object aggressively to request
a grant from the remote instance(s) for every OPEN on that object’s blocks? Especially if the blocks are
read into the buffer cache but soon disappear from it? Wouldn’t it be better if the
instance reading that object aggressively were also the master of that object, as in cases #2 and #3 above?
In addition, if a block is to be thrown out of the buffer cache (close of the BL lock),
or if the block needs to be written, that involves additional overhead and messaging between the
master instance and the owner instance, since the ownership change needs to be communicated back to the
master of the block.
Enter Object remastering.