dynamic_remastering_rac_orainternals

盖国强

315

9页

0次

2019-06-21

5墨值下载

OBJECT REMASTERING IN RAC

Introduction

In RAC, every data block is mastered by an instance. Mastering a block simply means that master

instance keeps track of the state of the block until the next reconfiguration event (due to instance

restart or otherwise).

Hash to the master

These data blocks are mastered in block ranges. For example, range of blocks starting from file 10,

block 1 through block 128 may be mastered by instance 1, blocks from file 10, block 129 through

256 are mastered by instance 2 etc. Of course, there are differences between various versions 10g,

11g etc, but Idea here is that block ranges are uniformly mastered between various instances so that

Global cache grants are evenly distributed among the instances. Interestingly, length of the block

range is 128 from 10g onwards (Julian Dyke mentioned that is 1089 in 9i, but I have not personally

tested it). Of course, Support recommends you to unset db_file_multiblock_read_count which will

be auto adjusted to 128 which means that Full block range can be read with fewer GC messages, I

suppose. I digress.

Further, Michael Möller (“M2”) pointed out that this hash-algorithm is further optimized: The hash-

algorithm used when initially computing the master node from the DBA, results in a "virtual master",

which is then translated to a real (online&open) master by a lookup table (the length of which is the

maximum number of possible nodes (128 ?). This means that when one node goes off/on-line, RAC

does NOT have to recalculate the hash for all blocks, but only distribute the new Hash-to-node

table. (One can later visualize dynamic remastering as an additional lookup table between the hash

value and node. This table also needs redistributing on node changes.)

Following SQL is helpful in showing masters and owners of the block. This SQL joins the fixed

tables, x$kjbl with x$le to retrieve resource name. If you are familiar with Oracle locking strategy, you

would probably recognize the format of these cache fusion (aka old PCM) locks. Lock type in this

case is BL, id1 is block# and id2 is file_id in this case. Column kjblname2 provides a decimal format

lock resource.

Please observe the output below:

1. Block range: File 1, block 6360-6376 is mastered by node 3 and also owned by node 3.

2. Block range: File 1, blocks upto 10709 is mastered by instance 0 and owned by instance 3.

3. Next block range from 10770 is mastered by instance 2 and owned by 3.

Note that this output is coming from a database with no remastering done yet.

REM In kjblname2 first entry before ',' is block and seond entry file_id*65536 for BL

locks

select kj.*, le.le_Addr from (

select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp,

substr ( kjblname2, instr(kjblname2,',')+1, instr(kjblname2,',',1,2)-instr(kjblname2,',',1,1)-1)/65536 fl,

substr ( kjblname2, 1, instr(kjblname2,',')-1) blk

from x$kjbl

) kj, x$le le

where le.le_kjbl = kj.kjbllockp

order by le.le_addr

KJBLNAME KJBLNAME2 KJBLOWNER KJBLMASTER FL BLK LE_ADDR

---------------------- -------------- ---------- ---------- --- ----- ----------------

[0x18d8][0x10000],[BL] 6360,65536,BL 3 3 1 6360 000000038DF9AB08

...

[0x18e7][0x10000],[BL] 6375,65536,BL 3 3 1 6375 00000038DFBF818 #2

[0x18e8][0x10000],[BL] 6376,65536,BL 3 3 1 6376 000000038DFD3BA0

...

[0x29d1][0x10000],[BL] 10705,65536,BL 3 0 1 10705 00000005D6FE9230

...

[0x29d5][0x10000],[BL] 10709,65536,BL 3 0 1 10709 000000038EFBB990

[0x2a12][0x10000],[BL] 10770,65536,BL 3 2 1 10770 000000075FFE3C18

...

[0x2a18][0x10000],[BL] 10776,65536,BL 2 2 1 10776 000000038EFA8488 #1

[0x2a19][0x10000],[BL] 10777,65536,BL 3 2 1 10777 000000038EFB7F90

[0x2a1a][0x10000],[BL] 10778,65536,BL 1 2 1 10778 000000038EFCC318

Let’s consider three simple cases of SELECT sql statement running instance 3:

1. A session is trying to access the block file 1, block 10776, but that block is mastered by

instance 2 and also that block is owned by instance 2 (meaning, it is in instance 2 cache). So,

instance 3 will send a PR (Protected Read) mode BL lock request on that block to instance 2.

Ignoring any additional complexities, instance 2 will grant PR mode lock to instance 3 and

transfer the block to instance 3. Obviously, this involves multiple GC messages, grants and

block transfer. Statistics ‘gc remote grants’ gets incremented too.

2. Let’s consider that session is trying to access another block: file 1, block 6375. That block is

mastered by instance 3 and also owned by instance 3. At this point, there is no additional

GCS/GES processing is needed and the session pin that buffer and continue the work.

3. Let’s consider a third case. Session is trying to access file 1 block 6374. That block is not in

any buffer cache, but instance 3 is master of the block, so local affinity locks are acquired

with minimal GC messages and waits. That block is read from the disk in to the buffer cache

In the case #2 and #3 above, requesting instance also is the master node of a block or block range.

In these cases, statistics ‘gc local grants’ is incremented and cheaper local affinity locks on those

block ranges are acquired avoiding many Global cache messages.

So far so good, but what if, say instance 1, is reading one object (table, index etc) aggressively, but

other instances are not reading that object at all? [through some sort of application node partitioning

or just plain workload]. Does it make sense for the instance accessing that object aggressively request

a grant to the remote instance(s) for each OPEN on that object’s blocks? Especially, if the blocks are

read in to the buffer cache, but disappears soon from the buffer cache? Wouldn’t that be better if the

instance reading that object aggressively is also the master of that object, as in the cases #2 and #3

above?

In addition to that, if the block is supposed to be thrown away from buffer cache (close of BL lock)

or if the block needs to be written, then that will involve additional overhead/messaging between the

master instance and owner instance since the ownership needs to be communicated back to the

master of the block.

Enter Object remastering.