Big Update

2011-01-01

475

JL Computer Consultancy

Handling massive updates ( ? A Y2K trick on Oracle 7 ?)

May 1999

Given the type of work I do I never expected to have anything to do with a Year 2000 project. However I have to sit somewhere when I visit a client and recently I found myself sitting opposite someone who was busy writing code in preparation for updating a large Oracle system that had used a character string of the form YYMMDD to hold a date.

Since I was there he described the problems he had been having and asked if I had any solution. This is what I came up with.

The Problem:

A table of ca. 250,000,000 rows with a varchar2() column that needs to be changed from YYMMDD to 19YYMMDD. There is space in the table (PCTFREE) to allow this change to take place without causing row migration.

There is not enough space in the database to allow rollback segments to grow large enough for a single update with a single commit to correct all the rows.

There is not enough space in the database to do a 'create as select unrecoverable'.

Updates which use a loop to update some rows and commit are too slow for various reasons, such as reduced concurrency, increased I/O, single row calls etc.

One Solution:

To get maximum concurrency, minimum I/O, and minimum contention between parallel processes I emulated the Oracle Parallel Query strategy with statements like:

update

        /*+ rowid */

        tableX

        set char_date = '19' || char_date

        where   nvl(length(char_date),0) = 6

        and     rowid between  chartorowid('000927.0000.0008')

                         and   chartorowid('000A3F.FFFF.0008');

The/*+ rowid */ hint makes Oracle use a special addressing mode that ensures only a block range including the required start and stop rowids is visited (without the hint, and prior to 7.2-ish the query would have resulted in a full tablescan).

The big trick then is to produce a set of rowid ranges that completely cover the table without any overlaps. Once this is done, any number of concurrent processes can be run, each with a separate rowid range; I/O will be kept to a minimum; there will be no (base table) contention between processes; finally, with sufficient rollback segments declared there will be minimum rollback contention. The only outstanding problem will be the redo log contention.

The Strategy:

There are two key elements to the strategy - the first is generating a set of rowid ranges, the second is a method for using a list of controlling values to run a scalable set of concurrent processes. The rest of this note deals with the first problem, the second problem will be the subject of a separate article.

To generate a list of rowid ranges (and remember that this is for Oracle 7 only), you start with the list of DBA_EXTENTS for an object, and derive a pair of hexadecimal rowid ranges for each extent.

A row from DBA_EXTENTS includes the columns: FILE_ID and BLOCK_ID, which tell you the starting block for the extent, and BLOCKS which tells you the length of the extent.

A rowid looks like: block_id.row_index.file_id, e.g. 00001F23.0003.0006, so to get from DBA_EXTENTS to a rowid range you need to calculate the block_id for the last block in the extent (which will be BLOCK_ID + BLOCKS - 1), convert the file id and block id to a hexadecimal form, and then insert a 'row number 0' for the first block, and a 'row number 0x7FFF' for the trailing row in the last block. (Oracle is not bothered by missing rows in the range). As an added refinement, you could choose to break very large extents into a series of consecutive rowid ranges.

A PL/SQL function to convert decimal to hex is described in a separate note.

Once you can produce a list of rowid ranges, the next step is to write them into a table so that a another process (or multiple concurrent copies of another process) can acquire a control row from this table, and use the content to update the base table. The detail of this 'control table' strategy that will be given in a later note

Back to Main Index of Topics .

The package:

rem

rem     Script:        y2k_tools.sql

rem     Author:        Jonathan Lewis

rem     Dated:         Feb-1999

rem     Purpose:       Generate list of rowid ranges for an extent

rem

rem     Notes:

rem     This is STRICTLY version 7

rem

rem     The owner of the procedure has to have access to view

rem     DBA_EXTENTS as a directly granted privilege.

rem

rem     The code converts object names to capitals, so will fail

rem     for objects which have managed to get their names into

rem     lower case.

rem

rem     There is no type checking for objects;  you should ensure

rem     that you supply a table name that is NOT a clustered table.

rem

create table rowid_control (

        owner_name     varchar2(32),

        segment_name   varchar2(32),

        rowid_start    rowid,

        rowid_end      rowid,

        status         char(1)

);

create or replace package y2k_tools as

        procedure generate_rowid_ranges (

               i_owner_name   in      varchar2,

               i_segment_name in      varchar2,

               i_chunk_size   in      number  default 160

);

end;

create or replace package body y2k_tools as

--

--      Private function to turn a start block and block count

--      into starting and ending rowids for that range

--

procedure generate_range (

        i_file_id      in     number,

        i_block_id     in     number,

        i_blocks       in     number,

        o_start           out rowid,

        o_end             out rowid

) is

begin

        o_start :=  chartorowid(

                lpad(jpl_utils.decimal_to_hex(i_block_id),8,'0') ||

                '.0000.' ||

               lpad(jpl_utils.decimal_to_hex(i_file_id),4,'0')

);

        o_end := chartorowid(

                lpad(jpl_utils.decimal_to_hex(

                                i_block_id + i_blocks - 1

                     ),8,'0'

                ) ||

                '.7FFF.' ||

               lpad(jpl_utils.decimal_to_hex(i_file_id),4,'0')

);

end generate_range;

procedure generate_rowid_ranges (

        i_owner_name   in      varchar2,

        i_segment_name in      varchar2,

        i_chunk_size   in      number  default 160

) is

        cursor c1 (v_owner_name varchar2, v_segment_name varchar2) is

        select

               block_id, file_id, blocks

        from

               dba_extents

        where

               segment_name = upper(v_segment_name)

        and     owner = upper(v_owner_name)

        m_tmp   c1%rowtype;

        m_rowid_start  rowid;

        m_rowid_end    rowid;

begin

        for r1 in c1(i_owner_name, i_segment_name) loop

               m_tmp := r1;

               while (m_tmp.blocks >= i_chunk_size ) loop

                       generate_range(

                               i_file_id  => m_tmp.file_id,

                               i_block_id => m_tmp.block_id,

                               i_blocks   => i_chunk_size,

                               o_start    => m_rowid_start,

                               o_end      => m_rowid_end

);

                       dbms_output.put_line(

                               m_rowid_start  || ' - ' || m_rowid_end

);

                       insert into rowid_control(

                               owner_name,

                               segment_name,

                               rowid_start,

                               rowid_end,

                               status

                       values (

                               i_owner_name,

                               i_segment_name,

                               m_rowid_start,

                               m_rowid_end,

'N'

);

                       m_tmp.blocks := m_tmp.blocks - i_chunk_size;

                       m_tmp.block_id := m_tmp.block_id + i_chunk_size;

               end loop ;

               if (m_tmp.blocks != 0) then -- pick up the last bit

                       generate_range(

                               i_file_id  => m_tmp.file_id,

                               i_block_id => m_tmp.block_id,

                               i_blocks   => m_tmp.blocks,

                               o_start    => m_rowid_start,

                               o_end      => m_rowid_end

);

                       dbms_output.put_line(

                               m_rowid_start  || ' - ' || m_rowid_end

);

                       insert into rowid_control(

                               owner_name,

                               segment_name,

                               rowid_start,

                               rowid_end,

                               status

                       values (

                               i_owner_name,

                               i_segment_name,

                               m_rowid_start,

                               m_rowid_end,

'N'

);

               end if;

        end loop;

end generate_rowid_ranges;

end y2k_tools;

Example showing the use of the package:

set serveroutput on size 100000
execute y2k_tools.generate_rowid_ranges('jpl','big_junk',320);

0001D6BB.0000.0022 - 0001D7BE.7FFF.0022

0001D7BF.0000.0022 - 0001D8C2.7FFF.0022

0001D8C3.0000.0022 - 0001D9C6.7FFF.0022

0001D9C7.0000.0022 - 0001DACA.7FFF.0022

0001DACB.0000.0022 - 0001DBCE.7FFF.0022

0001DBCF.0000.0022 - 0001DCD2.7FFF.0022

0001DCD3.0000.0022 - 0001DDD6.7FFF.0022

0001DDD7.0000.0022 - 0001DEDA.7FFF.0022

0001DEDB.0000.0022 - 0001DFDE.7FFF.0022

0001DFDF.0000.0022 - 0001E0E2.7FFF.0022

0001E0E3.0000.0022 - 0001E1E6.7FFF.0022

0001E1E7.0000.0022 - 0001E2EA.7FFF.0022

0001E2EB.0000.0022 - 0001E3EE.7FFF.0022

0001E3EF.0000.0022 - 0001E4F2.7FFF.0022

0001E4F3.0000.0022 - 0001E5F6.7FFF.0022

At the same time, the rows above were also inserted into the rowid_control table. To demonstrate that the code has produced a range of rowids that covers the table completely without overlap, I then ran the script below to show that I could use the list of rowid ranges to count the rows in the table

Whilst I have only done a 'select count(*)' in the script below, you could equally well have written an update statement. In a further article I will demonstrate a general purpose table-based parallel allocation mechanism that you can use to control concurrent processes. This will use the rowid_control table to identify sections of the table to update..

rem

rem     get_test.sql

rem

select count(*) from jpl.big_junk;

set serveroutput on

declare

        cursor c1 is

        select rowid_start, rowid_end

        from rowid_control

        where owner_name = 'JPL'

        and segment_name = 'BIG_JUNK';

        m_temp  number := 0;

        m_count number := 0;

begin

        m_count := 0;

        for r1 in c1 loop

               select /*+ rowid */

               count(*) into m_temp

               from jpl.big_junk

               where rowid between r1.rowid_start and r1.rowid_end

               m_count := m_count + m_temp;

        end loop;

        dbms_output.put_line(m_count);

end;

oracle

最后修改时间：2020-04-16 14:52:30

Big Update

JL Computer Consultancy

Handling massive updates ( ? A Y2K trick on Oracle 7 ?)

May 1999

评论