JL Computer Consultancy
Handling massive updates ( ? A Y2K trick on
Oracle 7 ?)
|
May 1999
|
Given the type of work I do I never expected to have anything to do with a Year 2000 project. However I have to sit somewhere when I visit a client and recently I found myself sitting opposite someone who was busy writing code in preparation for updating a large Oracle system that had used a character string of the form YYMMDD to hold a date.
Since I was there he described the problems he had been having and asked if I had any solution. This is what I came up with.
The Problem:
A table of ca. 250,000,000 rows with a varchar2() column that needs to be changed from YYMMDD to 19YYMMDD. There is space in the table (PCTFREE) to allow this change to take place without causing row migration.
There is not enough space in the database to allow rollback segments to grow large enough for a single update with a single commit to correct all the rows.
There is not enough space in the database to do a 'create as select unrecoverable'.
Updates which use a loop to update some rows and commit are too slow for various reasons, such as reduced concurrency, increased I/O, single row calls etc.
One Solution:
To get maximum concurrency, minimum I/O, and minimum contention between parallel processes I emulated the Oracle Parallel Query strategy with statements like:
update /*+ rowid */
tableX
set char_date = '19' || char_date
where nvl(length(char_date),0) = 6
and rowid between chartorowid('000927.0000.0008')
and chartorowid('000A3F.FFFF.0008');
The/*+ rowid */ hint makes Oracle use a special addressing mode that ensures only a block range including the required start and stop rowids is visited (without the hint, and prior to 7.2-ish the query would have resulted in a full tablescan).
The big trick then is to produce a set of rowid ranges that completely cover the table without any overlaps. Once this is done, any number of concurrent processes can be run, each with a separate rowid range; I/O will be kept to a minimum; there will be no (base table) contention between processes; finally, with sufficient rollback segments declared there will be minimum rollback contention. The only outstanding problem will be the redo log contention.
The Strategy:
There are two key elements to the strategy - the first is generating a set of rowid ranges, the second is a method for using a list of controlling values to run a scalable set of concurrent processes. The rest of this note deals with the first problem, the second problem will be the subject of a separate article.
To generate a list of rowid ranges (and remember that this is for Oracle 7 only), you start with the list of DBA_EXTENTS for an object, and derive a pair of hexadecimal rowid ranges for each extent.
A row from DBA_EXTENTS includes the columns: FILE_ID and BLOCK_ID, which tell you the starting block for the extent, and BLOCKS which tells you the length of the extent.
A rowid looks like: block_id.row_index.file_id, e.g. 00001F23.0003.0006, so to get from DBA_EXTENTS to a rowid range you need to calculate the block_id for the last block in the extent (which will be BLOCK_ID + BLOCKS - 1), convert the file id and block id to a hexadecimal form, and then insert a 'row number 0' for the first block, and a 'row number 0x7FFF' for the trailing row in the last block. (Oracle is not bothered by missing rows in the range). As an added refinement, you could choose to break very large extents into a series of consecutive rowid ranges.
Once you can produce a list of rowid ranges, the next step is to write them into a table so that a another process (or multiple concurrent copies of another process) can acquire a control row from this table, and use the content to update the base table. The detail of this 'control table' strategy that will be given in a later note
Back to Main Index of Topics .
The package:
remrem Script: y2k_tools.sql
rem Author: Jonathan Lewis
rem Dated: Feb-1999
rem Purpose: Generate list of rowid ranges for an extent
rem
rem Notes:
rem This is STRICTLY version 7
remrem The owner of the procedure has to have access to view
rem DBA_EXTENTS as a directly granted privilege.
remrem The code converts object names to capitals, so will fail
rem for objects which have managed to get their names into
rem lower case.
remrem There is no type checking for objects; you should ensure
rem that you supply a table name that is NOT a clustered table.
remcreate table rowid_control (
owner_name varchar2(32),
segment_name varchar2(32),
rowid_start rowid,
rowid_end rowid,
status char(1)
);
create or replace package y2k_tools asprocedure generate_rowid_ranges (
i_owner_name in varchar2,
i_segment_name in varchar2,
i_chunk_size in number default 160
);end;.
/
create or replace package body y2k_tools as--
-- Private function to turn a start block and block count-- into starting and ending rowids for that range
--
procedure generate_range (
i_file_id in number,
i_block_id in number,
i_blocks in number,
o_start out rowid,
o_end out rowid
) is
begino_start := chartorowid(
lpad(jpl_utils.decimal_to_hex(i_block_id),8,'0') ||
'.0000.' || lpad(jpl_utils.decimal_to_hex(i_file_id),4,'0')
);o_end := chartorowid(
lpad(jpl_utils.decimal_to_hex(
i_block_id + i_blocks - 1
),8,'0'
) || '.7FFF.' || lpad(jpl_utils.decimal_to_hex(i_file_id),4,'0')
);end generate_range;
procedure generate_rowid_ranges (
i_owner_name in varchar2,
i_segment_name in varchar2,
i_chunk_size in number default 160
) is
cursor c1 (v_owner_name varchar2, v_segment_name varchar2) is
select
block_id, file_id, blocks
from
dba_extents
where
segment_name = upper(v_segment_name)
and owner = upper(v_owner_name)
;m_tmp c1%rowtype;
m_rowid_start rowid;
m_rowid_end rowid;
beginfor r1 in c1(i_owner_name, i_segment_name) loop
m_tmp := r1;
while (m_tmp.blocks >= i_chunk_size ) loop
generate_range(
i_file_id => m_tmp.file_id,
i_block_id => m_tmp.block_id,
i_blocks => i_chunk_size,
o_start => m_rowid_start,
o_end => m_rowid_end
);dbms_output.put_line(
m_rowid_start || ' - ' || m_rowid_end
);insert into rowid_control(
owner_name,
segment_name,
rowid_start,
rowid_end,
status
)values (
i_owner_name,
i_segment_name,
m_rowid_start,
m_rowid_end,
'N' );m_tmp.blocks := m_tmp.blocks - i_chunk_size;
m_tmp.block_id := m_tmp.block_id + i_chunk_size;
end loop ;
if (m_tmp.blocks != 0) then -- pick up the last bit
generate_range(
i_file_id => m_tmp.file_id,
i_block_id => m_tmp.block_id,
i_blocks => m_tmp.blocks,
o_start => m_rowid_start,
o_end => m_rowid_end
);dbms_output.put_line(
m_rowid_start || ' - ' || m_rowid_end
);insert into rowid_control(
owner_name,
segment_name,
rowid_start,
rowid_end,
status
)values (
i_owner_name,
i_segment_name,
m_rowid_start,
m_rowid_end,
'N' );end if;
end loop;
end generate_rowid_ranges;
end y2k_tools;.
/
Example showing the use of the package:
set serveroutput on size
100000
execute y2k_tools.generate_rowid_ranges('jpl','big_junk',320);
0001D6BB.0000.0022 - 0001D7BE.7FFF.0022
0001D7BF.0000.0022 - 0001D8C2.7FFF.0022
0001D8C3.0000.0022 - 0001D9C6.7FFF.0022
0001D9C7.0000.0022 - 0001DACA.7FFF.0022
0001DACB.0000.0022 - 0001DBCE.7FFF.0022
0001DBCF.0000.0022 - 0001DCD2.7FFF.0022
0001DCD3.0000.0022 - 0001DDD6.7FFF.0022
0001DDD7.0000.0022 - 0001DEDA.7FFF.0022
0001DEDB.0000.0022 - 0001DFDE.7FFF.0022
0001DFDF.0000.0022 - 0001E0E2.7FFF.0022
0001E0E3.0000.0022 - 0001E1E6.7FFF.0022
0001E1E7.0000.0022 - 0001E2EA.7FFF.0022
0001E2EB.0000.0022 - 0001E3EE.7FFF.0022
0001E3EF.0000.0022 - 0001E4F2.7FFF.0022
0001E4F3.0000.0022 - 0001E5F6.7FFF.0022
At the same time, the rows above were also inserted into the rowid_control table. To demonstrate that the code has produced a range of rowids that covers the table completely without overlap, I then ran the script below to show that I could use the list of rowid ranges to count the rows in the table
Whilst I have only done a 'select count(*)' in the script below, you could equally well have written an update statement. In a further article I will demonstrate a general purpose table-based parallel allocation mechanism that you can use to control concurrent processes. This will use the rowid_control table to identify sections of the table to update..
remrem get_test.sql
remselect count(*) from jpl.big_junk;
set serveroutput on
declarecursor c1 is
select rowid_start, rowid_end
from rowid_control
where owner_name = 'JPL'
and segment_name = 'BIG_JUNK';
m_temp number := 0;
m_count number := 0;
beginm_count := 0;
for r1 in c1 loop
select /*+ rowid */
count(*) into m_temp
from jpl.big_junk
where rowid between r1.rowid_start and r1.rowid_end
;m_count := m_count + m_temp;
end loop;
dbms_output.put_line(m_count);
end;.
/




