Pipeline Functions

2011-01-01

505

JL Computer Consultancy

A slightly unusual use of pipeline functions

July 2006

A large part of my job is to find ways to make databases “go faster” without making any significant changes to structure or code. “Significant” is a bit vague, of course, but typically means things like “invisible to the application” as far as the structure is concerned, or “very localized” if code is involved, and always means “minimum risk, minimum side-effects”.

I have come across two examples recently where short data-loading scripts could be modified very locally to reduce the workload resources used – particularly CPU and disk activity, quite dramatically. It isn’t a mechanism that will be universally appropriate – but for the right data pattern it’s certainly worth considering.

The requirement is quite simple. We had data loads where rows belong to ‘groups’, need to be given some form of ‘group id’ as they get into the database. There are many ways to implement this type of requirement, with various costs involved. One option is to load the data, create a summary (or parent) table for the loaded data, then use the summary table to generate a sequence-based id. Another option would be to play around with analytic functions, partitioning on the “group” columns, and using a mechanism to bring in a “group id” on the first row of each “group”.

Every option introduces its own drawbacks – the first example above, for example, may require you to do too much work populating and updating real (or perhaps global temporary) tables; the second option may do excessive amounts of sorting as a side effect of using analytic functions if the rows are wide rows.

SO, as an extra weapon in the armoury, here’s a method that uses pipeline functions to minimise the I/O and CPU costs of sorting or using tables. The example is given as a script with in-line comments. The notional task is to add a sequence number to rows, based on grouping values by multiple columns.

rem

rem Script: pipe_sequence.sql

rem Author: Jonathan Lewis

rem Dated: June 2006

rem Purpose: Getting a sequence number onto grouped data.

rem

rem Last tested

rem 10.2.0.1

rem 9.2.0.6

rem Not tested

rem 10.1.0.4

rem Not relevant

rem 8.1.7.4

rem

rem Notes:

rem There are various ways you might group data, and then

rem add a sequence number (or other unique identifier) to

rem rows that belong to the same group.

rem

rem This case was based on a 'store card' number used to group

rem together purchases that occured within a short time period

rem of each other. (The real system defined as visit as all

rem purchases within 10 minutes of each other - which required

rem playing games with rounding on a time interval).

rem

rem Other options

rem use of analytic functions

rem creating global temporary tables

rem calling functions that return either nextval or currval

rem

rem Enhancements:

rem Change the function to use pl/sql array fetches (bulk collect)

rem

drop table t1;

drop sequence s1;

drop type jpl_table_type;

drop type jpl_scalar_type;

rem

rem We will be collating rows with the same value for columns

rem group_1 and group_2, and giving them a single sequence value

rem

create sequence s1;

create table t1 as

select

rownum id,

mod(rownum-1,10) group_1,

lpad(mod(rownum-1,10),10,'0') group_2,

trunc((rownum-1)/10) data1,

rpad('x',100) padding

from

all_Objects

where

rownum <= 100

;

rem

rem A pipeline function needs an array type as its return,

rem a corresponding scalar type for each row created

rem

create type jpl_scalar_type as object(

group_1 number,

group_2 varchar2(10),

seq_id number

);

create type jpl_table_type as table of jpl_scalar_type;

rem

rem We select distinct the columns that are the identifying

rem columns – by hiding this query in a pipeline function we

rem a) can use the nextval construct

rem b) avoid creating an intermediate table

rem c) minimise the width of the row being sorted

rem

create or replace function pipe_fun

return jpl_table_type

pipelined

begin

for r in (

select group_1, group_2, s1.nextval seq_id

from (

select distinct group_1, group_2

from t1

order by group_1, group_2

)

loop

pipe row (jpl_scalar_type( r.group_1, r.group_2, r.seq_id));

end loop;

return;

end;

rem

rem To demonstrate how to use the mechanism we join the

rem the pipeline function to the base table on the grouping

rem columns – which allows us to append the sequence value

rem to the end of the row in (apparently) one step

rem

break on seq_id skip 1

column padding noprint

select

/*+ ordered use_hash (t) */

v.seq_id,

t.group_1,

t.group_2,

t.id,

t.data1,

t.padding

from

table(pipe_fun) v,

t1 t

where

t.group_1 = v.group_1

and t.group_2 = v.group_2

order by

v.seq_id, t.group_1, t.group_2, t.id

;

rem

rem Note – there is an “implementation detail” in 9i

rem which means you hit three rows in the dictionary

rem cache every time you fetch a row from the pipeline

rem function. This is a concurrency threat that has been

rem eliminated in 10g.

rem

Back to Index of Topics

oracle

最后修改时间：2020-04-16 14:51:44

Pipeline Functions

JL Computer Consultancy

A slightly unusual use of pipeline functions

July 2006

评论