暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle

digoal 2017-10-30
455

作者

digoal

日期

2017-10-30

标签

PostgreSQL , dump_stat , 统计信息 , 导出导入


背景

《PostgreSQL 规格评估 - 微观、宏观、精准 多视角估算数据库性能(选型、做预算不求人)》

EXPLAIN是PG数据库用于输出SQL执行计划的语法,

1、生成的执行计划中包含COST一项。

如果校准了成本因子,COST可以和SQL实际执行时间对其。因子校对的方法如下,实际上每一种硬件,我们只需要校对一遍即可。

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

校对因子如下:

```

seq_page_cost = 1.0 # measured on an arbitrary scale

random_page_cost = 1.2 # same scale as above

cpu_tuple_cost = 0.01 # same scale as above

cpu_index_tuple_cost = 0.005 # same scale as above

cpu_operator_cost = 0.0025 # same scale as above

以下不需要校对, 不考虑并行计算SQL

parallel_tuple_cost = 0.1 # same scale as above
parallel_setup_cost = 1000.0 # same scale as above
effective_cache_size = 10GB
```

2、评估COST还需要依赖统计信息柱状图:

```
涉及reltuples, relpages. 表示评估的记录数以及占用多少个数据块。注意源头的block_size可能和PG的不一致,占用多少个块需要转换一下。(show block_size可以查看数据块大小。)

postgres=# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Collation | Nullable | Default
---------------------+--------------+-----------+----------+---------
relname | name | | not null | -- 对象名
relnamespace | oid | | not null | -- 对象所属的schema, 对应pg_namespace.oid
reltype | oid | | not null |
reloftype | oid | | not null |
relowner | oid | | not null |
relam | oid | | not null |
relfilenode | oid | | not null |
reltablespace | oid | | not null |
relpages | integer | | not null | -- 评估的页数(单位为block_size)
reltuples | real | | not null | -- 评估的记录数
relallvisible | integer | | not null |
reltoastrelid | oid | | not null |
relhasindex | boolean | | not null |
relisshared | boolean | | not null |
relpersistence | "char" | | not null |
relkind | "char" | | not null |
relnatts | smallint | | not null |
relchecks | smallint | | not null |
relhasoids | boolean | | not null |
relhaspkey | boolean | | not null |
relhasrules | boolean | | not null |
relhastriggers | boolean | | not null |
relhassubclass | boolean | | not null |
relrowsecurity | boolean | | not null |
relforcerowsecurity | boolean | | not null |
relispopulated | boolean | | not null |
relreplident | "char" | | not null |
relispartition | boolean | | not null |
relfrozenxid | xid | | not null |
relminmxid | xid | | not null |
relacl | aclitem[] | | |
reloptions | text[] | | |
relpartbound | pg_node_tree | | |
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

涉及 空值比例、平均列宽、唯一值比例或个数、高频值以及频率、柱状图分布、存储相关性、多值列(高频元素及比例、元素柱状图分布)。

-- 这个是视图:

postgres=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Collation | Nullable | Default
------------------------+----------+-----------+----------+---------
schemaname | name | | | -- 对象所属的schema
tablename | name | | | -- 对象名
attname | name | | | -- 列名
inherited | boolean | | | -- 是否为继承表的统计信息(false时表示当前表的统计信息,true时表示包含所有继承表的统计信息)
null_frac | real | | | -- 该列空值比例
avg_width | integer | | | -- 该列平均长度
n_distinct | real | | | -- 该列唯一值个数(-1表示唯一,小于1表示占比,大于等于1表示实际的唯一值个数)
most_common_vals | anyarray | | | -- 该列高频词
most_common_freqs | real[] | | | -- 该列高频词对应的出现频率
histogram_bounds | anyarray | | | -- 该列柱状图(表示隔出的每个BUCKET的记录数均等)
correlation | real | | | -- 该列存储相关性(-1到1的区间),绝对值越小,存储越离散。小于0表示反向相关,大于0表示正向相关
most_common_elems | anyarray | | | -- 该列为多值类型(数组)时,多值元素的高频词
most_common_elem_freqs | real[] | | | -- 多值元素高频词的出现频率
elem_count_histogram | real[] | | | -- 多值元素的柱状图中,每个区间的非空唯一元素个数

-- 这个是实际存储的数据(也就是要导入的部分):
-- https://www.postgresql.org/docs/10/static/catalog-pg-statistic.html

postgres=# \d pg_statistic
Table "pg_catalog.pg_statistic"
Column | Type | Collation | Nullable | Default
-------------+----------+-----------+----------+---------
starelid | oid | | not null | -- 对象OID,对应pg_class.oid
staattnum | smallint | | not null | -- 该列在表中的位置序号,对应pg_attribute.attnum
stainherit | boolean | | not null | -- 是否为继承表的统计信息(false时表示当前表的统计信息,true时表示包含所有继承表的统计信息)
stanullfrac | real | | not null | -- 空值比例
stawidth | integer | | not null | -- 平均长度
stadistinct | real | | not null | -- 唯一值个数、比例
stakind1 | smallint | | not null | -- 表示第1个SLOT的统计信息分类编号
stakind2 | smallint | | not null | -- 表示第2个SLOT的统计信息分类编号
stakind3 | smallint | | not null | -- 表示第3个SLOT的统计信息分类编号
stakind4 | smallint | | not null | -- 表示第4个SLOT的统计信息分类编号
stakind5 | smallint | | not null | -- 表示第5个SLOT的统计信息分类编号
staop1 | oid | | not null | -- 表示第1个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)
staop2 | oid | | not null | -- 表示第2个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)
staop3 | oid | | not null | -- 表示第3个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)
staop4 | oid | | not null | -- 表示第4个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)
staop5 | oid | | not null | -- 表示第5个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)
stanumbers1 | real[] | | | -- 表示第1个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。
stanumbers2 | real[] | | | -- 表示第2个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。
stanumbers3 | real[] | | | -- 表示第3个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。
stanumbers4 | real[] | | | -- 表示第4个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。
stanumbers5 | real[] | | | -- 表示第5个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。
stavalues1 | anyarray | | | -- 表示第1个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。
stavalues2 | anyarray | | | -- 表示第2个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。
stavalues3 | anyarray | | | -- 表示第3个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。
stavalues4 | anyarray | | | -- 表示第4个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。
stavalues5 | anyarray | | | -- 表示第5个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。
Indexes:
"pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)
```

statkind的定义

src/include/catalog/pg_statistic.h

```
/
* Currently, five statistical slot "kinds" are defined by core PostgreSQL,
* as documented below. Additional "kinds" will probably appear in
* future to help cope with non-scalar datatypes. Also, custom data types
* can define their own "kind" codes by mutual agreement between a custom
* typanalyze routine and the selectivity estimation functions of the type's
* operators.

* Code reading the pg_statistic relation should not assume that a particular
* data "kind" will appear in any particular slot. Instead, search the
* stakind fields to see if the desired data is available. (The standard
* function get_attstatsslot() may be used for this.)
*/

/
* The present allocation of "kind" codes is:

* 1-99: reserved for assignment by the core PostgreSQL project
* (values in this range will be documented in this file)
* 100-199: reserved for assignment by the PostGIS project
* (values to be documented in PostGIS documentation)
* 200-299: reserved for assignment by the ESRI ST_Geometry project
* (values to be documented in ESRI ST_Geometry documentation)
* 300-9999: reserved for future public assignments

* For private use you may choose a "kind" code at random in the range
* 10000-30000. However, for code that is to be widely disseminated it is
* better to obtain a publicly defined "kind" code by request from the
* PostgreSQL Global Development Group.
/

/
* In a "most common values" slot, staop is the OID of the "=" operator
* used to decide whether values are the same or not. stavalues contains
* the K most common non-null values appearing in the column, and stanumbers
* contains their frequencies (fractions of total row count). The values
* shall be ordered in decreasing frequency. Note that since the arrays are
* variable-size, K may be chosen by the statistics collector. Values should
* not appear in MCV unless they have been observed to occur more than once;
* a unique column will have no MCV slot.
/

define STATISTIC_KIND_MCV 1

/
* A "histogram" slot describes the distribution of scalar data. staop is
* the OID of the "<" operator that describes the sort ordering. (In theory,
* more than one histogram could appear, if a datatype has more than one
* useful sort operator.) stavalues contains M (>=2) non-null values that
* divide the non-null column data values into M-1 bins of approximately equal
* population. The first stavalues item is the MIN and the last is the MAX.
* stanumbers is not used and should be NULL. IMPORTANT POINT: if an MCV
* slot is also provided, then the histogram describes the data distribution
*
after removing the values listed in MCV (thus, it's a "compressed
* histogram" in the technical parlance). This allows a more accurate
* representation of the distribution of a column with some very-common
* values. In a column with only a few distinct values, it's possible that
* the MCV list describes the entire data population; in this case the
* histogram reduces to empty and should be omitted.
/

define STATISTIC_KIND_HISTOGRAM 2

/
* A "correlation" slot describes the correlation between the physical order
* of table tuples and the ordering of data values of this column, as seen
* by the "<" operator identified by staop. (As with the histogram, more
* than one entry could theoretically appear.) stavalues is not used and
* should be NULL. stanumbers contains a single entry, the correlation
* coefficient between the sequence of data values and the sequence of
* their actual tuple positions. The coefficient ranges from +1 to -1.
/

define STATISTIC_KIND_CORRELATION 3

/
* A "most common elements" slot is similar to a "most common values" slot,
* except that it stores the most common non-null
elements of the column
* values. This is useful when the column datatype is an array or some other
* type with identifiable elements (for instance, tsvector). staop contains
* the equality operator appropriate to the element type. stavalues contains
* the most common element values, and stanumbers their frequencies. Unlike
* MCV slots, frequencies are measured as the fraction of non-null rows the
* element value appears in, not the frequency of all rows. Also unlike
* MCV slots, the values are sorted into the element type's default order
* (to support binary search for a particular value). Since this puts the
* minimum and maximum frequencies at unpredictable spots in stanumbers,
* there are two extra members of stanumbers, holding copies of the minimum
* and maximum frequencies. Optionally, there can be a third extra member,
* which holds the frequency of null elements (expressed in the same terms:
* the fraction of non-null rows that contain at least one null element). If
* this member is omitted, the column is presumed to contain no null elements.

* Note: in current usage for tsvector columns, the stavalues elements are of
* type text, even though their representation within tsvector is not
* exactly text.
*/

define STATISTIC_KIND_MCELEM 4

/
* A "distinct elements count histogram" slot describes the distribution of
* the number of distinct element values present in each row of an array-type
* column. Only non-null rows are considered, and only non-null elements.
* staop contains the equality operator appropriate to the element type.
* stavalues is not used and should be NULL. The last member of stanumbers is
* the average count of distinct element values over all non-null rows. The
* preceding M (>=2) members form a histogram that divides the population of
* distinct-elements counts into M-1 bins of approximately equal population.
* The first of these is the minimum observed count, and the last the maximum.
/

define STATISTIC_KIND_DECHIST 5

/
* A "length histogram" slot describes the distribution of range lengths in
* rows of a range-type column. stanumbers contains a single entry, the
* fraction of empty ranges. stavalues is a histogram of non-empty lengths, in
* a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range
* values that divide the column data values into M-1 bins of approximately
* equal population. The lengths are stored as float8s, as measured by the
* range type's subdiff function. Only non-null rows are considered.
/

define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM 6

/
* A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for
* a range-type column. stavalues contains M (>=2) range values that divide
* the column data values into M-1 bins of approximately equal population.
* Unlike a regular scalar histogram, this is actually two histograms combined
* into a single array, with the lower bounds of each value forming a
* histogram of lower bounds, and the upper bounds a histogram of upper
* bounds. Only non-NULL, non-empty ranges are included.
/

define STATISTIC_KIND_BOUNDS_HISTOGRAM 7

```

那么这些统计信息如何导入导出呢?

导出导入统计信息

https://postgrespro.com/docs/postgresproee/9.6/dump-stat.html

dump_stat这个插件是PostgreSQL pro推出的兼容9.6版本的导出统计信息的插件。

代码如下:

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

导出

通过dump_stat导出(导出的结构已经是SQL形式),然后通过SQL导入。

$ psql test -A test=# \t test=# \o dump_stat.sql test=# select dump_statistic();

pg_statistic的每一条记录,产生一条如下SQL:

```
WITH upsert as (
UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]
WHERE to_schema_qualified_relation(starelid) = t_relname
AND to_attname(t_relname, staattnum) = t_attname
AND to_atttype(t_relname, staattnum) = t_atttype
AND stainherit = t_stainherit
RETURNING *)
ins as (
SELECT expression [, ...]
WHERE NOT EXISTS (SELECT * FROM upsert)
AND to_attnum(t_relname, t_attname) IS NOT NULL
AND to_atttype(t_relname, t_attname) = t_atttype)
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;

where expression can be one of:

array_in(array_text, type_name::regtype::oid, -1)
value::type_name
```

stat导入的实际例子

(表示public.test表的info列的统计信息,如果存在则更新,不存在则插入。)

WITH upsert as ( UPDATE pg_catalog.pg_statistic SET stanullfrac = 0, stawidth = 4, stadistinct = -1, stakind1 = 2, stakind2 = 3, stakind3 = 0, stakind4 = 0, stakind5 = 0, staop1 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop2 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop3 = '0'::regoperator, staop4 = '0'::regoperator, staop5 = '0'::regoperator, stanumbers1 = NULL::real[], stanumbers2 = '{-0.5}'::real[], stanumbers3 = NULL::real[], stanumbers4 = NULL::real[], stanumbers5 = NULL::real[], stavalues1 = array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray, stavalues2 = NULL::anyarray, stavalues3 = NULL::anyarray, stavalues4 = NULL::anyarray, stavalues5 = NULL::anyarray WHERE to_schema_qualified_relation(starelid) = 'public.test' AND to_attname('public.test', staattnum) = 'info' AND to_atttype('public.test', staattnum) = 'pg_catalog.text' AND stainherit = false RETURNING * ), ins as ( SELECT 'public.test'::regclass, to_attnum('public.test', 'info'), 'false'::boolean, 0::real, 4::integer, -1::real, 2, -- stakind=2 表示柱状图 3, -- stakind=3 表示相关性 0, 0, 0, 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, '0'::regoperator, '0'::regoperator, '0'::regoperator, NULL::real[], '{-0.5}'::real[], NULL::real[], NULL::real[], NULL::real[], array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray, NULL::anyarray, NULL::anyarray, NULL::anyarray, NULL::anyarray WHERE NOT EXISTS (SELECT * FROM upsert) AND to_attnum('public.test', 'info') IS NOT NULL AND to_atttype('public.test', 'info') = 'pg_catalog.text' ) INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;

导入

1、修改postgresql.conf,允许修改系统表,重启数据库生效配置

```
vi postgresql.conf

allow_system_table_mods=on

pg_ctl restart -m fast
```

2、导入统计信息

```
-- 1 pg_class

update pg_class set reltuples=?, relpages=? where oid=?;

-- 2 pg_statistic

WITH upsert as (
UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]
WHERE to_schema_qualified_relation(starelid) = t_relname
AND to_attname(t_relname, staattnum) = t_attname
AND to_atttype(t_relname, staattnum) = t_atttype
AND stainherit = t_stainherit
RETURNING *)
ins as (
SELECT expression [, ...]
WHERE NOT EXISTS (SELECT * FROM upsert)
AND to_attnum(t_relname, t_attname) IS NOT NULL
AND to_atttype(t_relname, t_attname) = t_atttype)
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;

where expression can be one of:

array_in(array_text, type_name::regtype::oid, -1)
value::type_name
```

3、导入完成后,将allow_system_table_mods设置为off,重启数据库。

dump_statistic代码

```
CREATE FUNCTION dump_statistic(relid oid) RETURNS SETOF TEXT AS $$
DECLARE
result text;

            cmd             text;           -- main query  
            in_args text;           -- args for insert  
            up_args text;           -- args for upsert

            fstaop  text := '%s::regoperator';  
            arr_in  text := 'array_in(%s, %s::regtype, -1)::anyarray';

            stacols text[] = ARRAY['stakind', 'staop',  
                                                       'stanumbers', 'stavalues' ];

            r               record;  
            i               int;  
            j               text;  
            ncols   int := 26;      -- number of columns in pg_statistic

            stanum  text[];         -- stanumbers{1, 5}  
            staval  text[];         -- stavalues{1, 5}  
            staop   text[];         -- staop{1, 5}

            relname text;           -- quoted relation name  
            attname text;           -- quoted attribute name  
            atttype text;           -- quoted attribute type

    BEGIN  
            for r in  
                            select * from pg_catalog.pg_statistic  
                            where starelid = relid  
                                    and get_namespace(starelid) != to_namespace('information_schema')  
                                    and get_namespace(starelid) != to_namespace('pg_catalog') loop

                    relname := to_schema_qualified_relation(r.starelid);  
                    attname := quote_literal(to_attname(relname, r.staattnum));  
                    atttype := quote_literal(to_atttype(relname, r.staattnum));  
                    relname := quote_literal(relname); -- redefine relname

                    in_args := '';  
                    up_args = 'stanullfrac = %s, stawidth = %s, stadistinct = %s, ';

                    cmd := 'WITH upsert as ( ' ||  
                                            'UPDATE pg_catalog.pg_statistic SET %s ' ||  
                                            'WHERE to_schema_qualified_relation(starelid) = ' || relname || ' '  
                                                    'AND to_attname(' || relname || ', staattnum) = ' || attname || ' '  
                                                    'AND to_atttype(' || relname || ', staattnum) = ' || atttype || ' '  
                                                    'AND stainherit = ' || r.stainherit || ' ' ||  
                                            'RETURNING *), ' ||  
                               'ins as ( ' ||  
                                            'SELECT %s ' ||  
                                            'WHERE NOT EXISTS (SELECT * FROM upsert) ' ||  
                                                    'AND to_attnum(' || relname || ', ' || attname || ') IS NOT NULL '  
                                                    'AND to_atttype(' || relname || ', ' || attname || ') = ' || atttype || ') '  
                               'INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;';

                    for i in 1..ncols loop  
                            in_args := in_args || '%s';

                            if i != ncols then  
                                    in_args := in_args || ', ';  
                            end if;  
                    end loop;

                    for j in 1..4 loop  
                            for i in 1..5 loop  
                                    up_args := up_args || format('%s%s = %%s', stacols[j], i);

                                    if i * j != 20 then  
                                            up_args := up_args || ', ';  
                                    end if;  
                            end loop;  
                    end loop;

                    cmd := format(cmd, up_args, in_args);   --prepare template for main query

                    staop := array[format(fstaop, quote_literal(to_schema_qualified_operator(r.staop1))),  
                                               format(fstaop, quote_literal(to_schema_qualified_operator(r.staop2))),  
                                               format(fstaop, quote_literal(to_schema_qualified_operator(r.staop3))),  
                                               format(fstaop, quote_literal(to_schema_qualified_operator(r.staop4))),  
                                               format(fstaop, quote_literal(to_schema_qualified_operator(r.staop5)))];

                    stanum := array[r.stanumbers1::text,  
                                                    r.stanumbers2::text,  
                                                    r.stanumbers3::text,  
                                                    r.stanumbers4::text,  
                                                    r.stanumbers5::text];

                    for i in 1..5 loop  
                            if stanum[i] is null then  
                                    stanum[i] := 'NULL::real[]';  
                            else  
                                    stanum[i] := '''' || stanum[i] || '''::real[]';  
                            end if;  
                    end loop;

                    if r.stavalues1 is not null then  
                            staval[1] := format(arr_in, quote_literal(r.stavalues1),  
                                                                    quote_literal(  
                                                                            to_schema_qualified_type(  
                                                                                    anyarray_elemtype(r.stavalues1))));  
                    else  
                            staval[1] := 'NULL::anyarray';  
                    end if;

                    if r.stavalues2 is not null then  
                            staval[2] := format(arr_in, quote_literal(r.stavalues2),  
                                                                    quote_literal(  
                                                                            to_schema_qualified_type(  
                                                                                    anyarray_elemtype(r.stavalues2))));  
                    else  
                            staval[2] := 'NULL::anyarray';  
                    end if;

                    if r.stavalues3 is not null then  
                            staval[3] := format(arr_in, quote_literal(r.stavalues3),  
                                                                    quote_literal(  
                                                                            to_schema_qualified_type(  
                                                                                    anyarray_elemtype(r.stavalues3))));  
                    else  
                            staval[3] := 'NULL::anyarray';  
                    end if;

                    if r.stavalues4 is not null then  
                            staval[4] := format(arr_in, quote_literal(r.stavalues4),  
                                                                    quote_literal(  
                                                                            to_schema_qualified_type(  
                                                                                    anyarray_elemtype(r.stavalues4))));  
                    else  
                            staval[4] := 'NULL::anyarray';  
                    end if;

                    if r.stavalues5 is not null then  
                            staval[5] := format(arr_in, quote_literal(r.stavalues5),  
                                                                    quote_literal(  
                                                                            to_schema_qualified_type(  
                                                                                    anyarray_elemtype(r.stavalues5))));  
                    else  
                            staval[5] := 'NULL::anyarray';  
                    end if;

                    --DEBUG  
                    --staop := array['{arr}', '{arr}', '{arr}', '{arr}', '{arr}'];  
                    --stanum := array['{num}', '{num}', '{num}', '{num}', '{num}'];  
                    --staval := array['{val}', '{val}', '{val}', '{val}', '{val}'];

                    result := format(cmd,  
                                                     r.stanullfrac,  
                                                     r.stawidth,  
                                                     r.stadistinct,  
                                                     -- stakind  
                                                     r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,  
                                                     -- staop  
                                                     staop[1], staop[2], staop[3], staop[4], staop[5],  
                                                     -- stanumbers  
                                                     stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],  
                                                     -- stavalues  
                                                     staval[1], staval[2], staval[3], staval[4], staval[5],

                                                     -- first 6 columns  
                                                     format('%s::regclass', relname),  
                                                     format('to_attnum(%s, %s)', relname, attname),  
                                                     '''' || r.stainherit || '''::boolean',  
                                                     r.stanullfrac || '::real',  
                                                     r.stawidth || '::integer',  
                                                     r.stadistinct || '::real',  
                                                     -- stakind  
                                                     r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,  
                                                     -- staop  
                                                     staop[1], staop[2], staop[3], staop[4], staop[5],  
                                                     -- stanumbers  
                                                     stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],  
                                                     -- stavalues  
                                                     staval[1], staval[2], staval[3], staval[4], staval[5]);

                    return next result;  
            end loop;

            return;  
    END;

$$ LANGUAGE plpgsql;
```

我们甚至可以将Oracle数据库的统计信息,平移到PG数据库,对齐需要的元素即可:

记录数、占用多少个数据块。每列的空值比例、平均列宽、唯一值比例或个数、高频值以及频率、柱状图分布、存储相关性、多值列(高频元素及比例、元素柱状图分布)。

好处:在迁移ORACLE数据时,可以关闭autovacuumm(提高导入速度),通过这种方法来导入统计信息。(只要元素对应即可,当然有些元素可能是ORACLE中不采集的,比如多值列的统计信息)。

参考

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论