暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

AWS redshift->hdb pg(Greenplum),DDL语法转换规则

digoal 2018-10-17
391

作者

digoal

日期

2018-10-17

标签

PostgreSQL , redshift , greenplum , 语法转换


背景

redshift 和 Greenplum都是源自PostgreSQL的MPP数据库:

https://wiki.postgresql.org/wiki/PostgreSQL_derived_databases

语法上有一些差异,从redshift切换到greenplum(或阿里云hdb pg)时,DDL需要做一定的转换。

redshift create table

https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

CREATE [ [LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] table_name ( { column_name data_type [column_attributes] [ column_constraints ] | table_constraints | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] } [, ... ] ) [ BACKUP { YES | NO } ] -- gpdb 不支持,(指定是否自动快照备份) [table_attribute]

where column_attributes are:

[ DEFAULT default_expr ] [ IDENTITY ( seed, step ) ] -- gpdb 不支持该语法,使用SEQUENCE+DEFAULT代替 [ ENCODE encoding ] -- 指定列压缩算法,gpdb 不支持该语法,使用全局压缩算法代替 [ DISTKEY ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替 [ SORTKEY ] -- gpdb 不支持,阿里云hdb pg支持,语法参考后面的转换规则。

and column_constraints are:

[ { NOT NULL | NULL } ] [ { UNIQUE | PRIMARY KEY } ] [ REFERENCES reftable [ ( refcolumn ) ] ]

and table_constraints are:

[ UNIQUE ( column_name [, ... ] ) ] [ PRIMARY KEY ( column_name [, ... ] ) ] [ FOREIGN KEY (column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ]

and table_attributes are:

[ DISTSTYLE { EVEN | KEY | ALL } ] -- gpdb 不支持该语法,even使用distributed randomly代替,key使用distributed by (colname1,...) 代替,ALL不支持(维度表,所有SEGMENT都有一份)。 [ DISTKEY ( column_name ) ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替 [ [COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ] -- gpdb 不支持,阿里云HDB PG支持sortkey,语法参考后面的转换规则

greenplum(阿里云hdb pg) create table

https://gpdb.docs.pivotal.io/43300/ref_guide/sql_commands/CREATE_TABLE.html

Synopsis

CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name ( [ { column_name data_type [ DEFAULT default_expr ] [column_constraint [ ... ] [ ENCODING ( storage_directive [,...] ) ] ] | table_constraint | LIKE other_table [{INCLUDING | EXCLUDING} {DEFAULTS | CONSTRAINTS}] ...} [, ... ] ] ) [ INHERITS ( parent_table [, ... ] ) ] [ WITH ( storage_parameter=value [, ... ] ) [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ] [ TABLESPACE tablespace ] [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ] [ PARTITION BY partition_type (column) [ SUBPARTITION BY partition_type (column) ] [ SUBPARTITION TEMPLATE ( template_spec ) ] [...] ( partition_spec ) | [ SUBPARTITION BY partition_type (column) ] [...] ( partition_spec [ ( subpartition_spec [(...)] ) ] )

where column_constraint is:

[CONSTRAINT constraint_name] NOT NULL | NULL | UNIQUE [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR = value )] | PRIMARY KEY [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR = value )] | CHECK ( expression ) | REFERENCES table_name [ ( column_name [, ... ] ) ] [ key_match_type ] [ key_action ]

where storage_directive for a column is:

COMPRESSTYPE={ZLIB | QUICKLZ | RLE_TYPE | NONE} [COMPRESSLEVEL={0-9} ] [BLOCKSIZE={8192-2097152} ]

where storage_parameter for the table is:

APPENDONLY={TRUE|FALSE} BLOCKSIZE={8192-2097152} ORIENTATION={COLUMN|ROW} CHECKSUM={TRUE|FALSE} COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE} COMPRESSLEVEL={0-9} FILLFACTOR={10-100} OIDS[=TRUE|FALSE]

and table_constraint is:

[CONSTRAINT constraint_name] UNIQUE ( column_name [, ... ] ) [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR=value )] | PRIMARY KEY ( column_name [, ... ] ) [USING INDEX TABLESPACE tablespace] [WITH ( FILLFACTOR=value )] | CHECK ( expression ) | FOREIGN KEY ( column_name [, ... ] ) REFERENCES table_name [ ( column_name [, ... ] ) ] [ key_match_type ] [ key_action ] [ key_checking_mode ]

where key_match_type is:

MATCH FULL | SIMPLE

where key_action is:

ON DELETE | ON UPDATE | NO ACTION | RESTRICT | CASCADE | SET NULL | SET DEFAULT

where key_checking_mode is:

DEFERRABLE | NOT DEFERRABLE | INITIALLY DEFERRED | INITIALLY IMMEDIATE

where partition_type is:

LIST | RANGE

where partition_specification is:

partition_element [, ...]
and partition_element is:

DEFAULT PARTITION name | [PARTITION name] VALUES (list_value [,...] ) | [PARTITION name] START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE] [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ] [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ] | [PARTITION name] END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ] [ WITH ( partition_storage_parameter=value [, ... ] ) ] [ TABLESPACE tablespace ]

where subpartition_spec or template_spec is:

subpartition_element [, ...]
and subpartition_element is:

DEFAULT SUBPARTITION name | [SUBPARTITION name] VALUES (list_value [,...] ) | [SUBPARTITION name] START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE] [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ] [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ] | [SUBPARTITION name] END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ] [ WITH ( partition_storage_parameter=value [, ... ] ) ] [ TABLESPACE tablespace ]

where storage_parameter for a partition is:

APPENDONLY={TRUE|FALSE} BLOCKSIZE={8192-2097152} ORIENTATION={COLUMN|ROW} CHECKSUM={TRUE|FALSE} COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE} COMPRESSLEVEL={1-9} FILLFACTOR={10-100} OIDS[=TRUE|FALSE]

阿里云 hdb pg sortkey 语法

https://help.aliyun.com/knowledge_detail/59195.html

[ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ] [ SORTKEY (column, [ ... ] )]

例子

```
create table test(date text, time text, open float, high float, low float, volume int) with(APPENDONLY=true,ORIENTATION=column) sortkey (volume);

ALTER [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name SET SORTKEY (column, [ ... ] )
```

转换规则

1、去除完全不支持的语法

[ BACKUP { YES | NO } ] -- gpdb 不支持,(指定是否自动快照备份)

2、转换不支持但兼容的语法

where column_attributes are:

2.1、

[ IDENTITY ( seed, step ) ] -- gpdb 不支持该语法,使用SEQUENCE+DEFAULT代替

去掉并转换为:

create sequence seq1 start with seed increment by step; create table test ( id int default nextval('seq1') );

例如

```
postgres=# create sequence seq1 start with 100 increment by 2;
CREATE SEQUENCE
postgres=# select nextval('seq1');
nextval


 100

(1 row)

postgres=# select nextval('seq1');
nextval


 102

(1 row)
```

2.2、

```
[ ENCODE encoding ] -- 指定列压缩算法,gpdb 不支持该语法,使用全局压缩算法代替

BYTEDICT DELTA DELTA32K LZO MOSTLY8 MOSTLY16 MOSTLY32 RAW (no compression) RUNLENGTH TEXT255 TEXT32K ZSTD ```

去掉并转换为:

create table test (id int) with (APPENDONLY=true, COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}) -- 选择一个压缩算法 ;

2.3、

[ DISTKEY ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替

去掉并转换为:

create table test (id int, info text) distributed by (id); -- 分布列名与DISTKEY对应列名一致

2.4、

[ SORTKEY ] -- gpdb 不支持,阿里云hdb pg支持,语法参考后面的转换规则。

去掉并转换为:

```
create table test(date text, time text, open float, high float, low float, volume int)
with(APPENDONLY=true,ORIENTATION=column)
sortkey (volume); -- 这里指定sortkey

ALTER [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name
SET SORTKEY (column, [ ... ] ); -- 这里指定sortkey
```

如果设置了sort key,在数据导入完成后,执行如下SQL,(堵塞DDL,DML,DSL,重新对数据排序)

VACUUM SORT ONLY [tablename]

and table_attributes are:

2.5、

[ DISTSTYLE { EVEN | KEY | ALL } ] -- gpdb 不支持该语法,even使用distributed randomly代替,key使用distributed by (colname1,...) 代替,ALL不支持(维度表,所有SEGMENT都有一份)。

去掉并转换为:

```
distributed randomly

distributed by (colname1, ...)
```

2.6、

[ DISTKEY ( column_name ) ] -- gpdb 不支持该语法,但是redshift只支持一列作为分布键,GPDB支持多列作为分布键,gpdb使用distributed by(colname1, ...)代替

去掉并转换为:

distributed by(colname1, ...)

2.7、

[ [COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ] -- gpdb 不支持,阿里云HDB PG支持sortkey,语法参考后面的转换规则

同2.4。

COMPOUND表示完全按用户指定的字段排序,类似PG的CLUSTER。

INTERLEAVED表示按维度排序,任意列的顺序都是公平的,类似多维空间聚集存放。

参考

https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

https://gpdb.docs.pivotal.io/43300/ref_guide/sql_commands/CREATE_TABLE.html

https://help.aliyun.com/knowledge_detail/59195.html

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论