数据库迁移-国产化-迁移建议-GreenPlum DB向GBase 8a 迁移

原创 manhuai 2022-09-02

901

1. 迁移建议

下面介绍一下迁移过程常用的一些迁移方法。

1.1. 连接操作符

GreenPlum中连接符“||”表示字符串连接，没有concat函数，而在GBase中连接符“||”既可以表示连接也可以表示或。当GBase系统变量sql_mode打开PIPES_AS_CONCAT时，“||”表示连接，和CONCAT函数等价，而当GBase系统变量sql_mode关闭PIPES_AS_CONCAT时，“||”表示或。

注意GBase在默认安装时会打开PIPES_AS_CONCAT参数，因此在应用系统迁移时，原有的连接符“||”可以不进行修改。如果想确认系统变量是否正确可以使用如下命令查看。

gbase> show variables like '%sql_mode%';

1.2. 异或操作符

GreenPlum中”#”表示异或操作符，”^”表示power函数，而在GBase中”#”表示注释，^表示异或。

GreenPlum中使用操作符”#”表示异或，GBase中表示注释

数据库

GreenPlum

GBase

语句

select 1 # 1;

select 1 # 1 awgege fegegew

;

结果

GreenPlum中使用操作符”^”表示power函数，GBase中表示异或

数据库	GreenPlum	GBase
语句	select 2 ^ 4;	select 2 ^ 4;
结果	16	6

1.3. EXCEPT/MINUS函数

GreenPlum中EXCEPT函数和GBase中MINUS函数等价，表示求两个查询结果的差集，GBase不支持EXCEPT关键字，如果使用会报语法错误。因此，在应用系统迁移时，需要将所有的EXCEPT改成MINUS。

数据库

GreenPlum

GBase

语句

create table t1(a int);

insert into t1 values(1);

create table t2(a int);

insert into t1 values(2);

select * from t1 except select * from t2;

create table t1(a int);

insert into t1 values(1);

create table t2(a int);

insert into t1 values(2);

select * from t1 minus select * from t2;

结果

1.4. 正则表达式

GreenPlum中正则表达式的支持非常灵活，支持正则表达式操作符，而在GBase中是不支持的，但是所有的这些操作符都可以使用GBase提供的REGEXP进行替换。此外，GreenPlum和GBase均实现了常见的正则表达式函数，例如regexp_replace函数，可以将模式匹配的字符串替换成指定的函数。

数据库

GreenPlum

GBase

语句

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’bcd’),(’cde’);

select * from t1 where a ~ ‘^(a|b)’;

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’bcd’),(’cde’);

select * from t1 where a regexp ‘^(a|b)’;

结果

abc

bcd

abc

bcd

数据库

GreenPlum

GBase

语句

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’abcabc’);

select regexp_replace(a, ‘a.c’,’A’) from t1;

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’abcabc’);

select regexp_replace(a, ‘a.c’,’A’) from t1;

结果

Aabc

注意GreenPlum中regexp_replace函数和GBase中的用法存在差异。GBase中默认是替换所有匹配到的字符串，而GreenPlum中只替换第一次匹配到的，如果想替换所有匹配的字符串，则需要额外提供参数’g’。因此在上述示例中如果将GreenPlum中的查询语句写为select regexp_replace(a, ‘a.c’,’A’,’g’) from t1；则查询结果就和GBase保持一致了。

1.5. 临时表

在GREENPLUM中，可以创建以下两种临时表：

1) 会话特有的临时表
CREATE TEMPORARY TABLE ON COMMIT PRESERVE ROWS；

2) 事务特有的临时表
CREATE TEMPORARY TABLE ON COMMIT DELETE ROWS；

例如：

创建会话特有的表

create temporary table t1(a int) on commit preserve rows;

创建事务特有的表

create temporary table t2(a int) on commit delete rows;

--ON COMMIT DELETE ROWS 说明临时表是事务指定，每次事务提交后将截断表（删除全部行）

--ON COMMIT PRESERVE ROWS 说明临时表是会话指定，当事务提交后，表中的数据不会截断，依然会保留。需要说明的是，因为所有的临时表在会话结束时都会被删除，因此在这里将ON COMMIT PRESERVE ROWS建立的表称之为会话特有的临时表。

GBase也有临时表。但是临时表被限制在当前连接中，当连接关闭时，临时表会自动地删除。这就意味着，两个不同的连接可以使用同一个临时表名而不会发生冲突，也不会与同名现有的表冲突（现有表将被隐藏，直到临时表被删除）。

CREATE TEMPORARY TABLE temp1

(id int auto_increment primary key, ClassifiedID int);

<。。。。。。>

DROP TEMPORARY TABLE temp1 ;

1.6. 时间操作

GreenPlum时间可以使用算术操作符进行操作，GBase可以使用 ADDDATE 函数，示例见下表：

GreenPlum	GBase
date '2001-09-28' + integer '7'	adddate('2001-09-28', interval 7 day)
date '2001-09-28' - integer '7'	adddate('2001-09-28', interval -7 day)
timestamp '2001-09-28 01:00' + interval '23 hours'	adddate('2001-09-28 01:00', interval 23 hour)
timestamp '2001-09-28 01:00' - interval '23 hours'	adddate('2001-09-28 01:00', interval -23 hour)

1.7. 复杂查询语句

在这一节中，介绍一下复杂查询语句的用法和迁移对比。

1.7.1. 使用 DISTINCT

DISTINCT 可以防止检索出重复的记录。

示例：

数据库

GREENPLUM

GBase

语句

drop table if exists locations;

Create table locations(country_id varchar(10));

Insert into locations values('AU');

Insert into locations values('BR');

Insert into locations values('CH');

Insert into locations values('CA');

select distinct country_id from locations;

drop table if exists locations;

Create table locations(country_id varchar(10));

Insert into locations values('AU');

Insert into locations values('BR');

Insert into locations values('CH');

Insert into locations values('CA');

select distinct country_id from locations;

结果

1.7.2. 使用子查询

GBase与GreenPlum均在 select 中支持子查询，一般不需要修改。

示例：

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

drop table if exists departments;

Create table employees(last_name varchar(10), department_id int);

Create table departments (department_id int, department_name varchar(10));

Insert into employees values('King', 1);

Insert into employees values('Kochhar', 1);

Insert into employees values('De Haan', 1);

Insert into employees values('Haha', 2);

Insert into departments values(1, 'Executive');

Insert into departments values(2, 'Normal');

select last_name from employees where department_id in (

select department_id from departments where department_name = 'Executive' );

drop table if exists employees;

drop table if exists departments;

Create table employees(last_name varchar(10), department_id int);

Create table departments (department_id int, department_name varchar(10));

Insert into employees values('King', 1);

Insert into employees values('Kochhar', 1);

Insert into employees values('De Haan', 1);

Insert into employees values('Haha', 2);

Insert into departments values(1, 'Executive');

Insert into departments values(2, 'Normal');

select last_name from employees where department_id in (

select department_id from departments where department_name = 'Executive' );

结果

King

Kochhar

De Haan

King

Kochhar

De Haan

1.7.3. 使用GROUP BY

GBase 的 GROUP BY 用法和 GREENPLUM的很相似，除了不支持的函数外基本上迁移时不用修改。

示例：

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

Create table employees(employee_id int, department_id int);

Insert into employees values(0, NULL);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

select count(employee_id),department_id from employees group by department_id order by department_id;

drop table if exists employees;

Create table employees(employee_id int, department_id int);

Insert into employees values(0, NULL);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

select count(employee_id),department_id from employees group by department_id order by department_id;

结果

1 NULL

1 10

2 20

1 NULL

1 10

2 20

1.7.4. 使用HAVING 子句

GREENPLUM支持在 GROUPBY 中使用 HAVING 子句，用法和 GBase一样。

示例：

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

Create table employees(employee_id int, department_id int);

Insert into employees values(10, 30);

Insert into employees values(11, 30);

Insert into employees values(21, 40);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

select count(employee_id),department_id from employees group by department_id having count(employee_id) >=2 order by department_id;

drop table if exists employees;

Create table employees(employee_id int, department_id int);

Insert into employees values(10, 30);

Insert into employees values(11, 30);

Insert into employees values(21, 40);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

select count(employee_id),department_id from employees group by department_id having count(employee_id) >=2 order by department_id;

结果

2 30

2 20

2 30

2 20

1.7.5. 使用 ROLLUP

GROUP BY 子句允许使用 WITH ROLLUP 修饰语将额外的行加到最后的输出中。这些行代表更高层的（或高聚集）求和操作。

示例：

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

Create table employees(salary int, department_id int);

Insert into employees values(13, 1);

Insert into employees values(34, 2);

Insert into employees values(51, 2);

Insert into employees values(36, 3);

Insert into employees values(27, 4);

Insert into employees values(40, 5);

select department_id,sum(salary) from employees where department_id is not null group by rollup(department_id);

drop table if exists employees;

Create table employees(salary int, department_id int);

Insert into employees values(13, 1);

Insert into employees values(34, 2);

Insert into employees values(51, 2);

Insert into employees values(36, 3);

Insert into employees values(27, 4);

Insert into employees values(40, 5);

select department_id,sum(salary) from employees where department_id is not null group by rollup(department_id);

结果

1 13

2 85

3 36

4 27

5 40

Total 201

1 13

2 85

3 36

4 27

5 40

Total 201

1.7.6. 外连接

GBase 支持外连接，且可以兼容 GREENPLUM 的格式，在迁移时一般不用修改。

示例：

数据库

GREENPLUM

GBase

语句

drop table if exists countries;

drop table if exists regions;

Create table countries(country_name varchar(10), region_id int);

Create table regions(region_name varchar(10), region_id int);

Insert into countries values('Argentina', 2);

Insert into countries values('Canada', 2);

Insert into countries values('Korea', 1);

Insert into countries values('Japan', 1);

Insert into regions values('Asia', 1);

Insert into regions values('Americas', 2);

Insert into regions values('Africa', 3);

select country_name, regions.region_name from countries left join regions on countries.region_id = regions.region_id and countries.region_id =2;

drop table if exists countries;

drop table if exists regions;

Create table countries(country_name varchar(10), region_id int);

Create table regions(region_name varchar(10), region_id int);

Insert into countries values('Argentina', 2);

Insert into countries values('Canada', 2);

Insert into countries values('Korea', 1);

Insert into countries values('Japan', 1);

Insert into regions values('Asia', 1);

Insert into regions values('Americas', 2);

Insert into regions values('Africa', 3);

select country_name, regions.region_name from countries left join regions on countries.region_id = regions.region_id and countries.region_id =2;

结果

Argentina Americas

Canada Americas

Korea

Japan

Argentina Americas

Canada Americas

Korea

Japan

1.8. 带子查询的更新

GBase 更新语句可以使用子查询语句，参见下面的示例：

示例：

数据库	GREENPLUM	GBase
建表	Drop table if exists Employees; Create table employees(salary int, department_id int); Insert into employees values(30000, 100); Insert into employees values(21600, 100); Insert into employees values(50000, 101); Drop table if exists DeptSalary; create table DeptSalary (EmployeeID int, salary float(20)) ;	Drop table if exists Employees; Create table employees(salary int, department_id int); Insert into employees values(30000, 100); Insert into employees values(21600, 100); Insert into employees values(50000, 101); Drop table if exists DeptSalary; create table DeptSalary (EmployeeID int, salary float(20)) ;
插值	insert into DeptSalary values(100,0 ) ;	insert into DeptSalary values(100,0 ) ;
更新语句	update DeptSalary set salary = ( select sum(salary) from Employees where department_id = 100);	update DeptSalary set salary = ( select sum(salary) from Employees where department_id = 100);
检索	select * from DeptSalary;	select * from DeptSalary;
结果	100 51600	100 51600

2. 迁移示例

下面我们将迁移一个实际的 GreenPlum 实例到 GBase 中。

2.1. 迁移准备

首先需要安装 GreenPlum数据库和GBase 数据库。GBase数据库的安装请参考GBase集群安装手册，下面重点说明GreenPlum集群安装过程。如果没有特别说明，文档中#表示root用户，$表示gpadmin用户。

2.1.1. GreenPlum集群部署概览

服务器主机名	IP	GreenPlum集群角色
redvm001	10.10.10.226	master节点 segment host节点
redvm002	10.10.10.227	segment host节点
redvm003	10.10.10.228	segment host节点

我们安装的GreenPlum集群为3点集群环境，其中包含1个master节点和3个segment host节点，考虑到作为实验环境，master节点压力不会太大，因此为了充分利用资源，这里 redvm001服务器既承担master角色又承担segment host角色。

2.1.2. 系统环境准备

为了确保GreenPlum集群运行在最佳的状态，需要按照如下说明修改系统的环境。

1）系统参数，对应文件位置/etc/sysctl.conf

kernel.shmmax = 500000000

kernel.shmmni = 4096

kernel.shmall = 4000000000

kernel.sem = 250 512000 100 2048

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.ip_forward = 0

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.ipv4.ip_local_port_range = 1025 65535

net.core.netdev_max_backlog = 10000

vm.overcommit_memory = 2

2）资源使用，对应文件位置/etc/security/limits.conf

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072

3）磁盘IO调度，GreenPlum建议使用deadline

例如：echo deadline > /sys/block/sbd/queue/scheduler，其中sdb为GreenPlum实际安装路径所在的磁盘

2.1.3. GreenPlum集群安装

1）创建GreenPlum管理员账户

# groupadd -g 8000 gpadmin

# useradd -m -d /home/gpadmin -g gpadmin -u 8000 gpadmin

# passwd gpadmin

2）创建GreenPlum安装目录（如果执行默认安装，这步可以忽略）

# mkdir -p /opt/gp/greenplum # GP install path

# chown -R gpadmin:gpadmin /opt/gp

3）执行安装

# su – gpadmin

$ /bin/bash greenplum-db-4.3.10.0-build-1-rhel5-x86_64.bin

......

**********************************************************************

Do you accept the Pivotal Database license agreement? [yes|no]

**********************************************************************

yes

**********************************************************************

Provide the installation path for Greenplum Database or press ENTER to

accept the default installation path: /usr/local/greenplum-db-4.3.6.2

**********************************************************************

/opt/gp/greenplum # type in the GP install path

***********************************************************************

Install Greenplum Database into </opt/gp/greenplum>? [yes|no]

***********************************************************************

yes

Extracting product to /opt/gp/greenplum

***********************************************************************

Installation complete.

Greenplum Database is installed in /opt/gp/greenplum

Pivotal Greenplum documentation is available

for download at http://docs.gopivotal.com/gpdb

***********************************************************************

4）检查hosts配置，在所有节点的/etc/hosts文件中增加如下内容。

10.10.10.226 redvm001

10.10.10.227 redvm002

10.10.10.228 redvm003

5）用GreenPlum集群部落工具在所有节点安装GreenPlum数据库、创建GreenPlum管理员和建立互信

# source /opt/gp/greenplum/greenplum_path.sh

# gpseginstall -f /home/gpadmin/hostfile_exkeys -u gpadmin -p gpadmin

$ vim hostfile_exkeys # use gpadmin, all host

redvm001

redvm002

redvm003

:wq

2.1.4. 初始化GreenPlum集群

1） Master节点创建master数据目录

# mkdir /opt/gp/master

# chown gpadmin:gpadmin /opt/gp/master

2） Segment节点创建segements数据目录

# source /opt/gp/greenplum-db/greenplum_path.sh

# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /opt/gp/data/primary'

# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin:gpadmin /opt/gp/data/primary'

# vim hostfile_gpssh_segonly #only segment host

redvm001

redvm002

redvm003

:wq

3）修改配置文件

$ source /opt/gp/greenplum-db/greenplum_path.sh

$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config

$ chmod +w gpinitsystem_config

$ vim gpinitsystem_config

MASTER_DIRECTORY=/opt/gp/master

MASTER_HOSTNAME=redvm001

declare -a DATA_DIRECTORY=(/opt/gp/data/primary) # segements per host

注意这里安装时，由于我们每台服务器节点仅有一个物理CPU，因此这里DATA_DIRECTORY只有一个目录。如果每台服务器节点有2个物理CPU，则DATA_DIRECTORY建议配置如下：

declare -a DATA_DIRECTORY=(/opt/gp/data/primary /opt/gp/data/primary)

segments数据目录可以相同，GreenPlum会分配唯一后缀名，通常建议配置segments数据目录的个数和物理CPU个数相同。

4）执行初始化

$ gpinitsystem -m 25 -c gpinitsystem_config -h hostfile_gpinitsystem

$ vim hostfile_gpinitsystem # only segments host

redvm001

redvm002

redvm003

:wq

5）添加环境变量

$ vim .bash_profile

source /opt/gp/greenplum-db/greenplum_path.sh

export MASTER_DATA_DIRECTORY=/opt/gp/master/gpseg-1

:wq

$ source .bash_profile

2.1.5. 检查GreenPlum集群

1）启动集群

gpstart

2）关闭集群

gpstop

3）集群状态

gpstate

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

数据库迁移-国产化-迁移建议-GreenPlum DB向GBase 8a 迁移

1. 迁移建议

1.1. 连接操作符

1.2. 异或操作符

1.3. EXCEPT/MINUS函数

1.4. 正则表达式

1.5. 临时表

1.6. 时间操作

1.7. 复杂查询语句

1.7.1. 使用 DISTINCT

1.7.2. 使用子查询

1.7.3. 使用GROUP BY

1.7.4. 使用HAVING 子句

1.7.5. 使用 ROLLUP

1.7.6. 外连接

1.8. 带子查询的更新

2. 迁移示例

2.1. 迁移准备

2.1.1. GreenPlum集群部署概览

2.1.2. 系统环境准备

2.1.3. GreenPlum集群安装

2.1.4. 初始化GreenPlum集群

2.1.5. 检查GreenPlum集群

评论