暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

数据库迁移-国产化-迁移建议-GreenPlum DB向GBase 8a 迁移

原创 manhuai 2022-09-02
900

1. 迁移建议

下面介绍一下迁移过程常用的一些迁移方法。

1.1. 连接操作符

GreenPlum中连接符“||”表示字符串连接,没有concat函数,而在GBase中连接符“||”既可以表示连接也可以表示或。当GBase系统变量sql_mode打开PIPES_AS_CONCAT时,“||”表示连接,和CONCAT函数等价,而当GBase系统变量sql_mode关闭PIPES_AS_CONCAT时,“||”表示或。

注意GBase在默认安装时会打开PIPES_AS_CONCAT参数,因此在应用系统迁移时,原有的连接符“||”可以不进行修改。如果想确认系统变量是否正确可以使用如下命令查看。

gbase> show variables like '%sql_mode%';

1.2. 或操作符

GreenPlum中”#”表示异或操作符,”^”表示power函数,而在GBase中”#”表示注释,^表示异或。

GreenPlum中使用操作符”#”表示异或,GBase中表示注释

数据库

GreenPlum

GBase

语句

select 1 # 1;

select 1 # 1 awgege fegegew

;

结果

0

1

 

GreenPlum中使用操作符”^”表示power函数,GBase中表示异或

数据库

GreenPlum

GBase

语句

select 2 ^ 4;

select 2 ^ 4;

结果

16

6

 

1.3. EXCEPT/MINUS函数

GreenPlum中EXCEPT函数和GBase中MINUS函数等价,表示求两个查询结果的差集,GBase不支持EXCEPT关键字,如果使用会报语法错误。因此,在应用系统迁移时,需要将所有的EXCEPT改成MINUS。

数据库

GreenPlum

GBase

语句

create table t1(a int);

insert into t1 values(1);

create table t2(a int);

insert into t1 values(2);

select * from t1 except select * from t2;

create table t1(a int);

insert into t1 values(1);

create table t2(a int);

insert into t1 values(2);

select * from t1 minus select * from t2;

结果

1

1

 

1.4. 正则表达式

GreenPlum中正则表达式的支持非常灵活,支持正则表达式操作符,而在GBase中是不支持的,但是所有的这些操作符都可以使用GBase提供的REGEXP进行替换。此外,GreenPlum和GBase均实现了常见的正则表达式函数,例如regexp_replace函数,可以将模式匹配的字符串替换成指定的函数。

数据库

GreenPlum

GBase

语句

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’bcd’),(’cde’);

select * from t1 where a ~ ‘^(a|b)’;

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’bcd’),(’cde’);

select * from t1 where a regexp ‘^(a|b)’;

结果

abc

bcd

abc

bcd

 

数据库

GreenPlum

GBase

语句

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’abcabc’);

select regexp_replace(a, ‘a.c’,’A’) from t1;

drop table if exists t1;

create table t1(a varchar(10));

insert into t1 values(‘abc’),(’abcabc’);

select regexp_replace(a, ‘a.c’,’A’) from t1;

结果

A

Aabc

A

AA

注意GreenPlum中regexp_replace函数和GBase中的用法存在差异。GBase中默认是替换所有匹配到的字符串,而GreenPlum中只替换第一次匹配到的,如果想替换所有匹配的字符串,则需要额外提供参数’g’。因此在上述示例中如果将GreenPlum中的查询语句写为select regexp_replace(a, ‘a.c’,’A’,’g’) from t1;则查询结果就和GBase保持一致了。

1.5. 临时表

在GREENPLUM中,可以创建以下两种临时表:

1) 会话特有的临时表 
CREATE TEMPORARY TABLE ON COMMIT PRESERVE ROWS;

2) 事务特有的临时表 
CREATE TEMPORARY TABLE ON COMMIT DELETE ROWS;

例如:

创建会话特有的表

create temporary table t1(a int) on commit preserve rows;

创建事务特有的表

create temporary table t2(a int) on commit delete rows;

--ON COMMIT DELETE ROWS 说明临时表是事务指定,每次事务提交后将截断表(删除全部行)

--ON COMMIT PRESERVE ROWS 说明临时表是会话指定,当事务提交后,表中的数据不会截断,依然会保留。需要说明的是,因为所有的临时表在会话结束时都会被删除,因此在这里将ON COMMIT PRESERVE ROWS建立的表称之为会话特有的临时表。

GBase也有临时表。但是临时表被限制在当前连接中,当连接关闭时,临时表会自动地删除。这就意味着,两个不同的连接可以使用同一个临时表名而不会发生冲突,也不会与同名现有的表冲突(现有表将被隐藏,直到临时表被删除)。

CREATE TEMPORARY TABLE temp1

(id int auto_increment primary key, ClassifiedID int);

<。。。。。。>

DROP TEMPORARY TABLE temp1 ;

1.6. 时间操作

GreenPlum时间可以使用算术操作符进行操作,GBase可以使用 ADDDATE 函数,示例见下表:

GreenPlum

GBase

date '2001-09-28' + integer '7'

adddate('2001-09-28', interval 7 day)

date '2001-09-28' - integer '7'

adddate('2001-09-28', interval -7 day)

timestamp '2001-09-28 01:00' + interval '23 hours'

adddate('2001-09-28 01:00', interval 23 hour)

timestamp '2001-09-28 01:00' - interval '23 hours'

adddate('2001-09-28 01:00', interval -23 hour)

 

1.7. 复杂查询语句

在这一节中,介绍一下复杂查询语句的用法和迁移对比。

1.7.1. 使用 DISTINCT 

DISTINCT 可以防止检索出重复的记录。

示例:

数据库

GREENPLUM

GBase

语句

drop table if exists locations;

Create table locations(country_id varchar(10));

Insert into locations values('AU');

Insert into locations values('AU');

Insert into locations values('BR');

Insert into locations values('CH');

Insert into locations values('CA');

Insert into locations values('CA');

 

select distinct country_id from locations;

drop table if exists locations;

Create table locations(country_id varchar(10));

Insert into locations values('AU');

Insert into locations values('AU');

Insert into locations values('BR');

Insert into locations values('CH');

Insert into locations values('CA');

Insert into locations values('CA');

 

select distinct country_id from locations;

结果

AU

 BR

 CA

 CH

 

AU

 BR

 CA

 CH

 

1.7.2. 使用子查询

GBase与GreenPlum均在 select 中支持子查询,一般不需要修改。

示例:

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

drop table if exists departments;

Create table employees(last_name varchar(10), department_id int);

Create table departments (department_id int, department_name varchar(10));

 

Insert into employees values('King', 1);

Insert into employees values('Kochhar', 1);

Insert into employees values('De Haan', 1);

Insert into employees values('Haha', 2);

Insert into departments  values(1, 'Executive');

Insert into departments  values(2, 'Normal');

 

select last_name from employees where department_id in (

select department_id from departments where department_name = 'Executive' );

drop table if exists employees;

drop table if exists departments;

Create table employees(last_name varchar(10), department_id int);

Create table departments (department_id int, department_name varchar(10));

 

Insert into employees values('King', 1);

Insert into employees values('Kochhar', 1);

Insert into employees values('De Haan', 1);

Insert into employees values('Haha', 2);

Insert into departments  values(1, 'Executive');

Insert into departments  values(2, 'Normal');

 

select last_name from employees where department_id in (

select department_id from departments where department_name = 'Executive' );

结果

King

 Kochhar

 De Haan

King

 Kochhar

 De Haan

1.7.3. 使用GROUP BY

GBase 的 GROUP BY 用法和 GREENPLUM的很相似,除了不支持的函数外基本上迁移时不用修改。

示例:

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

Create table employees(employee_id int, department_id int);

 

Insert into employees values(0, NULL);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

 

select count(employee_id),department_id from employees group by department_id order by department_id;

drop table if exists employees;

Create table employees(employee_id int, department_id int);

 

Insert into employees values(0, NULL);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

 

select count(employee_id),department_id from employees group by department_id order by department_id;

结果

 1         NULL

 1         10

 2         20

 

 1         NULL

 1         10

 2         20

 

1.7.4. 使用HAVING 子句

GREENPLUM支持在 GROUPBY 中使用 HAVING 子句,用法和 GBase一样。

示例:

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

Create table employees(employee_id int, department_id int);

 

Insert into employees values(10, 30);

Insert into employees values(11, 30);

Insert into employees values(21, 40);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

 

select count(employee_id),department_id from employees group by department_id having count(employee_id) >=2 order by department_id;

drop table if exists employees;

Create table employees(employee_id int, department_id int);

 

Insert into employees values(10, 30);

Insert into employees values(11, 30);

Insert into employees values(21, 40);

Insert into employees values(1, 10);

Insert into employees values(2, 20);

Insert into employees values(3, 20);

 

select count(employee_id),department_id from employees group by department_id having count(employee_id) >=2 order by department_id;

结果

  2            30

  2            20

 

  2            30

  2            20

 

1.7.5. 使用 ROLLUP

GROUP BY 子句允许使用 WITH ROLLUP 修饰语将额外的行加到最后的输出中。这些行代表更高层的(或高聚集)求和操作。

示例:

数据库

GREENPLUM

GBase

语句

drop table if exists employees;

Create table employees(salary int, department_id int);

 

Insert into employees values(13, 1);

Insert into employees values(34, 2);

Insert into employees values(51, 2);

Insert into employees values(36, 3);

Insert into employees values(27, 4);

Insert into employees values(40, 5);

 

select department_id,sum(salary) from employees where department_id is not null group by rollup(department_id);

drop table if exists employees;

Create table employees(salary int, department_id int);

 

Insert into employees values(13, 1);

Insert into employees values(34, 2);

Insert into employees values(51, 2);

Insert into employees values(36, 3);

Insert into employees values(27, 4);

Insert into employees values(40, 5);

 

select department_id,sum(salary) from employees where department_id is not null group by rollup(department_id);

结果

1                   13

2                   85

3                   36

4                   27

5                   40

Total               201

 

1                   13

2                   85

3                   36

4                   27

5                   40

Total               201

 

1.7.6. 外连接

GBase 支持外连接,且可以兼容 GREENPLUM 的格式,在迁移时一般不用修改。

示例:

数据库

GREENPLUM

GBase

语句

drop table if exists countries;

drop table if exists regions;

 

Create table countries(country_name varchar(10), region_id int);

Create table regions(region_name varchar(10), region_id int);

 

Insert into countries values('Argentina', 2);

Insert into countries values('Canada', 2);

Insert into countries values('Korea', 1);

Insert into countries values('Japan', 1);

Insert into regions values('Asia', 1);

Insert into regions values('Americas', 2);

Insert into regions values('Africa', 3);

 

select country_name, regions.region_name from countries left join regions on countries.region_id = regions.region_id and countries.region_id =2;

 

drop table if exists countries;

drop table if exists regions;

 

Create table countries(country_name varchar(10), region_id int);

Create table regions(region_name varchar(10), region_id int);

 

Insert into countries values('Argentina', 2);

Insert into countries values('Canada', 2);

Insert into countries values('Korea', 1);

Insert into countries values('Japan', 1);

Insert into regions values('Asia', 1);

Insert into regions values('Americas', 2);

Insert into regions values('Africa', 3);

 

select country_name, regions.region_name from countries left join regions on countries.region_id = regions.region_id and countries.region_id =2;

 

结果

Argentina                   Americas

Canada                     Americas

Korea

Japan

Argentina                  Americas

Canada                    Americas

Korea

 Japan

 

1.8. 带子查询的更新

GBase 更新语句可以使用子查询语句,参见下面的示例:

示例:

数据库

GREENPLUM

GBase

建表

Drop table if exists Employees;

Create table employees(salary  int, department_id int);

Insert into employees values(30000, 100);

Insert into employees values(21600, 100);

Insert into employees values(50000, 101);

Drop table if exists DeptSalary;

create table DeptSalary (EmployeeID int, salary  float(20)) ;

Drop table if exists Employees;

Create table employees(salary  int, department_id int);

Insert into employees values(30000, 100);

Insert into employees values(21600, 100);

Insert into employees values(50000, 101);

Drop table if exists DeptSalary;

create table DeptSalary (EmployeeID int, salary  float(20)) ;

插值

insert into DeptSalary values(100,0 ) ;

insert into DeptSalary values(100,0 ) ;

更新

语句

update DeptSalary set salary = (

select sum(salary) from Employees where department_id = 100);

update DeptSalary set salary = (

select sum(salary) from Employees where department_id = 100);

检索

select * from DeptSalary;

select * from DeptSalary;

结果

100    51600

100    51600

 

2. 迁移示例

下面我们将迁移一个实际的 GreenPlum 实例到 GBase 中。

2.1. 迁移准备

首先需要安装 GreenPlum数据库和GBase 数据库。GBase数据库的安装请参考GBase集群安装手册,下面重点说明GreenPlum集群安装过程。如果没有特别说明,文档中#表示root用户,$表示gpadmin用户。

2.1.1. GreenPlum集群部署概览

服务器主机名

IP

GreenPlum集群角色

redvm001

10.10.10.226

master节点

segment host节点

redvm002

10.10.10.227

segment host节点

redvm003

10.10.10.228

segment host节点

我们安装的GreenPlum集群为3点集群环境,其中包含1个master节点和3个segment host节点,考虑到作为实验环境,master节点压力不会太大,因此为了充分利用资源,这里 redvm001服务器既承担master角色又承担segment host角色。

2.1.2. 系统环境准备

为了确保GreenPlum集群运行在最佳的状态,需要按照如下说明修改系统的环境。

1) 系统参数,对应文件位置/etc/sysctl.conf

kernel.shmmax = 500000000

kernel.shmmni = 4096

kernel.shmall = 4000000000

kernel.sem = 250 512000 100 2048

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.ip_forward = 0

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.ipv4.ip_local_port_range = 1025 65535

net.core.netdev_max_backlog = 10000

vm.overcommit_memory = 2

2) 资源使用,对应文件位置/etc/security/limits.conf

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072

3) 磁盘IO调度,GreenPlum建议使用deadline

例如:echo deadline > /sys/block/sbd/queue/scheduler,其中sdb为GreenPlum实际安装路径所在的磁盘

2.1.3. GreenPlum集群安装

1) 创建GreenPlum管理员账户

# groupadd -g 8000 gpadmin

# useradd -m -d /home/gpadmin -g gpadmin -u 8000 gpadmin

# passwd gpadmin

2) 创建GreenPlum安装目录(如果执行默认安装,这步可以忽略)

# mkdir -p /opt/gp/greenplum          # GP install path

# chown -R gpadmin:gpadmin /opt/gp

3) 执行安装

# su – gpadmin

$ /bin/bash greenplum-db-4.3.10.0-build-1-rhel5-x86_64.bin

......

**********************************************************************

Do you accept the Pivotal Database license agreement? [yes|no]

**********************************************************************

 

yes

 

**********************************************************************

Provide the installation path for Greenplum Database or press ENTER to

accept the default installation path: /usr/local/greenplum-db-4.3.6.2

**********************************************************************

 

/opt/gp/greenplum                # type in the GP install path

 

***********************************************************************

Install Greenplum Database into </opt/gp/greenplum>? [yes|no]

***********************************************************************

 

yes

 

Extracting product to /opt/gp/greenplum

 

***********************************************************************

Installation complete.

Greenplum Database is installed in /opt/gp/greenplum

 

Pivotal Greenplum documentation is available

for download at http://docs.gopivotal.com/gpdb

***********************************************************************

4) 检查hosts配置,在所有节点的/etc/hosts文件中增加如下内容。

10.10.10.226   redvm001

10.10.10.227   redvm002

10.10.10.228   redvm003

5) 用GreenPlum集群部落工具在所有节点安装GreenPlum数据库、创建GreenPlum管理员和建立互信

# source /opt/gp/greenplum/greenplum_path.sh

# gpseginstall -f /home/gpadmin/hostfile_exkeys -u gpadmin -p gpadmin

 

$ vim hostfile_exkeys     # use gpadmin, all host

redvm001

redvm002

redvm003

:wq

2.1.4. 初始化GreenPlum集群

1) Master节点创建master数据目录

# mkdir /opt/gp/master

# chown gpadmin:gpadmin /opt/gp/master

2) Segment节点创建segements数据目录

# source /opt/gp/greenplum-db/greenplum_path.sh

# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /opt/gp/data/primary'

# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin:gpadmin /opt/gp/data/primary'

 

# vim hostfile_gpssh_segonly    #only segment host

redvm001

redvm002

redvm003

:wq

3) 修改配置文件

$ source /opt/gp/greenplum-db/greenplum_path.sh

$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config

$ chmod +w gpinitsystem_config

$ vim gpinitsystem_config

MASTER_DIRECTORY=/opt/gp/master

MASTER_HOSTNAME=redvm001               

declare -a DATA_DIRECTORY=(/opt/gp/data/primary)   # segements per host

注意这里安装时,由于我们每台服务器节点仅有一个物理CPU,因此这里DATA_DIRECTORY只有一个目录。如果每台服务器节点有2个物理CPU,则DATA_DIRECTORY建议配置如下:

 declare -a DATA_DIRECTORY=(/opt/gp/data/primary /opt/gp/data/primary)

segments数据目录可以相同,GreenPlum会分配唯一后缀名,通常建议配置segments数据目录的个数和物理CPU个数相同。

4) 执行初始化

$ gpinitsystem -m 25 -c gpinitsystem_config -h hostfile_gpinitsystem

$ vim hostfile_gpinitsystem    # only segments host

redvm001

redvm002

redvm003

:wq

5) 添加环境变量

$ vim .bash_profile

source /opt/gp/greenplum-db/greenplum_path.sh

export MASTER_DATA_DIRECTORY=/opt/gp/master/gpseg-1

:wq

$ source .bash_profile

2.1.5. 检查GreenPlum集群

1) 启动集群

gpstart

2) 关闭集群

gpstop

3) 集群状态

gpstate

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论