1. 迁移建议
下面介绍一下迁移过程常用的一些迁移方法。
1.1. 连接操作符
GreenPlum中连接符“||”表示字符串连接,没有concat函数,而在GBase中连接符“||”既可以表示连接也可以表示或。当GBase系统变量sql_mode打开PIPES_AS_CONCAT时,“||”表示连接,和CONCAT函数等价,而当GBase系统变量sql_mode关闭PIPES_AS_CONCAT时,“||”表示或。
注意GBase在默认安装时会打开PIPES_AS_CONCAT参数,因此在应用系统迁移时,原有的连接符“||”可以不进行修改。如果想确认系统变量是否正确可以使用如下命令查看。
gbase> show variables like '%sql_mode%';
1.2. 异或操作符
GreenPlum中”#”表示异或操作符,”^”表示power函数,而在GBase中”#”表示注释,^表示异或。
GreenPlum中使用操作符”#”表示异或,GBase中表示注释
数据库 | GreenPlum | GBase |
语句 | select 1 # 1; | select 1 # 1 awgege fegegew ; |
结果 | 0 | 1 |
GreenPlum中使用操作符”^”表示power函数,GBase中表示异或
数据库 | GreenPlum | GBase |
语句 | select 2 ^ 4; | select 2 ^ 4; |
结果 | 16 | 6 |
1.3. EXCEPT/MINUS函数
GreenPlum中EXCEPT函数和GBase中MINUS函数等价,表示求两个查询结果的差集,GBase不支持EXCEPT关键字,如果使用会报语法错误。因此,在应用系统迁移时,需要将所有的EXCEPT改成MINUS。
数据库 | GreenPlum | GBase |
语句 | create table t1(a int); insert into t1 values(1); create table t2(a int); insert into t1 values(2); select * from t1 except select * from t2; | create table t1(a int); insert into t1 values(1); create table t2(a int); insert into t1 values(2); select * from t1 minus select * from t2; |
结果 | 1 | 1 |
1.4. 正则表达式
GreenPlum中正则表达式的支持非常灵活,支持正则表达式操作符,而在GBase中是不支持的,但是所有的这些操作符都可以使用GBase提供的REGEXP进行替换。此外,GreenPlum和GBase均实现了常见的正则表达式函数,例如regexp_replace函数,可以将模式匹配的字符串替换成指定的函数。
数据库 | GreenPlum | GBase |
语句 | drop table if exists t1; create table t1(a varchar(10)); insert into t1 values(‘abc’),(’bcd’),(’cde’); select * from t1 where a ~ ‘^(a|b)’; | drop table if exists t1; create table t1(a varchar(10)); insert into t1 values(‘abc’),(’bcd’),(’cde’); select * from t1 where a regexp ‘^(a|b)’; |
结果 | abc bcd | abc bcd |
数据库 | GreenPlum | GBase |
语句 | drop table if exists t1; create table t1(a varchar(10)); insert into t1 values(‘abc’),(’abcabc’); select regexp_replace(a, ‘a.c’,’A’) from t1; | drop table if exists t1; create table t1(a varchar(10)); insert into t1 values(‘abc’),(’abcabc’); select regexp_replace(a, ‘a.c’,’A’) from t1; |
结果 | A Aabc | A AA |
注意GreenPlum中regexp_replace函数和GBase中的用法存在差异。GBase中默认是替换所有匹配到的字符串,而GreenPlum中只替换第一次匹配到的,如果想替换所有匹配的字符串,则需要额外提供参数’g’。因此在上述示例中如果将GreenPlum中的查询语句写为select regexp_replace(a, ‘a.c’,’A’,’g’) from t1;则查询结果就和GBase保持一致了。
1.5. 临时表
在GREENPLUM中,可以创建以下两种临时表:
1) 会话特有的临时表
CREATE TEMPORARY TABLE ON COMMIT PRESERVE ROWS;
2) 事务特有的临时表
CREATE TEMPORARY TABLE ON COMMIT DELETE ROWS;
例如:
创建会话特有的表
create temporary table t1(a int) on commit preserve rows;
创建事务特有的表
create temporary table t2(a int) on commit delete rows;
--ON COMMIT DELETE ROWS 说明临时表是事务指定,每次事务提交后将截断表(删除全部行)
--ON COMMIT PRESERVE ROWS 说明临时表是会话指定,当事务提交后,表中的数据不会截断,依然会保留。需要说明的是,因为所有的临时表在会话结束时都会被删除,因此在这里将ON COMMIT PRESERVE ROWS建立的表称之为会话特有的临时表。
GBase也有临时表。但是临时表被限制在当前连接中,当连接关闭时,临时表会自动地删除。这就意味着,两个不同的连接可以使用同一个临时表名而不会发生冲突,也不会与同名现有的表冲突(现有表将被隐藏,直到临时表被删除)。
CREATE TEMPORARY TABLE temp1
(id int auto_increment primary key, ClassifiedID int);
<。。。。。。>
DROP TEMPORARY TABLE temp1 ;
1.6. 时间操作
GreenPlum时间可以使用算术操作符进行操作,GBase可以使用 ADDDATE 函数,示例见下表:
GreenPlum | GBase |
date '2001-09-28' + integer '7' | adddate('2001-09-28', interval 7 day) |
date '2001-09-28' - integer '7' | adddate('2001-09-28', interval -7 day) |
timestamp '2001-09-28 01:00' + interval '23 hours' | adddate('2001-09-28 01:00', interval 23 hour) |
timestamp '2001-09-28 01:00' - interval '23 hours' | adddate('2001-09-28 01:00', interval -23 hour) |
1.7. 复杂查询语句
在这一节中,介绍一下复杂查询语句的用法和迁移对比。
1.7.1. 使用 DISTINCT
DISTINCT 可以防止检索出重复的记录。
示例:
数据库 | GREENPLUM | GBase |
语句 | drop table if exists locations; Create table locations(country_id varchar(10)); Insert into locations values('AU'); Insert into locations values('AU'); Insert into locations values('BR'); Insert into locations values('CH'); Insert into locations values('CA'); Insert into locations values('CA'); select distinct country_id from locations; | drop table if exists locations; Create table locations(country_id varchar(10)); Insert into locations values('AU'); Insert into locations values('AU'); Insert into locations values('BR'); Insert into locations values('CH'); Insert into locations values('CA'); Insert into locations values('CA'); select distinct country_id from locations; |
结果 | AU BR CA CH | AU BR CA CH |
1.7.2. 使用子查询
GBase与GreenPlum均在 select 中支持子查询,一般不需要修改。
示例:
数据库 | GREENPLUM | GBase |
语句 | drop table if exists employees; drop table if exists departments; Create table employees(last_name varchar(10), department_id int); Create table departments (department_id int, department_name varchar(10)); Insert into employees values('King', 1); Insert into employees values('Kochhar', 1); Insert into employees values('De Haan', 1); Insert into employees values('Haha', 2); Insert into departments values(1, 'Executive'); Insert into departments values(2, 'Normal'); select last_name from employees where department_id in ( select department_id from departments where department_name = 'Executive' ); | drop table if exists employees; drop table if exists departments; Create table employees(last_name varchar(10), department_id int); Create table departments (department_id int, department_name varchar(10)); Insert into employees values('King', 1); Insert into employees values('Kochhar', 1); Insert into employees values('De Haan', 1); Insert into employees values('Haha', 2); Insert into departments values(1, 'Executive'); Insert into departments values(2, 'Normal'); select last_name from employees where department_id in ( select department_id from departments where department_name = 'Executive' ); |
结果 | King Kochhar De Haan | King Kochhar De Haan |
1.7.3. 使用GROUP BY
GBase 的 GROUP BY 用法和 GREENPLUM的很相似,除了不支持的函数外基本上迁移时不用修改。
示例:
数据库 | GREENPLUM | GBase |
语句 | drop table if exists employees; Create table employees(employee_id int, department_id int); Insert into employees values(0, NULL); Insert into employees values(1, 10); Insert into employees values(2, 20); Insert into employees values(3, 20); select count(employee_id),department_id from employees group by department_id order by department_id; | drop table if exists employees; Create table employees(employee_id int, department_id int); Insert into employees values(0, NULL); Insert into employees values(1, 10); Insert into employees values(2, 20); Insert into employees values(3, 20); select count(employee_id),department_id from employees group by department_id order by department_id; |
结果 | 1 NULL 1 10 2 20 | 1 NULL 1 10 2 20 |
1.7.4. 使用HAVING 子句
GREENPLUM支持在 GROUPBY 中使用 HAVING 子句,用法和 GBase一样。
示例:
数据库 | GREENPLUM | GBase |
语句 | drop table if exists employees; Create table employees(employee_id int, department_id int); Insert into employees values(10, 30); Insert into employees values(11, 30); Insert into employees values(21, 40); Insert into employees values(1, 10); Insert into employees values(2, 20); Insert into employees values(3, 20); select count(employee_id),department_id from employees group by department_id having count(employee_id) >=2 order by department_id; | drop table if exists employees; Create table employees(employee_id int, department_id int); Insert into employees values(10, 30); Insert into employees values(11, 30); Insert into employees values(21, 40); Insert into employees values(1, 10); Insert into employees values(2, 20); Insert into employees values(3, 20); select count(employee_id),department_id from employees group by department_id having count(employee_id) >=2 order by department_id; |
结果 | 2 30 2 20 | 2 30 2 20 |
1.7.5. 使用 ROLLUP
GROUP BY 子句允许使用 WITH ROLLUP 修饰语将额外的行加到最后的输出中。这些行代表更高层的(或高聚集)求和操作。
示例:
数据库 | GREENPLUM | GBase |
语句 | drop table if exists employees; Create table employees(salary int, department_id int); Insert into employees values(13, 1); Insert into employees values(34, 2); Insert into employees values(51, 2); Insert into employees values(36, 3); Insert into employees values(27, 4); Insert into employees values(40, 5); select department_id,sum(salary) from employees where department_id is not null group by rollup(department_id); | drop table if exists employees; Create table employees(salary int, department_id int); Insert into employees values(13, 1); Insert into employees values(34, 2); Insert into employees values(51, 2); Insert into employees values(36, 3); Insert into employees values(27, 4); Insert into employees values(40, 5); select department_id,sum(salary) from employees where department_id is not null group by rollup(department_id); |
结果 | 1 13 2 85 3 36 4 27 5 40 Total 201 | 1 13 2 85 3 36 4 27 5 40 Total 201 |
1.7.6. 外连接
GBase 支持外连接,且可以兼容 GREENPLUM 的格式,在迁移时一般不用修改。
示例:
数据库 | GREENPLUM | GBase |
语句 | drop table if exists countries; drop table if exists regions; Create table countries(country_name varchar(10), region_id int); Create table regions(region_name varchar(10), region_id int); Insert into countries values('Argentina', 2); Insert into countries values('Canada', 2); Insert into countries values('Korea', 1); Insert into countries values('Japan', 1); Insert into regions values('Asia', 1); Insert into regions values('Americas', 2); Insert into regions values('Africa', 3); select country_name, regions.region_name from countries left join regions on countries.region_id = regions.region_id and countries.region_id =2; | drop table if exists countries; drop table if exists regions; Create table countries(country_name varchar(10), region_id int); Create table regions(region_name varchar(10), region_id int); Insert into countries values('Argentina', 2); Insert into countries values('Canada', 2); Insert into countries values('Korea', 1); Insert into countries values('Japan', 1); Insert into regions values('Asia', 1); Insert into regions values('Americas', 2); Insert into regions values('Africa', 3); select country_name, regions.region_name from countries left join regions on countries.region_id = regions.region_id and countries.region_id =2; |
结果 | Argentina Americas Canada Americas Korea Japan | Argentina Americas Canada Americas Korea Japan |
1.8. 带子查询的更新
GBase 更新语句可以使用子查询语句,参见下面的示例:
示例:
数据库 | GREENPLUM | GBase |
建表 | Drop table if exists Employees; Create table employees(salary int, department_id int); Insert into employees values(30000, 100); Insert into employees values(21600, 100); Insert into employees values(50000, 101); Drop table if exists DeptSalary; create table DeptSalary (EmployeeID int, salary float(20)) ; | Drop table if exists Employees; Create table employees(salary int, department_id int); Insert into employees values(30000, 100); Insert into employees values(21600, 100); Insert into employees values(50000, 101); Drop table if exists DeptSalary; create table DeptSalary (EmployeeID int, salary float(20)) ; |
插值 | insert into DeptSalary values(100,0 ) ; | insert into DeptSalary values(100,0 ) ; |
更新 语句 | update DeptSalary set salary = ( select sum(salary) from Employees where department_id = 100); | update DeptSalary set salary = ( select sum(salary) from Employees where department_id = 100); |
检索 | select * from DeptSalary; | select * from DeptSalary; |
结果 | 100 51600 | 100 51600 |
2. 迁移示例
下面我们将迁移一个实际的 GreenPlum 实例到 GBase 中。
2.1. 迁移准备
首先需要安装 GreenPlum数据库和GBase 数据库。GBase数据库的安装请参考GBase集群安装手册,下面重点说明GreenPlum集群安装过程。如果没有特别说明,文档中#表示root用户,$表示gpadmin用户。
2.1.1. GreenPlum集群部署概览
服务器主机名 | IP | GreenPlum集群角色 |
redvm001 | 10.10.10.226 | master节点 segment host节点 |
redvm002 | 10.10.10.227 | segment host节点 |
redvm003 | 10.10.10.228 | segment host节点 |
我们安装的GreenPlum集群为3点集群环境,其中包含1个master节点和3个segment host节点,考虑到作为实验环境,master节点压力不会太大,因此为了充分利用资源,这里 redvm001服务器既承担master角色又承担segment host角色。
2.1.2. 系统环境准备
为了确保GreenPlum集群运行在最佳的状态,需要按照如下说明修改系统的环境。
1) 系统参数,对应文件位置/etc/sysctl.conf
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
vm.overcommit_memory = 2
2) 资源使用,对应文件位置/etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
3) 磁盘IO调度,GreenPlum建议使用deadline
例如:echo deadline > /sys/block/sbd/queue/scheduler,其中sdb为GreenPlum实际安装路径所在的磁盘
2.1.3. GreenPlum集群安装
1) 创建GreenPlum管理员账户
# groupadd -g 8000 gpadmin
# useradd -m -d /home/gpadmin -g gpadmin -u 8000 gpadmin
# passwd gpadmin
2) 创建GreenPlum安装目录(如果执行默认安装,这步可以忽略)
# mkdir -p /opt/gp/greenplum # GP install path
# chown -R gpadmin:gpadmin /opt/gp
3) 执行安装
# su – gpadmin
$ /bin/bash greenplum-db-4.3.10.0-build-1-rhel5-x86_64.bin
......
**********************************************************************
Do you accept the Pivotal Database license agreement? [yes|no]
**********************************************************************
yes
**********************************************************************
Provide the installation path for Greenplum Database or press ENTER to
accept the default installation path: /usr/local/greenplum-db-4.3.6.2
**********************************************************************
/opt/gp/greenplum # type in the GP install path
***********************************************************************
Install Greenplum Database into </opt/gp/greenplum>? [yes|no]
***********************************************************************
yes
Extracting product to /opt/gp/greenplum
***********************************************************************
Installation complete.
Greenplum Database is installed in /opt/gp/greenplum
Pivotal Greenplum documentation is available
for download at http://docs.gopivotal.com/gpdb
***********************************************************************
4) 检查hosts配置,在所有节点的/etc/hosts文件中增加如下内容。
10.10.10.226 redvm001
10.10.10.227 redvm002
10.10.10.228 redvm003
5) 用GreenPlum集群部落工具在所有节点安装GreenPlum数据库、创建GreenPlum管理员和建立互信
# source /opt/gp/greenplum/greenplum_path.sh
# gpseginstall -f /home/gpadmin/hostfile_exkeys -u gpadmin -p gpadmin
$ vim hostfile_exkeys # use gpadmin, all host
redvm001
redvm002
redvm003
:wq
2.1.4. 初始化GreenPlum集群
1) Master节点创建master数据目录
# mkdir /opt/gp/master
# chown gpadmin:gpadmin /opt/gp/master
2) Segment节点创建segements数据目录
# source /opt/gp/greenplum-db/greenplum_path.sh
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /opt/gp/data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin:gpadmin /opt/gp/data/primary'
# vim hostfile_gpssh_segonly #only segment host
redvm001
redvm002
redvm003
:wq
3) 修改配置文件
$ source /opt/gp/greenplum-db/greenplum_path.sh
$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config
$ chmod +w gpinitsystem_config
$ vim gpinitsystem_config
MASTER_DIRECTORY=/opt/gp/master
MASTER_HOSTNAME=redvm001
declare -a DATA_DIRECTORY=(/opt/gp/data/primary) # segements per host
注意这里安装时,由于我们每台服务器节点仅有一个物理CPU,因此这里DATA_DIRECTORY只有一个目录。如果每台服务器节点有2个物理CPU,则DATA_DIRECTORY建议配置如下:
declare -a DATA_DIRECTORY=(/opt/gp/data/primary /opt/gp/data/primary)
segments数据目录可以相同,GreenPlum会分配唯一后缀名,通常建议配置segments数据目录的个数和物理CPU个数相同。
4) 执行初始化
$ gpinitsystem -m 25 -c gpinitsystem_config -h hostfile_gpinitsystem
$ vim hostfile_gpinitsystem # only segments host
redvm001
redvm002
redvm003
:wq
5) 添加环境变量
$ vim .bash_profile
source /opt/gp/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/opt/gp/master/gpseg-1
:wq
$ source .bash_profile
2.1.5. 检查GreenPlum集群
1) 启动集群
gpstart
2) 关闭集群
gpstop
3) 集群状态
gpstate




