一、背景
二、数据迁移
- dumpling与yasldr数据迁移
- sqluldr 与yasldr 数据迁移
- kettle批量迁移数据到yashandb
三、心得感悟
一、背景
最近国产数据库Yashandb个人版发布,引来数据库的爱好者下载学习。墨天伦的论坛关于这个数据库的帖子十分多,都说oracle常用功能都可以在Yashandb数据库上找到,有的功能更是推陈出新。Yashandb个人版已经发布一段时间了,异构数据库数据库迁入Yashandb文章比较少。这篇文章梳理了oracle、mysql数据库迁移到Yashandb的方法,希望能够帮助同学更好的学习和使用Yashandb。(听说官网有YMP一键数据迁移平台,人名币玩家可以联系小助手。)
二、数据迁移
mysql 到yashandb表结构 映射(仅供参考)
mysql | yashan | 备注 |
bigint | bigint | |
int | int | |
smallint | int | |
tinyint | tinyint | |
DOUBLE | DOUBLE | |
FLOAT | FLOAT | |
NUMBER | DECIMAL | |
CHAR | NCHAR | |
VARCHAR | VARCHAR | |
datetime | timestamp | |
date | date | |
text\mediumtext\longtext | clob | |
mediumblob\longblob | blob、clob | 针对目标端的类型,如果是图片、音频等二进制文件,在yashandb建议用blob; 如果存储是文本,建议使用clob |
enum | char | |
set | char | |
year(4) | NUMBER(4) | MySQL 8.0.19开始,不推荐使用 |
varbinary | RAW |
使用mysql_random_load_data 模拟一张10,000,000的大表
mysql表结构
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE `test`.`t3` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tcol01` tinyint(4) DEFAULT NULL,
`tcol02` smallint(6) DEFAULT NULL,
`tcol03` mediumint(9) DEFAULT NULL,
`tcol04` int(11) DEFAULT NULL,
`tcol05` bigint(20) DEFAULT NULL,
`tcol06` float DEFAULT NULL,
`tcol07` double DEFAULT NULL,
`tcol08` decimal(10,2) DEFAULT NULL,
`tcol09` date DEFAULT NULL,
`tcol10` datetime DEFAULT NULL,
`tcol11` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`tcol12` time DEFAULT NULL,
`tcol13` year(4) DEFAULT NULL,
`tcol14` varchar(100) DEFAULT NULL,
`tcol15` char(2) DEFAULT NULL,
`tcol16` blob,
`tcol17` text,
`tcol18` mediumtext,
`tcol19` mediumblob,
`tcol20` longblob,
`tcol21` longtext,
`tcol22` mediumtext,
`tcol23` varchar(3) DEFAULT NULL,
`tcol24` varbinary(10) DEFAULT NULL,
`tcol25` enum('a','b','c') DEFAULT NULL,
`tcol26` set('red','green','blue') DEFAULT NULL,
`tcol27` float(5,3) DEFAULT NULL,
`tcol28` double(4,2) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
转换后,yashan表结构
create sequence test.seq_t3 start with 1 nocycle cache 200;
CREATE TABLE "TEST"."t3" (
"id" int(11) default test.seq_t3.nextval ,
"tcol01" tinyint(4) DEFAULT NULL,
"tcol02" int DEFAULT NULL,
"tcol03" int DEFAULT NULL,
"tcol04" int(11) DEFAULT NULL,
"tcol05" bigint(20) DEFAULT NULL,
"tcol06" float DEFAULT NULL,
"tcol07" double DEFAULT NULL,
"tcol08" number(10,2) DEFAULT NULL,
"tcol09" date DEFAULT NULL,
"tcol10" timestamp DEFAULT NULL,
"tcol11" timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ,
"tcol12" time DEFAULT NULL,
"tcol13" NUMBER(4) DEFAULT NULL,
"tcol14" varchar(100 char) DEFAULT NULL,
"tcol15" char(2 char) DEFAULT NULL,
"tcol16" CLOB,
"tcol17" CLOB,
"tcol18" CLOB,
"tcol19" CLOB,
"tcol20" CLOB,
"tcol21" CLOB,
"tcol22" CLOB,
"tcol23" varchar(3 char) DEFAULT NULL,
"tcol24" varchar(10 char) DEFAULT NULL,
"tcol25" char(1 char) DEFAULT NULL,
"tcol26" varchar(5 char) DEFAULT NULL,
"tcol27" float(5,3) DEFAULT NULL,
"tcol28" double(4,2) DEFAULT NULL,
PRIMARY KEY ("id")
) ;
create or replace trigger t3_auto_update_time
before
update on "TEST"."t3"
for each row
begin
select sysdate into :NEW."tcol11" from dual;
end;
dumpling 与yasldr 数据迁移
dumpling是tidb的一款高性能导出工具,能够实现mysql数据的快速导出,导出数据格式可以是CSV或insert语句。可用于mysql与异构数据库间数据迁移。
mysql dumpling 导出 mysql
./dumpling -uroot -P 3306 -h 127.0.0.1 -t 8 \
--no-header -m -o /home/mysql/dumping_export_data \
--filetype csv --csv-delimiter '"' --csv-null-value $'' \
--csv-line-terminator $'\n' --output-filename-template '{{.DB}}.{{.Table}}' \
--filter test.t3
yasldr
生成ctl文件的列
yasql ${URL} \
-c "SELECT '('||wm_concat(column_name)||')' FROM (select column_name from dba_tab_columns WHERE owner='${tableSchema}' AND table_name='${tableName}' ORDER BY column_id)"
yasldr导数命令
yasldr sys/yasdb_123@10.9.36.24:1688 batch_size=4032 control_text="'LOAD DATA INFILE '/home/yashan/test.t3.csv' FIELDS TERMINATED BY ',' optionally enclosed by '\"' INTO TABLE TEST.T3(ID,TCOL01,TCOL02,TCOL03,TCOL04,TCOL05,TCOL06,TCOL07,TCOL08,TCOL09,TCOL10,TCOL11,TCOL12,TCOL13,TCOL14,TCOL15,TCOL16,TCOL17,TCOL18,TCOL19,TCOL20,TCOL21,TCOL22,TCOL23,TCOL24,TCOL25,TCOL26,TCOL27,TCOL28)'"
数据校验
$ yasql sys/yasdb_123@10.9.36.24:1688
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64
Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux
SQL> select count(1) from TEST.T3;
COUNT(1)
---------------------
1000000
1 row fetched.
批量生成ctl文件
[yashan@fw-centos ~]$ cat gen_ctl.sh DBURL=${1:-sys/yasdb_123@0.0.0.0:1688} TABLELIST=${2:-table.list} DATAPATH=${3:-/tmp/data} gen_ctl { local tableSchema=$(echo ${1}|awk -F. '{print $1}') local tableName=$(echo ${1}|awk -F. '{print $2}') local sourcePath=${DATAPATH}/${tableSchema} local loadControlFile=${1}.ctl local dataFile=$(echo ${sourcePath}/${tableName}|sed 's/\//\\\//g') local columnList=$(yasql ${DBURL} -c "SELECT '('||wm_concat(column_name)||')' FROM (select column_name from dba_tab_columns \ WHERE owner='${tableSchema}' AND table_name='${tableName}' ORDER BY column_id)" |sed "1,3 d"|head -1) cp load.template ${loadControlFile} sed -i "s/%DATAFILE%/${dataFile}/" ${loadControlFile} sed -i "s/%TABLENAME%/${tableSchema}.${tableName}${columnList}/" ${loadControlFile} } while read table do gen_ctl ${table} done < ${TABLELIST}
[yashan@fw-centos ~]$ cat load.template LOAD DATA(RUN_LEVEL=SPLIT,DEGREE_OF_PARALLELISM=2) INFILE '%DATAFILE%' with embedded FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' INTO TABLE %TABLENAME%
[yashan@fw-centos ~]$ cat table.list
TEST.T3
[yashan@fw-centos ~]$ ./gen_ctl.sh sys/yasdb_123@127.0.0.1:1688 table.list /home/yashan
yasboot工具导入崖山数据库
yasboot load -c yashandb -f / -sm PART --param "BATCH_SIZE=2048 CONN_POOL_SIZE=10"
yasboot导入
数据行数校验
[yashan@fw-centos ~]$ yasboot load -c yashandb -f /home/yashan/TEST.T3.ctl -sm PART --param "BATCH_SIZE=2048 CONN_POOL_SIZE=10"
type | uuid | name | hostid | index | status | return_code | progress | cost
---------------------------------------------------------------------------------------------------------------------------
task | 2338582294e2a9d0 | Loader | - | yashandb.SYS__PART_6568400df13e04ee | SUCCESS | 0 | 100 | 26
------+------------------+--------+--------+-------------------------------------+---------+-------------+----------+------
task completed, status: SUCCESS
split file path: /home/yashan
[yashan@fw-centos ~]$ yasql sys/yasdb_123
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64
Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux
SQL> select count(1) from test.t3;
COUNT(1)
---------------------
1000000
1 row fetched.
SQL>
Disconnected from:
从上述例子看出,dumpling导出的csv能通过yasldr 快速 导入yashandb。dumpling和yasldr都是针对大数据量设计的导出与导入工具,可以快速完成离线迁移数据的任务。
sqluldr2与yasldr 数据迁移
oracle到yashandb表结构数据类型基本一致,oracle的表结构可以比较平滑迁移到YashanDB,具体数据类型描述可以参考Yashandb的数据类型描述
在oracle上建表并通过存储过程插入500w的数据
oracle建表语句如下:
CREATE TABLE ORDER_TABLE
(
ORDER_ID NUMBER
,USER_CODE VARCHAR2(256)
,GOODS_NAME VARCHAR2(256)
,ORDER_DATE DATE
,CITY VARCHAR2(256)
,ORDER_NUM NUMBER
);
squldr2与yasldr 迁移数据
sqluldr2是能从oracle 快速导出数据为CSV格式的工具
./sqluldr2 test_user/yasdb_123@127.0.0.1:1521/orcl query=SYS.ORDER_TABLE QUOTe='"' file=SYS.ORDER_TABLE
0 rows exported at 2023-11-30 21:14:18, size 0 MB.
1000000 rows exported at 2023-11-30 21:14:19, size 64 MB.
2000000 rows exported at 2023-11-30 21:14:21, size 132 MB.
3000000 rows exported at 2023-11-30 21:14:23, size 200 MB.
4000000 rows exported at 2023-11-30 21:14:25, size 268 MB.
5000000 rows exported at 2023-11-30 21:14:26, size 336 MB.
output file SYS.ORDER_TABLE closed at 5000000 rows, size 338 MB.
[oracle@fw-centos ~]$ head SYS.ORDER_TABLE
1001,"UKAJ5F5C","零食大礼包C","2020-08-29 00:00:00","上海",5
1002,"JOVIKU63","零食大礼包E","2020-10-06 00:00:00","上海",9
1003,"AZDJ8AD8","零食大礼包E","2020-07-28 00:00:00","上海",6
1004,"VIZNIEIB","零食大礼包B","2020-09-07 00:00:00","深圳",8
1005,"FN8RY5W4","零食大礼包C","2020-05-01 00:00:00","北京",4
1006,"PJL9RK0E","零食大礼包A","2020-02-27 00:00:00","北京",1
ll -h
total 2.8G
-rw-r--r--. 1 oracle oinstall 300M Nov 30 20:23 ORDER_TABLE.csv
-rw-r--r--. 1 oracle oinstall 859 Nov 30 20:23 SYS.ORDER_TABLE_sqlldr.ctl
[yashan@fw-centos ~]$ yasboot load -c yashandb -f /home/yashan/SYS.ORDER_TABLE.ctl --param "BATCH_SIZE=2048 CONN_POOL_SIZE=10"
type | uuid | name | hostid | index | status | return_code | progress | cost
---------------------------------------------------------------------------------------------------------------------------
task | 5b1adba0abb7b9e5 | Loader | - | yashandb.SYS__PART_65688df08f33171b | SUCCESS | 0 | 100 | 11
------+------------------+--------+--------+-------------------------------------+---------+-------------+----------+------
task completed, status: SUCCESS
split file path: /home/yashan
数据行数校验
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64
Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux
SQL> select count(1) from ORDER_TABLE;
COUNT(1)
---------------------
5000000
1 row fetched.
SQL> select * from ORDER_TABLE where rownum < 3;
ORDER_ID USER_CODE GOODS_NAME ORDER_DATE CITY ORDER_NUM
----------- ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------- ---------------------------------------------------------------- -----------
1001 UKAJ5F5C 零食大礼包C 2020-08-29 上海 5
1002 JOVIKU63 零食大礼包E 2020-10-06 上海 9
2 rows fetched.
从上述例子看出,sqluldr2导出的csv能通过yasldr 快速导入yashandb。sqluldr2和yasldr都是针对大数据量设计的导出与导入工具,可以快速完成离线迁移数据的任务。
Kettle批量迁移数据到yashandb
kettle是一款国外开源的ETL工具,纯java编写,可以在windows、linux上运行。Kettle以数据库驱动的形式,实现yashandb与异构数据间互相转换。
gitee链接:https://gitee.com/xujiafu/kettle-script
1. 安装kettle软件及java驱动包、上传依赖的驱动包$ pwd
/home/kettle/data-integration/lib
[kettle@fw-centos lib]$ ll yashan* oj*
-rw-r--r--. 1 kettle kettle 4036257 5月 25 2023 ojdbc8.jar
-rw-r--r--. 1 kettle kettle 534715 11月 30 22:12 yashandb-jdbc-1.5.1.jar
2. 编辑simple-jnd的simple.jdbc文件DataInput 写源端的jdbc连接信息
yasOutput 写目标端jdbc连接信息
注:目标端默认会做truncate table操作
cat /home/kettle/data-integration/simple-jndi
DataInput/type=javax.sql.DataSource
DataInput/driver=oracle.jdbc.driver.OracleDriver
DataInput/url=jdbc:oracle:thin:@10.9.36.23:1521/orcl
DataInput/user=TEST_USER
DataInput/password=yasdb_123
yasOutput/type=javax.sql.DataSource
yasOuput/driver=com.yashandb.jdbc.Driver
yasOuput/url=jdbc:yasdb://10.9.36.24:1688/test
yasOuput/user=SYS
yasOuput/password=yasdb_123
3.填写需要迁移的表名[kettle@fw-centos my-script]$ cat Table.txt
Table
ORDER_TABLE
- kettle 执行日志
2023/12/01 00:32:00 - 源表数据取得.0 - linenr 4450000
2023/12/01 00:32:00 - 源表数据取得.0 - linenr 4500000
2023/12/01 00:32:01 - 目标表数据取得.3 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.0 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.4 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.5 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.1 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.2 - linenr 750000
2023/12/01 00:32:02 - 源表数据取得.0 - linenr 4550000
2023/12/01 00:32:02 - 源表数据取得.0 - linenr 4600000
2023/12/01 00:32:03 - 源表数据取得.0 - linenr 4650000
2023/12/01 00:32:04 - 源表数据取得.0 - linenr 4700000
2023/12/01 00:32:04 - 源表数据取得.0 - linenr 4750000
2023/12/01 00:32:05 - 源表数据取得.0 - linenr 4800000
2023/12/01 00:32:05 - 目标表数据取得.4 - linenr 8000002023/12/01 00:32:05 - 目标表数据取得.2 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.0 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.5 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.3 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.1 - linenr 800000
2023/12/01 00:32:05 - 源表数据取得.0 - linenr 4850000
2023/12/01 00:32:06 - 源表数据取得.0 - linenr 4900000
2023/12/01 00:32:07 - 源表数据取得.0 - linenr 4950000
2023/12/01 00:32:07 - 源表数据取得.0 - linenr 5000000
2023/12/01 00:32:07 - 源表数据取得.0 - linenr 5000000
2023/12/01 00:32:07 - 源表数据取得.0 - Finished reading query, closing connection
2023/12/01 00:32:07 - 源表数据取得.0 - 完成处理 (I=5000000, O=0, R=0, W=5000000, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.3 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.4 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.2 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.1 - 完成处理 (I=0, O=833334, R=833334, W=833334, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.5 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.0 - 完成处理 (I=0, O=833334, R=833334, W=833334, U=0, E=0)
2023/12/01 00:32:08 - InputData - 开始项[Success]
2023/12/01 00:32:08 - InputData - 完成作业项[Success] (结果=[true])
2023/12/01 00:32:08 - InputData - 完成作业项[InputData] (结果=[true])
2023/12/01 00:32:08 - InputData - 完成作业项[VariableSet] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 开始项[Success]
2023/12/01 00:32:08 - AutoInputData - 完成作业项[Success] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 完成作业项[InputData] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 完成作业项[TableSet] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 任务执行完毕
2023/12/01 00:32:08 - Kitchen - Finished!
2023/12/01 00:32:08 - Kitchen - Start=2023/12/01 00:30:47.624, Stop=2023/12/01 00:32:08.132
2023/12/01 00:32:08 - Kitchen - Processing ended after 1 minutes and 20 seconds (80 seconds total).
- 4. 数据行数校验
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64
Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux
SQL> select count(*) from order_table;
COUNT(*)
---------------------
5000000
1 row fetched.
SQL> select * from order_table where rownum < 10;
ORDER_ID USER_CODE GOODS_NAME ORDER_DATE CITY ORDER_NUM
----------- ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------- ---------------------------------------------------------------- -----------
4988212 CWO1MIB2 零食大礼包D 2020-08-09 上海 7
4988206 JLT24YJO 零食大礼包C 2020-06-22 深圳 8
4988200 VA198Z4R 零食大礼包E 2020-03-16 广州 7
4988194 DDP65NZV 零食大礼包C 2020-11-26 上海 10
4988188 T0LSKIGJ 零食大礼包B 2020-01-13 深圳 7
4988182 18WTOQ4C 零食大礼包A 2020-12-18 杭州 3
4988176 NGZL9QOT 零食大礼包A 2020-12-23 上海 10
4988170 M7BJQ4U2 零食大礼包E 2020-09-16 广州 8
4988164 VPYE5X44 零食大礼包A 2020-02-06 广州
三、心得感悟
这次测试oracle、mysql数据库迁移到yashandb,发现yashandb周边生态工具能轻松对接开源的迁移工具。
站在巨人的肩膀上,开源高性能迁移工具与yashandb生态工具紧密结合,提出高性能离线迁移到yashandb的解决方案。
对比oracle,yashandb支持CSV离线导出与导入(exp\yasldr)、二进制导出与导入(exp\imp),能比较容易实现异构和同构数据库的数据迁移。
提给官方的改进点:
exp导出CSV场景下,建议将ctl文件也导出来,这样能比较方便yasldr导入。开源工具sqluldr2指定表导出时会自动帮用户导出ctl文件,用户能比较方便的导入CSV数据。