暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

「YashanDB个人版体验」oracle、mysql数据快速迁移到YashanDB

原创 xxxx 2023-12-01
644

一、背景

二、数据迁移

  •     dumpling与yasldr数据迁移
  •     sqluldr 与yasldr 数据迁移
  •     kettle批量迁移数据到yashandb

三、心得感悟

一、背景

最近国产数据库Yashandb个人版发布,引来数据库的爱好者下载学习。墨天伦的论坛关于这个数据库的帖子十分多,都说oracle常用功能都可以在Yashandb数据库上找到,有的功能更是推陈出新。Yashandb个人版已经发布一段时间了,异构数据库数据库迁入Yashandb文章比较少。这篇文章梳理了oracle、mysql数据库迁移到Yashandb的方法,希望能够帮助同学更好的学习和使用Yashandb。(听说官网有YMP一键数据迁移平台,人名币玩家可以联系小助手。)


二、数据迁移

mysql 到yashandb表结构 映射(仅供参考)

mysql

yashan

备注

bigint

bigint

int

int

smallint

int

tinyint

tinyint

DOUBLE

DOUBLE

FLOAT

FLOAT

NUMBER

DECIMAL

CHAR

NCHAR


VARCHAR

VARCHAR

datetime

timestamp

date

date

text\mediumtext\longtext

clob

mediumblob\longblob

blob、clob

针对目标端的类型,如果是图片、音频等二进制文件,在yashandb建议用blob;
如果存储是文本,建议使用clob

enum

char

set

char

year(4)

NUMBER(4)

MySQL 8.0.19开始,不推荐使用

varbinary

RAW

使用mysql_random_load_data 模拟一张10,000,000的大表

mysql表结构

CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE `test`.`t3` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tcol01` tinyint(4) DEFAULT NULL,
  `tcol02` smallint(6) DEFAULT NULL,
  `tcol03` mediumint(9) DEFAULT NULL,
  `tcol04` int(11) DEFAULT NULL,
  `tcol05` bigint(20) DEFAULT NULL,
  `tcol06` float DEFAULT NULL,
  `tcol07` double DEFAULT NULL,
  `tcol08` decimal(10,2) DEFAULT NULL,
  `tcol09` date DEFAULT NULL,
  `tcol10` datetime DEFAULT NULL,
  `tcol11` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `tcol12` time DEFAULT NULL,
  `tcol13` year(4) DEFAULT NULL,
  `tcol14` varchar(100) DEFAULT NULL,
  `tcol15` char(2) DEFAULT NULL,
  `tcol16` blob,
  `tcol17` text,
  `tcol18` mediumtext,
  `tcol19` mediumblob,
  `tcol20` longblob,
  `tcol21` longtext,
  `tcol22` mediumtext,
  `tcol23` varchar(3) DEFAULT NULL,
  `tcol24` varbinary(10) DEFAULT NULL,
  `tcol25` enum('a','b','c') DEFAULT NULL,
  `tcol26` set('red','green','blue') DEFAULT NULL,
  `tcol27` float(5,3) DEFAULT NULL,
  `tcol28` double(4,2) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

转换后,yashan表结构

create sequence test.seq_t3 start with 1 nocycle cache 200;

CREATE TABLE "TEST"."t3" (
  "id" int(11) default test.seq_t3.nextval  ,
  "tcol01" tinyint(4) DEFAULT NULL,
  "tcol02" int DEFAULT NULL,
  "tcol03" int  DEFAULT NULL,
  "tcol04" int(11) DEFAULT NULL,
  "tcol05" bigint(20) DEFAULT NULL,
  "tcol06" float DEFAULT NULL,
  "tcol07" double DEFAULT NULL,
  "tcol08" number(10,2) DEFAULT NULL,
  "tcol09" date DEFAULT NULL,
  "tcol10" timestamp DEFAULT NULL,
  "tcol11" timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP  ,
  "tcol12" time DEFAULT NULL,
  "tcol13" NUMBER(4) DEFAULT NULL,
  "tcol14" varchar(100 char) DEFAULT NULL,
  "tcol15" char(2 char) DEFAULT NULL,
  "tcol16" CLOB,
  "tcol17" CLOB,
  "tcol18" CLOB,
  "tcol19" CLOB,
  "tcol20" CLOB,
  "tcol21" CLOB,
  "tcol22" CLOB,
  "tcol23" varchar(3 char) DEFAULT NULL,
  "tcol24" varchar(10 char) DEFAULT NULL,
  "tcol25" char(1 char) DEFAULT NULL,
  "tcol26" varchar(5 char) DEFAULT NULL,
  "tcol27" float(5,3) DEFAULT NULL,
  "tcol28" double(4,2) DEFAULT NULL,
  PRIMARY KEY ("id")
)  ;

create or replace trigger t3_auto_update_time
before
update on  "TEST"."t3"
for each row
begin
  select sysdate into :NEW."tcol11"  from dual;
end;

dumpling 与yasldr 数据迁移

dumpling是tidb的一款高性能导出工具,能够实现mysql数据的快速导出,导出数据格式可以是CSV或insert语句。可用于mysql与异构数据库间数据迁移。

mysql dumpling 导出 mysql

./dumpling -uroot   -P 3306 -h 127.0.0.1 -t 8 \
--no-header -m  -o /home/mysql/dumping_export_data \
--filetype csv --csv-delimiter '"' --csv-null-value $'' \
--csv-line-terminator $'\n'  --output-filename-template '{{.DB}}.{{.Table}}' \
--filter  test.t3


yasldr

生成ctl文件的列

yasql ${URL} \
-c "SELECT '('||wm_concat(column_name)||')' FROM (select column_name from dba_tab_columns WHERE owner='${tableSchema}' AND table_name='${tableName}' ORDER BY column_id)"


yasldr导数命令


yasldr sys/yasdb_123@10.9.36.24:1688 batch_size=4032 control_text="'LOAD DATA INFILE '/home/yashan/test.t3.csv' FIELDS TERMINATED BY ',' optionally enclosed by '\"' INTO TABLE TEST.T3(ID,TCOL01,TCOL02,TCOL03,TCOL04,TCOL05,TCOL06,TCOL07,TCOL08,TCOL09,TCOL10,TCOL11,TCOL12,TCOL13,TCOL14,TCOL15,TCOL16,TCOL17,TCOL18,TCOL19,TCOL20,TCOL21,TCOL22,TCOL23,TCOL24,TCOL25,TCOL26,TCOL27,TCOL28)'"


数据校验

$ yasql sys/yasdb_123@10.9.36.24:1688
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64

Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux

SQL> select count(1) from TEST.T3;

             COUNT(1) 
--------------------- 
              1000000

1 row fetched.

批量生成ctl文件

[yashan@fw-centos ~]$ cat gen_ctl.sh
DBURL=${1:-sys/yasdb_123@0.0.0.0:1688}
TABLELIST=${2:-table.list}
DATAPATH=${3:-/tmp/data}


gen_ctl {
  local tableSchema=$(echo ${1}|awk -F. '{print $1}')
  local tableName=$(echo ${1}|awk -F. '{print $2}')
  local sourcePath=${DATAPATH}/${tableSchema}
  local loadControlFile=${1}.ctl
  
  local dataFile=$(echo ${sourcePath}/${tableName}|sed 's/\//\\\//g')

  local columnList=$(yasql ${DBURL} -c "SELECT '('||wm_concat(column_name)||')' FROM (select column_name from dba_tab_columns \
                   WHERE owner='${tableSchema}' AND table_name='${tableName}' ORDER BY column_id)" |sed "1,3 d"|head -1)

  cp load.template ${loadControlFile}
  sed -i "s/%DATAFILE%/${dataFile}/" ${loadControlFile}
  sed -i "s/%TABLENAME%/${tableSchema}.${tableName}${columnList}/" ${loadControlFile}
}

while read table
do 
gen_ctl  ${table}
done < ${TABLELIST}
[yashan@fw-centos ~]$ cat load.template LOAD DATA(RUN_LEVEL=SPLIT,DEGREE_OF_PARALLELISM=2) INFILE '%DATAFILE%' with embedded FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' INTO TABLE %TABLENAME%
[yashan@fw-centos ~]$ cat table.list TEST.T3
[yashan@fw-centos ~]$ ./gen_ctl.sh sys/yasdb_123@127.0.0.1:1688 table.list /home/yashan

yasboot工具导入崖山数据库

yasboot load -c yashandb -f / -sm PART --param "BATCH_SIZE=2048 CONN_POOL_SIZE=10"  

yasboot导入

数据行数校验

[yashan@fw-centos ~]$ yasboot load -c yashandb -f  /home/yashan/TEST.T3.ctl -sm PART --param "BATCH_SIZE=2048 CONN_POOL_SIZE=10"
 type | uuid             | name   | hostid | index                               | status  | return_code | progress | cost 
---------------------------------------------------------------------------------------------------------------------------
 task | 2338582294e2a9d0 | Loader | -      | yashandb.SYS__PART_6568400df13e04ee | SUCCESS | 0           | 100      | 26   
------+------------------+--------+--------+-------------------------------------+---------+-------------+----------+------
task completed, status: SUCCESS
split file path: /home/yashan 
[yashan@fw-centos ~]$ yasql sys/yasdb_123
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64

Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux

SQL> select count(1) from test.t3;

             COUNT(1) 
--------------------- 
              1000000

1 row fetched.

SQL> 
Disconnected from:

从上述例子看出,dumpling导出的csv能通过yasldr 快速 导入yashandb。dumpling和yasldr都是针对大数据量设计的导出与导入工具,可以快速完成离线迁移数据的任务。


sqluldr2与yasldr 数据迁移

oracle到yashandb表结构数据类型基本一致,oracle的表结构可以比较平滑迁移到YashanDB,具体数据类型描述可以参考Yashandb的数据类型描述


在oracle上建表并通过存储过程插入500w的数据

oracle建表语句如下:

  CREATE TABLE ORDER_TABLE
     (
     ORDER_ID NUMBER
     ,USER_CODE VARCHAR2(256)
    ,GOODS_NAME VARCHAR2(256)
    ,ORDER_DATE DATE
    ,CITY VARCHAR2(256)
    ,ORDER_NUM NUMBER
    );

squldr2与yasldr 迁移数据

sqluldr2是能从oracle 快速导出数据为CSV格式的工具

./sqluldr2 test_user/yasdb_123@127.0.0.1:1521/orcl query=SYS.ORDER_TABLE  QUOTe='"' file=SYS.ORDER_TABLE
           0 rows exported at 2023-11-30 21:14:18, size 0 MB.
     1000000 rows exported at 2023-11-30 21:14:19, size 64 MB.
     2000000 rows exported at 2023-11-30 21:14:21, size 132 MB.
     3000000 rows exported at 2023-11-30 21:14:23, size 200 MB.
     4000000 rows exported at 2023-11-30 21:14:25, size 268 MB.
     5000000 rows exported at 2023-11-30 21:14:26, size 336 MB.
         output file SYS.ORDER_TABLE closed at 5000000 rows, size 338 MB.
[oracle@fw-centos ~]$ head SYS.ORDER_TABLE
1001,"UKAJ5F5C","零食大礼包C","2020-08-29 00:00:00","上海",5
1002,"JOVIKU63","零食大礼包E","2020-10-06 00:00:00","上海",9
1003,"AZDJ8AD8","零食大礼包E","2020-07-28 00:00:00","上海",6
1004,"VIZNIEIB","零食大礼包B","2020-09-07 00:00:00","深圳",8
1005,"FN8RY5W4","零食大礼包C","2020-05-01 00:00:00","北京",4
1006,"PJL9RK0E","零食大礼包A","2020-02-27 00:00:00","北京",1

ll -h
total 2.8G
-rw-r--r--.  1 oracle oinstall 300M Nov 30 20:23 ORDER_TABLE.csv
-rw-r--r--.  1 oracle oinstall  859 Nov 30 20:23 SYS.ORDER_TABLE_sqlldr.ctl


[yashan@fw-centos ~]$ yasboot load -c yashandb -f  /home/yashan/SYS.ORDER_TABLE.ctl --param "BATCH_SIZE=2048 CONN_POOL_SIZE=10"
 type | uuid             | name   | hostid | index                               | status  | return_code | progress | cost 
---------------------------------------------------------------------------------------------------------------------------
 task | 5b1adba0abb7b9e5 | Loader | -      | yashandb.SYS__PART_65688df08f33171b | SUCCESS | 0           | 100      | 11   
------+------------------+--------+--------+-------------------------------------+---------+-------------+----------+------
task completed, status: SUCCESS
split file path: /home/yashan 


数据行数校验

YashanDB SQL Personal Edition Release 23.1.1.100 x86_64

Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux

SQL> select count(1) from ORDER_TABLE;

             COUNT(1) 
--------------------- 
              5000000

1 row fetched.

SQL> select * from ORDER_TABLE where rownum < 3; 

   ORDER_ID USER_CODE                                                        GOODS_NAME                                                       ORDER_DATE                       CITY                                                               ORDER_NUM 
----------- ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------- ---------------------------------------------------------------- ----------- 
       1001 UKAJ5F5C                                                         零食大礼包C                                                 2020-08-29                       上海                                                                     5
       1002 JOVIKU63                                                         零食大礼包E                                                 2020-10-06                       上海                                                                     9

2 rows fetched.

从上述例子看出,sqluldr2导出的csv能通过yasldr 快速导入yashandb。sqluldr2和yasldr都是针对大数据量设计的导出与导入工具,可以快速完成离线迁移数据的任务。

Kettle批量迁移数据到yashandb

kettle是一款国外开源的ETL工具,纯java编写,可以在windows、linux上运行。Kettle以数据库驱动的形式,实现yashandb与异构数据间互相转换。

gitee链接:https://gitee.com/xujiafu/kettle-script

1. 安装kettle软件及java驱动包、上传依赖的驱动包
$ pwd
/home/kettle/data-integration/lib
[kettle@fw-centos lib]$ ll yashan* oj*
-rw-r--r--. 1 kettle kettle 4036257 5月  25 2023 ojdbc8.jar
-rw-r--r--. 1 kettle kettle  534715 11月 30 22:12 yashandb-jdbc-1.5.1.jar
2. 编辑simple-jnd的simple.jdbc文件

DataInput 写源端的jdbc连接信息

yasOutput 写目标端jdbc连接信息

注:目标端默认会做truncate table操作

cat /home/kettle/data-integration/simple-jndi

DataInput/type=javax.sql.DataSource
DataInput/driver=oracle.jdbc.driver.OracleDriver
DataInput/url=jdbc:oracle:thin:@10.9.36.23:1521/orcl
DataInput/user=TEST_USER
DataInput/password=yasdb_123



yasOutput/type=javax.sql.DataSource
yasOuput/driver=com.yashandb.jdbc.Driver
yasOuput/url=jdbc:yasdb://10.9.36.24:1688/test
yasOuput/user=SYS
yasOuput/password=yasdb_123
3.填写需要迁移的表名
[kettle@fw-centos my-script]$ cat Table.txt 
Table
ORDER_TABLE
  • kettle 执行日志
2023/12/01 00:32:00 - 源表数据取得.0 - linenr 4450000
2023/12/01 00:32:00 - 源表数据取得.0 - linenr 4500000
2023/12/01 00:32:01 - 目标表数据取得.3 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.0 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.4 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.5 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.1 - linenr 750000
2023/12/01 00:32:01 - 目标表数据取得.2 - linenr 750000
2023/12/01 00:32:02 - 源表数据取得.0 - linenr 4550000
2023/12/01 00:32:02 - 源表数据取得.0 - linenr 4600000
2023/12/01 00:32:03 - 源表数据取得.0 - linenr 4650000
2023/12/01 00:32:04 - 源表数据取得.0 - linenr 4700000
2023/12/01 00:32:04 - 源表数据取得.0 - linenr 4750000
2023/12/01 00:32:05 - 源表数据取得.0 - linenr 4800000
2023/12/01 00:32:05 - 目标表数据取得.4 - linenr 8000002023/12/01 00:32:05 - 目标表数据取得.2 - linenr 800000

2023/12/01 00:32:05 - 目标表数据取得.0 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.5 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.3 - linenr 800000
2023/12/01 00:32:05 - 目标表数据取得.1 - linenr 800000
2023/12/01 00:32:05 - 源表数据取得.0 - linenr 4850000
2023/12/01 00:32:06 - 源表数据取得.0 - linenr 4900000
2023/12/01 00:32:07 - 源表数据取得.0 - linenr 4950000
2023/12/01 00:32:07 - 源表数据取得.0 - linenr 5000000
2023/12/01 00:32:07 - 源表数据取得.0 - linenr 5000000
2023/12/01 00:32:07 - 源表数据取得.0 - Finished reading query, closing connection
2023/12/01 00:32:07 - 源表数据取得.0 - 完成处理 (I=5000000, O=0, R=0, W=5000000, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.3 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.4 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.2 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.1 - 完成处理 (I=0, O=833334, R=833334, W=833334, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.5 - 完成处理 (I=0, O=833333, R=833333, W=833333, U=0, E=0)
2023/12/01 00:32:08 - 目标表数据取得.0 - 完成处理 (I=0, O=833334, R=833334, W=833334, U=0, E=0)
2023/12/01 00:32:08 - InputData - 开始项[Success]
2023/12/01 00:32:08 - InputData - 完成作业项[Success] (结果=[true])
2023/12/01 00:32:08 - InputData - 完成作业项[InputData] (结果=[true])
2023/12/01 00:32:08 - InputData - 完成作业项[VariableSet] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 开始项[Success]
2023/12/01 00:32:08 - AutoInputData - 完成作业项[Success] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 完成作业项[InputData] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 完成作业项[TableSet] (结果=[true])
2023/12/01 00:32:08 - AutoInputData - 任务执行完毕
2023/12/01 00:32:08 - Kitchen - Finished!
2023/12/01 00:32:08 - Kitchen - Start=2023/12/01 00:30:47.624, Stop=2023/12/01 00:32:08.132
2023/12/01 00:32:08 - Kitchen - Processing ended after 1 minutes and 20 seconds (80 seconds total).
  1. 4. 数据行数校验
YashanDB SQL Personal Edition Release 23.1.1.100 x86_64

Connected to:
YashanDB Server Personal Edition Release 23.1.1.100 x86_64 - X86 64bit Linux

SQL> select count(*) from order_table;

             COUNT(*) 
--------------------- 
              5000000

1 row fetched.

SQL> select * from order_table where rownum < 10;

   ORDER_ID USER_CODE                                                        GOODS_NAME                                                       ORDER_DATE                       CITY                                                               ORDER_NUM 
----------- ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------- ---------------------------------------------------------------- ----------- 
    4988212 CWO1MIB2                                                         零食大礼包D                                                 2020-08-09                       上海                                                                     7
    4988206 JLT24YJO                                                         零食大礼包C                                                 2020-06-22                       深圳                                                                     8
    4988200 VA198Z4R                                                         零食大礼包E                                                 2020-03-16                       广州                                                                     7
    4988194 DDP65NZV                                                         零食大礼包C                                                 2020-11-26                       上海                                                                    10
    4988188 T0LSKIGJ                                                         零食大礼包B                                                 2020-01-13                       深圳                                                                     7
    4988182 18WTOQ4C                                                         零食大礼包A                                                 2020-12-18                       杭州                                                                     3
    4988176 NGZL9QOT                                                         零食大礼包A                                                 2020-12-23                       上海                                                                    10
    4988170 M7BJQ4U2                                                         零食大礼包E                                                 2020-09-16                       广州                                                                     8
    4988164 VPYE5X44                                                         零食大礼包A                                                 2020-02-06                       广州         

三、心得感悟

这次测试oracle、mysql数据库迁移到yashandb,发现yashandb周边生态工具能轻松对接开源的迁移工具。

站在巨人的肩膀上,开源高性能迁移工具与yashandb生态工具紧密结合,提出高性能离线迁移到yashandb的解决方案。

对比oracle,yashandb支持CSV离线导出与导入(exp\yasldr)、二进制导出与导入(exp\imp),能比较容易实现异构和同构数据库的数据迁移。

提给官方的改进点:

exp导出CSV场景下,建议将ctl文件也导出来,这样能比较方便yasldr导入。开源工具sqluldr2指定表导出时会自动帮用户导出ctl文件,用户能比较方便的导入CSV数据。


最后修改时间:2023-12-04 15:55:36
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论