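DataX Usage Guide
DataX is an offline data synchronization tool: you describe a job in a configuration file and run it from the command line.
The official description reads: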
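❝DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. DataX implements efficient data synchronization between heterogeneous data sources including MySQL, Oracle, OceanBase, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore(OTS), MaxCompute(ODPS), Hologres, DRDS, and others.
❞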
Downloading DataX
Follow the README step by step to install DataX and run the basic demo:
https://github.com/alibaba/DataX/blob/master/userGuid.md
Download the source code
$ git clone git@github.com:alibaba/DataX.git
Build with Maven
$ cd {DataX_source_code_home}
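$ mvn -U clean package assembly:assembly -Dmaven.test.skip=true
Note: the packaged output is located in target/datax/datax under the source root.
Common causes of build failure
JDK version mismatch: use JDK 1.8
Missing packages in the Maven repository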
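To rule out the first cause, check the JDK version before building. The following is a minimal sketch, assuming `java` is on the PATH and reports a line such as `java version "1.8.0_292"`. The helper names `parse_java_major` and `installed_java_major` are hypothetical and not part of DataX.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_292" -> 8) and the newer
    scheme ("11.0.2" -> 11). Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy versions are reported as 1.x, where x is the major version.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and return the major version, or None."""
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 8:
        print("JDK 1.8 detected")
    else:
        print("Expected JDK 1.8, found:", major)
```

If the script reports a version other than 8, point `JAVA_HOME` at a JDK 1.8 installation before rerunning the Maven build.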
Running the test demo
View the configuration template
$ python datax.py -r streamreader -w streamwriter
Create the demo configuration file stream2stream.json
$ touch stream2stream.json
with the following content:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
Start the job from DataX's bin directory (target/datax/datax/bin if you built from source)
$ cd {YOUR_DATAX_DIR_BIN}
$ python datax.py ./stream2stream.json
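The demo configuration can also be generated from a script instead of being typed by hand. This is a minimal sketch, assuming it runs in the same directory as `datax.py`. The function names `build_stream_job`, `write_job`, and `datax_command` are hypothetical helpers, not DataX APIs.

```python
import json


def build_stream_job(record_count=10, channel=5):
    """Return a streamreader -> streamwriter job matching stream2stream.json."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Write the job as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job, handle, ensure_ascii=False, indent=4)


def datax_command(job_path, datax_script="datax.py"):
    """Build the argument list for the launch command used in this article."""
    return ["python", datax_script, job_path]


if __name__ == "__main__":
    write_job(build_stream_job(), "stream2stream.json")
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(datax_command("./stream2stream.json")))
```

Passing the printed list to `subprocess.run` would launch the job. It is only printed here so that the sketch works on a machine without DataX installed.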
Oracle2Dm.json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "test",
                        "password": "test",
                        "connection": [
                            {
                                "querySql": [
                                    "select col1,col2,col3,col4,col5 from table_name"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@127.0.0.1:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "rdbmswriter",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:dm://127.0.0.1:5236/TEST",
                                "table": [
                                    "test_table"
                                ]
                            }
                        ],
                        "username": "TEST",
                        "password": "1234567890",
                        "table": "test_table",
                        "column": [
                            "col1",
                            "col2",
                            "col3",
                            "col4",
                            "col5"
                        ],
                        "preSql": [
                            "delete from test_table;"
                        ]
                    }
                }
            }
        ]
    }
}
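In this job the reader's querySql selects five columns and the writer lists five columns. A mismatch between the two counts is an easy mistake to make when editing such a file. The sketch below is a rough pre-flight check under simplifying assumptions: it only understands plain `select a,b,c from t` statements, and the helpers `select_column_count` and `column_mismatches` are hypothetical names, not part of DataX.

```python
import re


def select_column_count(query):
    """Count columns in a simple 'select a,b,c from t' statement.

    Naive by design: it splits on commas, so it does not handle
    function calls containing commas, subqueries, or 'select *'.
    """
    match = re.match(
        r"\s*select\s+(.*?)\s+from\s", query, re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("not a simple select statement: %r" % query)
    columns = [part.strip() for part in match.group(1).split(",")]
    if "*" in columns:
        raise ValueError("cannot count the columns of 'select *'")
    return len(columns)


def column_mismatches(config):
    """Yield (query, reader_count, writer_count) for each mismatched pair."""
    for entry in config["job"]["content"]:
        writer_columns = entry["writer"]["parameter"]["column"]
        for connection in entry["reader"]["parameter"]["connection"]:
            for query in connection.get("querySql", []):
                count = select_column_count(query)
                if count != len(writer_columns):
                    yield query, count, len(writer_columns)


if __name__ == "__main__":
    import json
    import sys

    # Usage: python check_columns.py Oracle2Dm.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for query, found, expected in column_mismatches(json.load(handle)):
            print("%d columns selected, %d expected: %s" % (found, expected, query))
```

The check prints nothing when every query matches the writer's column list.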
Oracle2MySql.json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "jdjda",
                        "password": "jdjda",
                        "connection": [
                            {
                                "querySql": [
                                    "select ROLL_ID,ORGANIZATION_NO,ARCHIVES_NO,DEPARTMENT_NO,START_TIME from T_ARCHIVES_AJ_MAIN"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@192.168.168.66:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "root",
                        "column": [
                            "col1",
                            "col2",
                            "col3",
                            "col4",
                            "col5"
                        ],
                        "session": [
                            "set session sql_mode='ANSI'"
                        ],
                        "preSql": [
                            "delete from orcl"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=gbk",
                                "table": [
                                    "orcl"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
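The job files in this article share one shape: a `job.setting.speed.channel` value and a `job.content` list whose entries each hold a `reader` and a `writer` with a `name` and a `parameter` object. The sketch below checks that shape before a job is launched. It covers only the keys used in this article, not the full DataX schema, and `check_job` is a hypothetical helper.

```python
import json
import sys


def check_job(config):
    """Return a list of structural problems found in a DataX-style job dict."""
    job = config.get("job") if isinstance(config, dict) else None
    if not isinstance(job, dict):
        return ["missing top-level 'job' object"]

    problems = []
    try:
        channel = job["setting"]["speed"]["channel"]
    except (KeyError, TypeError):
        channel = None
    if not isinstance(channel, int) or channel < 1:
        problems.append("job.setting.speed.channel should be a positive integer")

    content = job.get("content")
    if not isinstance(content, list) or not content:
        problems.append("job.content should be a non-empty list")
        return problems

    for index, entry in enumerate(content):
        if not isinstance(entry, dict):
            problems.append("job.content[%d] should be an object" % index)
            continue
        for role in ("reader", "writer"):
            plugin = entry.get(role)
            if not isinstance(plugin, dict):
                problems.append("job.content[%d].%s is missing" % (index, role))
                continue
            if not plugin.get("name"):
                problems.append("job.content[%d].%s.name is missing" % (index, role))
            if not isinstance(plugin.get("parameter"), dict):
                problems.append(
                    "job.content[%d].%s.parameter is missing" % (index, role)
                )
    return problems


if __name__ == "__main__":
    # Usage: python check_job.py Oracle2MySql.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_job(json.load(handle)):
            print(problem)
```

An empty result means only that the skeleton is in place. Plugin-specific parameters such as `jdbcUrl` or `writeMode` are still validated by DataX itself at run time.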
For more configuration details, see the GitHub repository:
https://github.com/alibaba/DataX
This article is reposted from 醉鱼Java.




