
Faster Load Data Outfile in MySQL

By 黎青峰, 2022-10-14

When exporting a table with the native MySQL approach, we have no control over the process and get no progress updates while it runs. Exporting a larger table can therefore drive up resource utilization and disk space usage.

The MySQL Shell utilities make this process easier. They export the table, let us import the data back using parallel threads, and report the current progress of the export/import as it runs.
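
As a quick illustration, an exported file can be loaded back with parallel threads via util.importTable; a minimal sketch, where the file path matches the export example below and the thread count is illustrative:

util.importTable("file:///home/vagrant/sbtest.txt", {
    schema: "sbtest",   // target schema
    table: "sbtest1",   // target table
    threads: 8          // number of parallel import threads (illustrative)
})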

The util.exportTable() utility, introduced in MySQL Shell 8.0.22, exports data in a controlled manner. The output can be stored locally or in a cloud infrastructure object storage bucket.

We will also look at the compression ratios and the time taken by native MySQL versus the Shell utility.

Features:

  • Compression
  • Progress status
  • Supported output formats: CSV, CSV-unix, TSV
  • Controlled process (maxRate)
  • Output to a local file or an Oracle Cloud Infrastructure object storage bucket (see the sketch after this list)
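
For the object storage case, a minimal sketch might look like the following; the bucket name, namespace, and config file path are hypothetical placeholders, not values from this article:

util.exportTable("sbtest.sbtest1", "sbtest1.csv", {
    dialect: "csv",                             // CSV output format
    osBucketName: "my-bucket",                  // hypothetical OCI bucket
    osNamespace: "my-namespace",                // hypothetical OCI namespace
    ociConfigFile: "/home/vagrant/.oci/config"  // hypothetical OCI config path
})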

Example:

MySQL  localhost  JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/sbtest.txt")
Initializing - done 
Gathering information - done 
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done       
Starting data dump
107% (26.22M rows / ~24.34M rows), 287.77K rows/s, 56.98 MB/s
Dump duration: 00:02:35s                                     
Total duration: 00:02:35s                                    
Data size: 5.18 GB                                           
Rows written: 26216172                                       
Bytes written: 5.18 GB                                       
Average throughput: 33.35 MB/s                               
                                                             
The dump can be loaded using:                                
util.importTable("file:///home/vagrant/sbtest.txt", {        
    "characterSet": "utf8mb4",
    "schema": "sbtest",
    "table": "sbtest1"
})                 

We should make sure the target directory already exists. By default, the utility generates the file with fieldsTerminatedBy set to TAB. At the end, it prints the import command to use when restoring the data.

Exporting as CSV:

To export the file as CSV, we need to use the dialect option to set the output format.

MySQL  localhost  JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/sbtest.csv",{dialect:"csv"})
Initializing - done 
Gathering information - done 
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done       
Starting data dump
107% (26.22M rows / ~24.34M rows), 136.35K rows/s, 24.68 MB/s
Dump duration: 00:02:44s                                     
Total duration: 00:02:44s                                    
Data size: 5.31 GB                                           
Rows written: 26216172                                       
Bytes written: 5.31 GB                                       
Average throughput: 32.34 MB/s                               
                                                             
The dump can be loaded using:                                
util.importTable("file:///home/vagrant/sbtest.csv", {        
    "characterSet": "utf8mb4",
    "dialect": "csv",
    "schema": "sbtest",
    "table": "sbtest1"
})
 MySQL  localhost  JS >     

Compression:

Two compression methods are available, zstd and gzip; compression is disabled by default. We can enable it by adding the compression option.

MySQL  localhost  JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/sbtest.zstd",{compression:"zstd"})
Initializing - done 
Gathering information - done 
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done       
Starting data dump
107% (26.22M rows / ~24.34M rows), 135.43K rows/s, 26.62 MB/s uncompressed, 11.92 MB/s compressed
Dump duration: 00:03:43s                                                                         
Total duration: 00:03:43s                                                                        
Uncompressed data size: 5.18 GB                                                                  
Compressed data size: 2.32 GB                                                                    
Compression ratio: 2.2                                                                           
Rows written: 26216172                                                                           
Bytes written: 2.32 GB                                                                           
Average uncompressed throughput: 23.20 MB/s                                                      
Average compressed throughput: 10.41 MB/s                                                        
                                                                                                 
The dump can be loaded using:                                                                    
util.importTable("file:///home/vagrant/sbtest.zstd", {                                           
    "characterSet": "utf8mb4",
    "schema": "sbtest",
    "table": "sbtest1"
})
 MySQL  localhost  JS >                                           

Max rate:

The export process should not become a bottleneck for other processes. To keep it under control, we can set the maxRate option to the maximum number of bytes per second per thread.

MySQL  localhost  JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/sbtest_file.csv",{dialect:"csv",maxRate:"5000000",fieldsOptionallyEnclosed: true, fieldsTerminatedBy: ",", linesTerminatedBy: "\n", fieldsEnclosedBy: '"'})
Initializing - done 
Gathering information - done 
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done       
Starting data dump
53% (12.98M rows / ~24.34M rows), 36.47K rows/s, 7.55 MB/s 
76% (18.66M rows / ~24.34M rows), 31.73K rows/s, 6.30 MB/s
107% (26.22M rows / ~24.34M rows), 31.46K rows/s, 6.50 MB/s
Dump duration: 00:12:38s                                   
Total duration: 00:12:38s                                  
Data size: 5.28 GB                                         
Rows written: 26216172                                     
Bytes written: 5.28 GB                                     
Average throughput: 6.97 MB/s                              
                                                           
The dump can be loaded using:                              
util.importTable("file:///home/vagrant/sbtest_file.csv", { 
    "characterSet": "utf8mb4",
    "dialect": "csv",
    "fieldsEnclosedBy": "\"",
    "fieldsOptionallyEnclosed": true,
    "fieldsTerminatedBy": ",",
    "linesTerminatedBy": "\n",
    "schema": "sbtest",
    "table": "sbtest1"
})
 MySQL  localhost  JS > 


Export time: native MySQL vs. Shell utility

To compare the export time of native MySQL against the Shell utility, I used the following lab environment.

MySQL version    5.7.35
Shell version    8.0.27
Table size       5.16 GB

Note: the data was loaded using sysbench.

Native method:

I exported the data from the table using single-threaded native MySQL. The execution time was 3 minutes 13 seconds, and the file size was around 5 GB.

mysql> select * from sbtest1 INTO OUTFILE '/var/lib/mysql-files/sample_native.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Query OK, 26216172 rows affected (3 min 13.74 sec)

[root@centos11 mysql-files]# ls -ltrh
total 5.1G
-rw-rw-rw-. 1 mysql mysql 5.1G Nov 15 17:03 sample_native.csv

Shell utility:

The same table export, done with the Shell utility util.exportTable, completed in 2 minutes 33 seconds.

MySQL localhost JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/Sample_utility.csv",{dialect:"csv"})
Initializing - done
Gathering information - done
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done
Starting data dump
101% (26.22M rows / ~25.88M rows), 139.50K rows/s, 27.35 MB/s
Dump duration: 00:02:33s                                     
Total duration: 00:02:33s                                    
Data size: 5.31 GB                                           
Rows written: 26216172                                       
Bytes written: 5.31 GB                                       
Average throughput: 34.63 MB/s                               
                                                             
The dump can be loaded using:                                
util.importTable("file:///home/vagrant/Sample_utility.csv", {
    "characterSet": "utf8mb4",
    "dialect": "csv",
    "schema": "sbtest",
    "table": "sbtest1"
})
 MySQL  localhost  JS > 


[vagrant@centos11 ~]$ ls -ltrh Sample_utility.csv
-rw-r-----. 1 root root 5.0G Nov 15 17:08 Sample_utility.csv
                    Native MySQL    Shell utility
Time taken          3 min 13 s      2 min 33 s
Export file size    5.1 GB          5.0 GB

Compression ratio

When exporting larger tables, we also need to keep an eye on the size of the export file, so I compared the compression ratios of the zstd and gzip methods.

gzip method:

MySQL localhost JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/Sample_utility.gz",{compression:"gzip"})
Initializing - done
Gathering information - done
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done
Starting data dump
101% (26.22M rows / ~25.88M rows), 122.02K rows/s, 24.26 MB/s uncompressed, 12.52 MB/s compressed
Dump duration: 00:03:49s                                                                         
Total duration: 00:03:50s                                                                        
Uncompressed data size: 5.18 GB                                                                  
Compressed data size: 2.68 GB                                                                    
Compression ratio: 1.9                                                                           
Rows written: 26216172                                                                           
Bytes written: 2.68 GB                                                                           
Average uncompressed throughput: 22.52 MB/s                                                      
Average compressed throughput: 11.64 MB/s                                                        
                                                                                                 
The dump can be loaded using:                                                                    
util.importTable("file:///home/vagrant/Sample_utility.gz", {                                     
    "characterSet": "utf8mb4",
    "schema": "sbtest",
    "table": "sbtest1"
})
 MySQL  localhost  JS > 

[vagrant@centos11 ~]$ ls -ltrh Sample_utility.gz
-rw-r-----. 1 root root 2.5G Nov 15 17:14 Sample_utility.gz


zstd method:

MySQL localhost JS > util.exportTable("sbtest.sbtest1","file:///home/vagrant/sbtest1.zstd",{compression:"zstd"})
Initializing - done
Gathering information - done
Running data dump using 1 thread.
NOTE: Progress information uses estimated values and may not be accurate.
Writing table metadata - done
Starting data dump
101% (26.22M rows / ~25.88M rows), 165.84K rows/s, 32.40 MB/s uncompressed, 14.50 MB/s compressed
Dump duration: 00:02:38s                                                                         
Total duration: 00:02:38s                                                                        
Uncompressed data size: 5.18 GB                                                                  
Compressed data size: 2.32 GB                                                                    
Compression ratio: 2.2                                                                           
Rows written: 26216172                                                                           
Bytes written: 2.32 GB                                                                           
Average uncompressed throughput: 32.61 MB/s                                                      
Average compressed throughput: 14.63 MB/s                                                        
                                                                                                 
The dump can be loaded using:                                                                    
util.importTable("file:///home/vagrant/employees.zstd", {                                        
    "characterSet": "utf8mb4",
    "schema": "sbtest",
    "table": "sbtest1"
})
 MySQL  localhost  JS > 

[vagrant@centos11 ~]$ ls -ltrh sbtest1.zstd
-rw-r-----. 1 vagrant vagrant 2.2G Nov 15 17:50 sbtest1.zstd

                    gzip            zstd
Execution time      3 min 50 s      2 min 38 s
Export file size    2.5 GB          2.2 GB

With the MySQL Shell utility, we can export tables faster than the native method, with compression, and in a controlled manner.
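
Putting the options covered above together, here is a minimal sketch of a controlled, compressed CSV export; the output path and the maxRate value are illustrative, not from the tests above:

util.exportTable("sbtest.sbtest1", "file:///home/vagrant/sbtest1.csv.zst", {
    dialect: "csv",        // CSV output format
    compression: "zstd",   // compress while writing
    maxRate: "10M"         // cap read throughput per thread (illustrative)
})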


Original title: Faster Load data outfile in MySQL
Original author: Sangeetha K
Original link: https://mydbops.wordpress.com/2021/11/18/export-table-in-parallel-thread/

