作者
digoal
日期
2020-07-27
标签
PostgreSQL , copy , binary , command , sql
背景
copy 命令新增binary格式, 这个格式的存储更加紧凑, 节约存储空间, 同时有一定性能提升. 支持send, recv函数的类型支持binary格式copy.
FORMAT
Selects the data format to be read or written: text, csv (Comma Separated Values), or binary.
The default is text.
Binary Format
The binary format option causes all data to be stored/read as binary format rather than as text.
It is somewhat faster than the text and CSV formats,
but a binary-format file is less portable across machine architectures and PostgreSQL versions.
Also, the binary format is very data type specific;
for example it will not work to output binary data from a smallint column and read it into an integer column,
even though that would work fine in text format.
The binary file format consists of a file header, zero or more tuples containing the row data, and a file trailer.
Headers and data are in network byte order.
Note
PostgreSQL releases before 7.4 used a different binary file format.
Presently, all data values in a binary-format file are assumed to be in binary format (format code one).
It is anticipated that a future extension might add a header field that allows per-column format codes to be specified.
To determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source,
in particular the send and recv functions for each column's data type
(typically these functions are found in the src/backend/utils/adt/ directory of the source distribution).
性能优化: Improve performance of binary COPY FROM through better buffering.
```
Improve performance of binary COPY FROM through better buffering.
author Tom Lane tgl@sss.pgh.pa.us
Sun, 26 Jul 2020 04:34:35 +0800 (16:34 -0400)
committer Tom Lane tgl@sss.pgh.pa.us
Sun, 26 Jul 2020 04:34:35 +0800 (16:34 -0400)
commit 0a0727ccfc5f4e2926623abe877bdc0b5bfd682e
tree c2efe923fd542ebe195f1f51d685d9aa8f4a8a60 tree | snapshot
parent 8a37951eebffd9bf528cb06d46127fb721d0e452 commit | diff
Improve performance of binary COPY FROM through better buffering.
At least on Linux and macOS, fread() turns out to have far higher
per-call overhead than one could wish. Reading 64KB of data at a time
and then parceling it out with our own memcpy logic makes binary COPY
from a file significantly faster --- around 30% in simple testing for
cases with narrow text columns (on Linux ... even more on macOS).
In binary COPY from frontend, there's no per-call fread(), and this
patch introduces an extra layer of memcpy'ing, but it still manages
to eke out a small win. Apparently, the control-logic overhead in
CopyGetData() is enough to be worth avoiding for small fetches.
Bharath Rupireddy and Amit Langote, reviewed by Vignesh C,
cosmetic tweaks by me
Discussion: https://postgr.es/m/CALj2ACU5Bz06HWLwqSzNMN=Gupoj6Rcn_QVC+k070V4em9wu=A@mail.gmail.com
```
参考
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a0727ccfc5f4e2926623abe877bdc0b5bfd682e
https://www.postgresql.org/docs/devel/sql-copy.html
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.
9.9元购买3个月阿里云RDS PostgreSQL实例
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





