前言
此文章为读者投稿 (欢迎各位读者投稿👏🏻),Author:孤傲小二~阿沐,PostgreSQL 爱好者,目前从事 PostgreSQL 内核开发相关工作。
17 正式发布已经有一段时日了,其中有一项很不错的开发特性 — COPY 容错处理,在 17 以前,如果 COPY TO 导出遇到了错误,出错前的数据不受影响,但是如果是 COPY FROM 的话,就会比较头疼,一旦遇到报错,比如类型不匹配,缺字段等等,都会导致之前的数据全部不可见,并且会类似死元组一样,占据着空间!尤其是 OLAP 场景下,导入到 99% 给你来个报错,那无疑十分令人抓狂 (所以 Greenplum 对此进行了增强,COPY 支持一定的容错处理) 同时在 17 中,pg_stat_progress_copy 中还新增了 tuples_skipped 字段。

这一篇文章,阿沐对于 LOG_VERBOSITY 进行了原理剖析 🎉👏🏻。
功能使用背景说明
Add new COPY option LOG_VERBOSITY.This commit adds a new COPY option LOG_VERBOSITY, which controls theamount of messages emitted during processing. Valid values are'default' and 'verbose'.This is currently used in COPY FROM when ON_ERROR option is set toignore. If 'verbose' is specified, a NOTICE message is emitted foreach discarded row, providing additional information such as linenumber, column name, and the malformed value. This helps users toidentify problematic rows that failed to load.
添加新的 COPY 选项 LOG_VERBOSITY
•此提交添加了新的 COPY 选项 LOG_VERBOSITY,用于控制处理期间发出的消息量。有效值为“default”和“verbose”•当 ON_ERROR 选项设置为忽略时,当前在 COPY FROM 中使用此选项。如果指定了“verbose”,则会为每个丢弃的行发出一条 NOTICE 消息,提供其他信息,例如行号、列名和格式错误的值。这有助于用户识别无法加载的问题行
Add log_verbosity = 'silent' support to COPY command.Previously, when the on_error option was set to ignore, the COPY commandwould always log NOTICE messages for input rows discarded due todata type incompatibility. Users had no way to suppress these messages.This commit introduces a new log_verbosity setting, 'silent',which prevents the COPY command from emitting NOTICE messageswhen on_error = 'ignore' is used, even if rows are discarded.This feature is particularly useful when processing malformed filesfrequently, where a flood of NOTICE messages can be undesirable.For example, when frequently loading malformed files via the COPY commandor querying foreign tables using file_fdw (with an upcoming patch toadd on_error support for file_fdw), users may prefer to suppressthese messages to reduce log noise and improve clarity.
为 COPY 命令添加 log_verbosity = 'silent' 支持
•以前,当 on_error 选项设置为 ignore 时,COPY 命令将始终记录由于数据类型不兼容而丢弃的输入行的 NOTICE 消息。用户无法抑制这些消息•此提交引入了一个新的 log_verbosity 设置 'silent',当使用 on_error = 'ignore' 时,即使行被丢弃,它也会阻止 COPY 命令发出 NOTICE 消息。此功能在频繁处理格式错误的文件时特别有用,在这种情况下,大量的 NOTICE 消息可能是不受欢迎的•例如,当通过 COPY 命令频繁加载格式错误的文件或使用 file_fdw 查询外部表时(即将发布的补丁将为 file_fdw 添加 on_error 支持),用户可能更愿意抑制这些消息以减少日志噪音并提高清晰度
案例展示1,如下:
postgres=# select version();version---------------------------------------------------------------------------------PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.1.0, 64-bit(1 row)postgres=# \set SHOW_CONTEXT alwayspostgres=#postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);CREATE TABLEpostgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore, log_verbosity verbose);Enter data to be copied followed by a newline.End with a backslash and a period on a line by itself, or an EOF signal.>> 1 {1} 1>> a {2} 2>> 3 {3} 3333333333>> 4 {a, 4} 4>>>> 5 {5} 5>> 6 a>> 7 {7} a>> 8 {8} 8>> \.NOTICE: skipping row due to data type incompatibility at line 2 for column "n": "a"CONTEXT: COPY check_ign_errNOTICE: skipping row due to data type incompatibility at line 3 for column "k": "3333333333"CONTEXT: COPY check_ign_errNOTICE: skipping row due to data type incompatibility at line 4 for column "m": "{a, 4}"CONTEXT: COPY check_ign_errNOTICE: skipping row due to data type incompatibility at line 5 for column "n": ""CONTEXT: COPY check_ign_errNOTICE: skipping row due to data type incompatibility at line 7 for column "m": "a"CONTEXT: COPY check_ign_errNOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"CONTEXT: COPY check_ign_errNOTICE: 6 rows were skipped due to data type incompatibilityCOPY 3postgres=# table check_ign_err;n | m | k---+-----+---1 | {1} | 15 | {5} | 58 | {8} | 8(3 rows)postgres=#
案例展示2,如下:
postgres=# CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;CREATE DOMAINpostgres=# CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);CREATE TABLEpostgres=# COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);Enter data to be copied followed by a newline.End with a backslash and a period on a line by itself, or an EOF signal.>> 1 {1} 1 'foo'>> 2 {2} 2 \N>> \.NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null inputCONTEXT: COPY check_ign_err2NOTICE: 1 row was skipped due to data type incompatibilityCOPY 1postgres=# table check_ign_err2;n | m | k | l---+-----+---+-------1 | {1} | 1 | 'foo'(1 row)postgres=#
案例展示3,如下:
postgres=# COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);Enter data to be copied followed by a newline.End with a backslash and a period on a line by itself, or an EOF signal.>> 3 {3} 3 'bar'>> 4 {4} 4 \N>> \.COPY 1postgres=# table check_ign_err2;n | m | k | l---+-----+---+-------1 | {1} | 1 | 'foo'3 | {3} | 3 | 'bar'(2 rows)postgres=# COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity default);Enter data to be copied followed by a newline.End with a backslash and a period on a line by itself, or an EOF signal.>> 5 {5} 5 'bar'>> 6 {6} 6 \N>> \.NOTICE: 1 row was skipped due to data type incompatibilityCOPY 1postgres=#
OK,接下来做一个小结 如下(当 ON_ERROR 选项设置为 ignore 时):
1.如果至少有一行被丢弃,则在 COPY FROM 结束时会发出一条 NOTICE 消息,其中包含被忽略的行数。2.当 LOG_VERBOSITY 选项设置为 verbose 时,对于每个被丢弃的行,都会发出一条 NOTICE 消息,其中包含输入文件的行和输入转换失败的列名。3.当设置为 silent 时,不会发出有关被忽略的行的消息。
功能实现源码解析
相关数据结构如下:
// src/include/commands/copy.h/** Represents verbosity of logged messages by COPY command.*/typedef enum CopyLogVerbosityChoice{COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is* the default, assign 0 */// 不记录其他消息。由于这是默认设置,因此分配 0COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */} CopyLogVerbosityChoice;
// src/backend/commands/copy.c/** Process the statement option list for COPY.* 处理 COPY 的语句选项列表。** Scan the options list (a list of DefElem) and transpose the information* into *opts_out, applying appropriate error checking.* 扫描选项列表(DefElem 列表)并将信息转置到 *opts_out,应用适当的错误检查。** If 'opts_out' is not NULL, it is assumed to be filled with zeroes initially.* 如果“opts_out”不为 NULL,则假定它最初用零填充。** This is exported so that external users of the COPY API can sanity-check* a list of options. In that usage, 'opts_out' can be passed as NULL and* the collected data is just leaked until CurrentMemoryContext is reset.* 导出此信息,以便 COPY API 的外部用户可以对选项列表进行健全性检查。* 在这种用法中,“opts_out”可以作为 NULL 传递,并且收集的数据只会泄露,直到 CurrentMemoryContext 重置。** Note that additional checking, such as whether column names listed in FORCE* QUOTE actually exist, has to be applied later. This just checks for* self-consistency of the options list.* 请注意,稍后必须应用其他检查,例如 FORCE QUOTE 中列出的列名是否实际存在。* 这只是检查选项列表的自洽性。*/voidProcessCopyOptions(ParseState *pstate,CopyFormatOptions *opts_out,bool is_from,List *options){.../* Support external use for option sanity checking */if (opts_out == NULL)opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));.../* Extract options from the statement node tree */foreach(option, options){DefElem *defel = lfirst_node(DefElem, option);...else if (strcmp(defel->defname, "log_verbosity") == 0){if (log_verbosity_specified)errorConflictingDefElem(defel, pstate);log_verbosity_specified = true;opts_out->log_verbosity = defGetCopyLogVerbosityChoice(defel, pstate);}...}...}

如上,即使没有指定该选项 也会被设置成默认值COPY_LOG_VERBOSITY_DEFAULT。接下来看一下解析函数:
/** Extract a CopyLogVerbosityChoice value from a DefElem.*/static CopyLogVerbosityChoicedefGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate){char *sval;/** Allow "silent", "default", or "verbose" values.*/sval = defGetString(def);if (pg_strcasecmp(sval, "silent") == 0)return COPY_LOG_VERBOSITY_SILENT;if (pg_strcasecmp(sval, "default") == 0)return COPY_LOG_VERBOSITY_DEFAULT;if (pg_strcasecmp(sval, "verbose") == 0)return COPY_LOG_VERBOSITY_VERBOSE;ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),/*- translator: first %s is the name of a COPY option, e.g. ON_ERROR */errmsg("COPY %s \"%s\" not recognized", "LOG_VERBOSITY", sval),parser_errposition(pstate, def->location)));return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */}
接下来看一下详细调试过程,如下:


此时的函数堆栈,如下:
NextCopyFrom(CopyFromState cstate, ExprContext * econtext, Datum * values, _Bool * nulls)CopyFrom(CopyFromState cstate)DoCopy(ParseState * pstate, const CopyStmt * stmt, int stmt_location, int stmt_len, uint64 * processed)standard_ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc)ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc)PortalRunUtility(Portal portal, PlannedStmt * pstmt, _Bool isTopLevel, _Bool setHoldSnapshot, DestReceiver * dest, QueryCompletion * qc)PortalRunMulti(Portal portal, _Bool isTopLevel, _Bool setHoldSnapshot, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc)PortalRun(Portal portal, long count, _Bool isTopLevel, _Bool run_once, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc)exec_simple_query(const char * query_string)...
上面多行的循环处理,结束之后 如下:

postgres=# COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);Enter data to be copied followed by a newline.End with a backslash and a period on a line by itself, or an EOF signal.>> 7 {7} 7 'bar'>> 8 {8} 8 \N>> 9 {9} 9 \N>> \.NOTICE: skipping row due to data type incompatibility at line 2 for column "l": null inputNOTICE: skipping row due to data type incompatibility at line 3 for column "l": null inputNOTICE: 2 rows were skipped due to data type incompatibilityCOPY 1postgres=#
如果这里设置的是silent,那么上面这一行也不会打印了!
这个功能比较简单,这里不再赘述!对错误信息的提示量根据自己需要酌情设置即可,如果指定了verbose,则会为每个丢弃的行发出一条 NOTICE 消息,提供其他信息(如上):
•例如行号、列名和格式错误的值
•这有助于用户识别无法加载的问题行




