暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

PostgreSQL17新特性之-COPY FROM的ON_ERROR容错选项

原创 阎书利 2024-05-30
694

一、ON_ERROR容错选项

1.ON_ERROR容错选项使用方式

PostgreSQL17-beta1针对COPY FROM增加的ON_ERROR选项使得Copy From语句在执行过程中部分解析、数据格式、字符集等相关的报错可以根据需求选择报错回滚事务里的全部记录/丢弃输入行并继续处理下一行。

image.png

error_action可以选择的值为stop和ignore,具体的作用如下:

选项 作用
stop 使命令失败,ON_ERROR该选项的的默认值。和PostgreSQL-16及以前的默认行为一致。
ignore 丢弃输入行并继续处理下一行。

针对ignore的这个作用,我测试过程发现有时候可以丢弃输入行并继续处理下一行。有时候并没有丢弃输入行并继续处理下一行,而是在错误行中断了。把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。出现了两种现象,具体可看后边测试及总结部分

大致的使用方式为如下所示:

copy test_copy_onerror  from '/home/postgres/1.sql' (ON_ERROR ignore);
copy test_copy_onerror  from '/home/postgres/1.sql' (ON_ERROR stop);

COPY的默认行为是"ON_ERROR stop",通过源码可以看到ON_ERROR选项的error_action目前也是仅有两种,源码注释里写着日后可能会添加更多的选项。

src/include/commands/copy.h

/*
 * Represents where to save input processing errors.  More values to be added
 * in the future.
 */
typedef enum CopyOnErrorChoice
{
	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
	COPY_ON_ERROR_IGNORE,		/* ignore errors */
} CopyOnErrorChoice;

----------------------------------------------------------------------------------------
src/backend/commands/copy.c

/*
 * Extract a CopyOnErrorChoice value from a DefElem.
 */
static CopyOnErrorChoice
defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
{
	char	   *sval = defGetString(def);

	if (!is_from)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("COPY ON_ERROR cannot be used with COPY TO"),
				 parser_errposition(pstate, def->location)));

	/*
	 * Allow "stop", or "ignore" values.
	 */
	if (pg_strcasecmp(sval, "stop") == 0)
		return COPY_ON_ERROR_STOP;
	if (pg_strcasecmp(sval, "ignore") == 0)
		return COPY_ON_ERROR_IGNORE;

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("COPY ON_ERROR \"%s\" not recognized", sval),
			 parser_errposition(pstate, def->location)));
	return COPY_ON_ERROR_STOP;	/* keep compiler quiet */
}

2.测试案例

(1)创建测试表test_copy_onerror

创建一个测试表test_copy_onerror,name的类型为varchar(5),插入超过5个字符的则会报错。

postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:48)=# create table test_copy_onerror(id int,name varchar(5));
CREATE TABLE
postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:55)=# \d test_copy_onerror
                 Table "public.test_copy_onerror"
+--------+----------------------+-----------+----------+---------+
| Column |         Type         | Collation | Nullable | Default |
+--------+----------------------+-----------+----------+---------+
| id     | integer              |           |          |         |
| name   | character varying(5) |           |          |         |
+--------+----------------------+-----------+----------+---------+

postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:58)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

(2)准备一些要copy到表里的数据

准备一些要copy到表里的数据,其中第三行和第五行的第二列数据都是有问题的,长度超了,不能copy到表里

[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ cat testdata.txt
1       aaaaa
2       bbbbb
3       cccccc
4       ddddd
5       ffffff

测试结果1:使用copy的默认行为

结果如下,可以看出事务回滚了,没有任何一条数据插入,并且报出了在第三行数据存在问题。

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:24)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:27)=# copy test_copy_onerror  from '/home/postgres/testdata.txt';
ERROR:  value too long for type character varying(5)
CONTEXT:  COPY test_copy_onerror, line 3, column name: "cccccc"
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:30)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

测试结果2:使用ON_ERROR选项配合stop值

结果如下,可以看出事务回滚了,没有任何一条数据插入,并且报出了在第三行数据存在问题。可以看出现象和COPY的默认行为一致。

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:05)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:06)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR stop);
ERROR:  value too long for type character varying(5)
CONTEXT:  COPY test_copy_onerror, line 3, column name: "cccccc"
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:22)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

测试结果3:使用ON_ERROR选项配合ignore值(错误的值是超出字段长度)

结果如下,它把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。。

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:34:15)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:34:17)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore);
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:35:34)=# table test_copy_onerror;
+----+-------+
| id | name  |
+----+-------+
|  1 | aaaaa |
|  2 | bbbbb |
+----+-------+
(2 rows)

测试结果4:使用ON_ERROR选项配合ignore值(错误的值是类型不匹配)

结果如下,错误的值是类型不匹配时,会丢弃输入行并继续处理下一行。只有报错的行没有导入到表里。

postgres<17beta1>(ConnAs[postgres]:PID[23019] 2024-05-28/20:10:14)=# create table t2 (id int);
CREATE TABLE
postgres<17beta1>(ConnAs[postgres]:PID[23050] 2024-05-28/20:15:50)=# \d t2
                  Table "public.t2"
+--------+---------+-----------+----------+---------+
| Column |  Type   | Collation | Nullable | Default |
+--------+---------+-----------+----------+---------+
| id     | integer |           |          |         |
+--------+---------+-----------+----------+---------+

//要导入的数据有三行,其中第二行数据和表的列数据不符
[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ cat 1.txt
1
n
2

[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ psql
Border style is 2.
Line style is ascii.
psql (17beta1)
Type "help" for help.

postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:16:56)=# table t2;
+----+
| id |
+----+
+----+
(0 rows)

postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:17:02)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore);
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:17:10)=# table t2;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
(2 rows)

4.总结

经过测试,对于copy from的不同错误类型,就算选择了同样的ON_ERROR ignore选项,行为也可能不一样。目前我的测试中展现出两种行为:

(1)选择ON_ERROR ignore选项,错误的值是超出字段长度

没有丢弃输入行并继续处理下一行,而是在错误行中断了。把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。

(2)选择ON_ERROR ignore选项,错误的值是类型不匹配

错误的值是类型不匹配时,会丢弃输入行并继续处理下一行。只有报错的行没有导入到表里。

二、pg_stat_progress_copy视图的改进

PostgreSQL 14 开始,添加了pg_stat_progress_copy视图来检查 COPY 处理的进度。

PostgreSQL17-beta1版本里,添加了一个tuples_skipped的列,由于包含格式错误的数据而被跳过的元组数。仅当为 ON_ERROR 选项指定了停止以外的值时,此计数器才会增加。

postgres<16.1>(ConnAs[postgres]:PID[53951] 2024-05-29/17:19:31)=# \d pg_stat_progress_copy
            View "pg_catalog.pg_stat_progress_copy"
+------------------+---------+-----------+----------+---------+
|      Column      |  Type   | Collation | Nullable | Default |
+------------------+---------+-----------+----------+---------+
| pid              | integer |           |          |         |
| datid            | oid     |           |          |         |
| datname          | name    |           |          |         |
| relid            | oid     |           |          |         |
| command          | text    |           |          |         |
| type             | text    |           |          |         |
| bytes_processed  | bigint  |           |          |         |
| bytes_total      | bigint  |           |          |         |
| tuples_processed | bigint  |           |          |         |
| tuples_excluded  | bigint  |           |          |         |
+------------------+---------+-----------+----------+---------+

postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:38:19)=# \d pg_stat_progress_copy
            View "pg_catalog.pg_stat_progress_copy"
+------------------+---------+-----------+----------+---------+
|      Column      |  Type   | Collation | Nullable | Default |
+------------------+---------+-----------+----------+---------+
| pid              | integer |           |          |         |
| datid            | oid     |           |          |         |
| datname          | name    |           |          |         |
| relid            | oid     |           |          |         |
| command          | text    |           |          |         |
| type             | text    |           |          |         |
| bytes_processed  | bigint  |           |          |         |
| bytes_total      | bigint  |           |          |         |
| tuples_processed | bigint  |           |          |         |
| tuples_excluded  | bigint  |           |          |         |
| tuples_skipped   | bigint  |           |          |         |  <--增加的列
+------------------+---------+-----------+----------+---------+

三、COPY添加的LOG_VERBOSITY选项

COPY添加的LOG_VERBOSITY选项,这个选项可以报告 COPY FROM 被忽略的错误行。

image.png

verbosity可设置的值 作用
default 默认配置,报告跳过了多少行。
verbose 冗余输出设置。对于有错误的每一行,报告哪一列有问题以及最后跳过了多少行。

目前verbosity可设置的值仅有两个

src/include/commands/copy.h

... ...
/*
 * Represents verbosity of logged messages by COPY command.
 */
typedef enum CopyLogVerbosityChoice
{
	COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
... ...

----------------------------------------------------------------------------------------
src/backend/commands/copy.c

... ...
/*
 * Extract a CopyLogVerbosityChoice value from a DefElem.
 */
static CopyLogVerbosityChoice
defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
{
	char	   *sval;

	/*
	 * Allow "default", or "verbose" values.
	 */
	sval = defGetString(def);
	if (pg_strcasecmp(sval, "default") == 0)
		return COPY_LOG_VERBOSITY_DEFAULT;
	if (pg_strcasecmp(sval, "verbose") == 0)
		return COPY_LOG_VERBOSITY_VERBOSE;

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("COPY LOG_VERBOSITY \"%s\" not recognized", sval),
			 parser_errposition(pstate, def->location)));
	return COPY_LOG_VERBOSITY_DEFAULT;	/* keep compiler quiet */
}
... ...

使用方式和是否使用 LOG_VERBOSITY verbose的对比如下,带有LOG_VERBOSITY verbose的会把跳过的行的行号以及具体数据均打印出来。

postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:05)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore);
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:06)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore,LOG_VERBOSITY default);
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:08)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore,LOG_VERBOSITY verbose);
NOTICE:  skipping row due to data type incompatibility at line 2 for column id: "n"
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
------------------
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:19)=#
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:20)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore);
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:36)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore,LOG_VERBOSITY default);
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:45)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore,LOG_VERBOSITY verbose);
NOTICE:  skipping row due to data type incompatibility at line 3 for column name: "cccccc"
NOTICE:  skipping row due to data type incompatibility at line 4 for column id: "4       dddd"
NOTICE:  skipping row due to data type incompatibility at line 5 for column id: "5       ffffff"
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2

image.png

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论