PostgreSQL中的NULL值使用的几点小结

数据库杂记 2023-05-14

PostgreSQL中的NULL值使用的几点小结

tag: 应用开发, 数据库原理

前言：

我们首先来看看NULL值的原始定义：

1`NULL` 值是数据库中表示缺失或未知值的特殊标记。它用于表示某个字段没有具体的值。需要注意的是，`NULL` 与空字符串或零不同，它们有具体的值，而 `NULL` 表示该值不存在或者未知。
2
3在 SQL 语句中，可以使用 `IS NULL` 或 `IS NOT NULL` 运算符来区分包含 `NULL` 值的记录。

1-- 查询所有 name 字段为 NULL 的记录
2SELECT * FROM table_name WHERE name IS NULL;
3
4-- 查询所有 age 字段不为 NULL 的记录
5SELECT * FROM table_name WHERE age IS NOT NULL;

也就是说在标准的SQL语法当中，IS NULL和IS NOT NULL是能被正确处理并且没有疑义的。

那么在PostgreSQL数据库当中，我们有哪些地方要注意的呢？

分析：

我们看看下边的例子：

1postgres=# create table test4(id int primary key, col2 varchar(32));
2CREATE TABLE
3postgres=# insert into test4 values(1, null), (2, 'abcd');
4INSERT 0 2

表test4有两条记录，一条记录字段col2的值为NULL。

1.count计数

我们看看count()查询：

1postgres=# select count(id) as count_id, count(col2) as count_col2 from test4;
2 count_id | count_col2
3----------+------------
4        2 |          1
5(1 row)

你会发现，如果以col2来计数的话，它返回的是1。因为有一行该列的值为NULL，是不会计到count当中的。

1postgres=# select count(*), count(1) from test4;
2 count | count
3-------+-------
4     2 |     2
5(1 row)

作为对比，count(*), count(1)结果不受影响。

2.不等于（！=）和等于(=)条件判断

 1postgres=# select * from test4 where col2 is not null;
 2 id | col2
 3----+------
 4  2 | abcd
 5(1 row)
 6
 7postgres=# select * from test4 where col2 != null;
 8 id | col2
 9----+------
10(0 rows)
11postgres=# select * from test4 where col2 = null;
12 id | col2
13----+------
14(0 rows)

你能看到，在默认情况下，!= null跟is not null行为好像不太一样。!= null并不能把id=2的那一行给取出来，虽然那一行col2的值为‘abcd'，确实不为空。但它是 IS NOT NULL, 不能与NULL值进行等值或不等值的比较。因而无论条件是col2 = null还是col2 != null，相当于都给你弄成false。一行也取不出来。所以，如果不想改变行为的话，就老老实实的使用标准的 IS NOT NULL来过滤。

为了尽量照顾“一些人的情绪”， PG提供了一个开关，让这两者行为保持相同：

 1postgres=# alter system set transform_null_equals = 'on';
 2ALTER SYSTEM
 3
 4postgres=# select pg_reload_conf();
 5 pg_reload_conf
 6----------------
 7 t
 8(1 row)
 9
10postgres=# select * from test4 where col2 != null;
11 id | col2
12----+------
13(0 rows)
14
15postgres=# select * from test4 where col2 = null;
16 id | col2
17----+------
18  1 |
19(1 row)

我们会发现，如果将开关：transform_null_equals 设为'on'， col2 = null与col2 IS null行为一样。但是col2 != null与col2 IS NOT NULL行为不一样。

 1-- 对比一下，你是不是觉得有点儿魔幻？
 2postgres=# select * from test4 where col2 != null;
 3 id | col2
 4----+------
 5(0 rows)
 6
 7postgres=# select * from test4 where not (col2 = null);
 8 id | col2
 9----+------
10  2 | abcd
11(1 row)
12
13postgres=# select * from test4 where col2 is not null;
14 id | col2
15----+------
16  2 | abcd
17(1 row)

3.UNION查询遇到了NULL值

看下边的示例：

 1-- 1.
 2postgres=# select null union select null union select 't';
 3 ?column?
 4----------
 5 t
 6(2 rows)
 7
 8-- 2.
 9postgres=# select null union select null union select 1;
10ERROR:  UNION types text and integer cannot be matched
11LINE 1: select null union select null union select 1;
12
13-- 3.
14postgres=# select null union select 1;
15 ?column?
16----------
17        1
18(2 rows)

我们怎么解释第2个SQL语句执行出错，而第1条和第3条能成功呢？

通过文档，我们发现，PG会将前边的union示为一个嵌套，即：语句1相当于：

1(select null union select null) union select 't'

同时呢，它还把内嵌的union : (select null union select null)的类型默认为text类型，这样它们的类型一致，就能出结果了。这也顺利解释了第2条语句出现类型不匹配的问题。而第3条不存在嵌套，因而是合理的。

4.NULLS FIRST 与 NULLS LAST

这个用法往往容易被人忽略。PG中创建索引时：

1CREATE INDEX index_name ON table_name 
2[USING method]
3(column_name [ASC | DESC] [NULLS FIRST | NULLS LAST]);

其中：

index_name 是索引的名称，table_name 是表的名称；
method 表示索引的类型，例如 btree、hash、gist、spgist、gin 或者 brin。默认为 btree；
column_name 是字段名，ASC
表示升序排序（默认值），DESC
表示降序索引；
NULLS FIRST
和NULLS LAST
表示索引中空值的排列顺序，升序索引时默认为NULLS LAST
，降序索引时默认为NULLS FIRST
。

因而在查询的时候，查询条件必须与NULLS FIRST/LAST保持一致。不一致的例子, 以前边的test4表为例：

 1postgres=# explain analyze select * from test4 order by col2 asc nulls first limit 5;
 2                                                               QUERY PLAN
 3----------------------------------------------------------------------------------------------------------------------------------------
 4 Limit  (cost=5212.53..5213.11 rows=5 width=14) (actual time=18.264..20.069 rows=5 loops=1)
 5   ->  Gather Merge  (cost=5212.53..18741.82 rows=117646 width=14) (actual time=18.262..20.066 rows=5 loops=1)
 6         Workers Planned: 1
 7         Workers Launched: 1
 8         ->  Sort  (cost=4212.52..4506.64 rows=117646 width=14) (actual time=16.071..16.072 rows=5 loops=2)
 9               Sort Key: col2 NULLS FIRST
10               Sort Method: top-N heapsort  Memory: 25kB
11               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
12               ->  Parallel Seq Scan on test4  (cost=0.00..2258.46 rows=117646 width=14) (actual time=0.011..6.766 rows=100001 loops=2)
13 Planning Time: 0.167 ms
14 Execution Time: 20.095 ms
15(11 rows)
16
17postgres=# explain analyze select * from test4 order by col2 asc limit 5;
18                                                              QUERY PLAN
19--------------------------------------------------------------------------------------------------------------------------------------
20 Limit  (cost=0.42..0.73 rows=5 width=14) (actual time=0.021..0.026 rows=5 loops=1)
21   ->  Index Scan using idx_col2_test4 on test4  (cost=0.42..12229.68 rows=199999 width=14) (actual time=0.020..0.024 rows=5 loops=1)
22 Planning Time: 0.123 ms
23 Execution Time: 0.042 ms
24(4 rows)

你会发现，当 order by col2 asc nulls first时，它无法用上索引扫描。因为默认情况下，col2 asc的索引，用的是NULLs LAST。使用的时候切记。

看看IS NULL是否能用上索引？答案是，它也能用上。

1postgres=# explain analyze select * from test4 where col2 is null;
2                                                      QUERY PLAN
3-----------------------------------------------------------------------------------------------------------------------
4 Index Scan using idx_col2_test4 on test4  (cost=0.42..7.82 rows=1 width=14) (actual time=0.016..0.019 rows=3 loops=1)
5   Index Cond: (col2 IS NULL)
6 Planning Time: 0.076 ms
7 Execution Time: 0.033 ms
8(4 rows)

小结：

要想在开发过程中少出错，尽量还是用标准语法吧。尽量用IS NULL, IS NOT NULL，不要用或少用 = NULL, != NULL。然后索引那个地方，留心下就好。

参考:

[1] https://www.postgresql.org/docs/current/queries-order.html

[2] https://www.postgresql.org/docs/current/typeconv-union-case.html

文章转载自数据库杂记，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL中的NULL值使用的几点小结