PostgreSQL数据库WHERE和HAVING的区别

原创阎书利云和恩墨技术服务团队 2024-10-12

872

WHERE和HAVING在SQL查询中的作用差异：WHERE用于预筛选行，HAVING则在分组后过滤。
WHERE在计算组和聚合之前选择输入行（因此，它控制哪些行进入聚合计算），而HAVING在计算组和聚合（例如 COUNT、SUM、AVG）之后选择组行。

一、HAVING

HAVING子句指定组的搜索条件。HAVING子句通常与GROUP BY子句一起用于根据指定条件筛选组。

1.HAVING子句必须放置于 GROUP BY子句后面，ORDER BY子句前面。

//表基础数据如下
postgres=# select * from products;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  1 | Product A |       10 |
|  3 | Product C |        8 |
|  2 | Product B |        5 |
+----+-----------+----------+
(3 rows)

//简单的带where条件的
postgres=# select * from products where id>1;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  3 | Product C |        8 |
|  2 | Product B |        5 |
+----+-----------+----------+
(2 rows)
postgres=# explain select * from products where id>1;
+----------------------------------------------------------+
|                        QUERY PLAN                        |
+----------------------------------------------------------+
| Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|   Filter: (id > 1)                                       |
+----------------------------------------------------------+
(2 rows)

//带where，不进行分组，直接使用having的，会报错
postgres=# select * from products where id>1 having id <3;
ERROR:  column "products.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select * from products where id>1 having id <3;
               ^
//带where，进行分组，再使用having的，可以正常执行。
但是因为having后的条件未加聚合计算，所以进行了重写,把(id > 1) AND (id < 3)作为一个filter。
postgres=# select * from products where id>1 group by id having id <3;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  2 | Product B |        5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having id <3;
+----------------------------------------------------------------------+
|                              QUERY PLAN                              |
+----------------------------------------------------------------------+
| Group  (cost=1.05..1.06 rows=1 width=126)                            |
|   Group Key: id                                                      |
|   ->  Sort  (cost=1.05..1.06 rows=1 width=126)                       |
|         Sort Key: id                                                 |
|         ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|               Filter: ((id > 1) AND (id < 3))                        |
+----------------------------------------------------------------------+
(6 rows)

//这个和上边的比较类似，但是在having后，多了order by，order by也只能放在having的前边。
postgres=# select * from products where id>1 group by id having id <3 order by id;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  2 | Product B |        5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having id <3 order by id;
+----------------------------------------------------------------------+
|                              QUERY PLAN                              |
+----------------------------------------------------------------------+
| Group  (cost=1.05..1.06 rows=1 width=126)                            |
|   Group Key: id                                                      |
|   ->  Sort  (cost=1.05..1.06 rows=1 width=126)                       |
|         Sort Key: id                                                 |
|         ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|               Filter: ((id > 1) AND (id < 3))                        |
+----------------------------------------------------------------------+
(6 rows)

//如下部分的having条件里，因为带了聚合函数，这个聚合后的条件判断不能和前边的id>1 重写放到一起，因此，形成了两个filter。
postgres=# select * from products where id>1 group by id having max(id) <3 order by id;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  2 | Product B |        5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having max(id) <3 order by id;
+----------------------------------------------------------------------+
|                              QUERY PLAN                              |
+----------------------------------------------------------------------+
| GroupAggregate  (cost=1.05..1.07 rows=1 width=126)                   |
|   Group Key: id                                                      |
|   Filter: (max(id) < 3)                                              |
|   ->  Sort  (cost=1.05..1.05 rows=1 width=126)                       |
|         Sort Key: id                                                 |
|         ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|               Filter: (id > 1)                                       |
+----------------------------------------------------------------------+
(7 rows)

2.Having子句不能使用列的别名

postgres=# select * from products;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  1 | Product A |       10 |
|  3 | Product C |        8 |
|  2 | Product B |        5 |
+----+-----------+----------+
(3 rows)

postgres=# select count(*) con,quantity from products where id>1 group by quantity having con <3;
ERROR:  column "con" does not exist
LINE 1: ...ty from products where id>1 group by quantity having con <3;
                                                                ^

Having子句不能使用列的别名，原因在于PG执行SQL语句时的解析顺序是如下模式的，解析having子句时，select语句尚未解析，所以con还未被定义出来。

FROM->WHERE->GROUP BY->HAVING->SELECT->DISTINCT->ORDER BY->LIMIT

这种情况不能使用别名，只能再调用聚合函数。

select count(*) con,quantity from products where id>1 group by quantity having con <3;
改成
select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;

改成使用聚合函数后，成功执行，同样的，having的条件加了聚合函数后，不能和where里的条件重写到一个filter。

postgres=# select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
+-----+----------+
| con | quantity |
+-----+----------+
|   1 |        8 |
|   1 |        5 |
+-----+----------+
(2 rows)
postgres=# explain select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
+--------------------------------------------------------------+
|                          QUERY PLAN                          |
+--------------------------------------------------------------+
| HashAggregate  (cost=1.04..1.05 rows=1 width=12)             |
|   Group Key: quantity                                        |
|   Filter: (count(*) < 3)                                     |
|   ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=4) |
|         Filter: (id > 1)                                     |
+--------------------------------------------------------------+
(5 rows)

3.Having子句可以和子查询结合使用，以实现更复杂的筛选条件

                                                               ^
postgres=# select * from products where id>1 group by id having max(id) >(select avg(id) from products);
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  3 | Product C |        8 |
+----+-----------+----------+
(1 row)

postgres=# explain select * from products where id>1 group by id having max(id) >(select avg(id) from products);
+---------------------------------------------------------------------------------+
|                                   QUERY PLAN                                    |
+---------------------------------------------------------------------------------+
| HashAggregate  (cost=2.09..2.11 rows=1 width=126)                               |
|   Group Key: products.id                                                        |
|   Filter: ((max(products.id))::numeric > $0)                                    |
|   InitPlan 1 (returns $0)                                                       |
|     ->  Aggregate  (cost=1.04..1.05 rows=1 width=32)                            |
|           ->  Seq Scan on products products_1  (cost=0.00..1.03 rows=3 width=4) |
|   ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126)                  |
|         Filter: (id > 1)                                                        |
+---------------------------------------------------------------------------------+
(8 rows)

二、WHERE

1.where子句不能使用聚合，类似的功能可以通过子查询实现。

子查询是一个独立的计算，它计算自己的聚合与外部查询中发生的事情分开。

//where子句不能使用聚合

postgres=# select * from t1 where id=max(id);
ERROR:  aggregate functions are not allowed in WHERE
LINE 1: select * from t1 where id=max(id);
                                  ^
//可以使用类似的子查询

postgres=# select * from t1 where id=(select max(id) from t1);
+----+------------+
| id | new_column |
+----+------------+
|  1 |            |
+----+------------+
(1 row)

三、总结

因此，WHERE子句不得包含聚合函数，同时HAVING子句必须放置于 GROUP BY分组子句后面。
大多数情况，HAVING子句总是包含聚合函数，很少使用一个HAVING不使用聚合的子句，这种情况使用相同的条件WHERE效果会更好,因为就算你依旧使用where+不使用聚合的HAVING，如果可能的话，优化器也会尝试重写，把这两个条件放到一个Filter。而且，使用 HAVING 子句可能会影响查询性能，尤其是在处理大数据集时。

墨力计划 postgresql

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

PostgreSQL分享——阎书利（共105篇）

阎书利，云和恩墨数据库技术顾问，专注于数据库领域研究。PG ACE ，中国PG分会认证讲师，ACDU成员，PGFans社区年度MVP及签约作者，ITPUB认证专家。Linux基金会 Research Localization 伙伴。拥有PGCM，MYSQL-OCP-8.0, OBCP，OGCA，Polardb(ACA)，MGCA等认证。openGauss社区oGUG-Ambassador。曾在PG中文社区直播分享《PG逻辑复制槽与故障转移》。《PostgreSQL15版本新特性解读》与《8小时玩转openGauss训练营（第三期）》活动嘉宾。曾参与《深入理解开源项目办公室》《软件材料清单（SBOM）与网络安全准备度》官方中文版翻译。