WHERE和HAVING在SQL查询中的作用差异:WHERE用于预筛选行,HAVING则在分组后过滤。
WHERE在计算组和聚合之前选择输入行(因此,它控制哪些行进入聚合计算),而HAVING在计算组和聚合(例如 COUNT、SUM、AVG)之后选择组行。
一、HAVING
HAVING子句指定组的搜索条件。HAVING子句通常与GROUP BY子句一起用于根据指定条件筛选组。
1.HAVING子句必须放置于 GROUP BY子句后面,ORDER BY子句前面。

//表基础数据如下
postgres=# select * from products;
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 1 | Product A | 10 |
| 3 | Product C | 8 |
| 2 | Product B | 5 |
+----+-----------+----------+
(3 rows)
//简单的带where条件的
postgres=# select * from products where id>1;
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 3 | Product C | 8 |
| 2 | Product B | 5 |
+----+-----------+----------+
(2 rows)
postgres=# explain select * from products where id>1;
+----------------------------------------------------------+
| QUERY PLAN |
+----------------------------------------------------------+
| Seq Scan on products (cost=0.00..1.04 rows=1 width=126) |
| Filter: (id > 1) |
+----------------------------------------------------------+
(2 rows)
//带where,不进行分组,直接使用having的,会报错
postgres=# select * from products where id>1 having id <3;
ERROR: column "products.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select * from products where id>1 having id <3;
^
//带where,进行分组,再使用having的,可以正常执行。
但是因为having后的条件未加聚合计算,所以进行了重写,把(id > 1) AND (id < 3)作为一个filter。
postgres=# select * from products where id>1 group by id having id <3;
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 2 | Product B | 5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having id <3;
+----------------------------------------------------------------------+
| QUERY PLAN |
+----------------------------------------------------------------------+
| Group (cost=1.05..1.06 rows=1 width=126) |
| Group Key: id |
| -> Sort (cost=1.05..1.06 rows=1 width=126) |
| Sort Key: id |
| -> Seq Scan on products (cost=0.00..1.04 rows=1 width=126) |
| Filter: ((id > 1) AND (id < 3)) |
+----------------------------------------------------------------------+
(6 rows)
//这个和上边的比较类似,但是在having后,多了order by,order by也只能放在having的前边。
postgres=# select * from products where id>1 group by id having id <3 order by id;
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 2 | Product B | 5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having id <3 order by id;
+----------------------------------------------------------------------+
| QUERY PLAN |
+----------------------------------------------------------------------+
| Group (cost=1.05..1.06 rows=1 width=126) |
| Group Key: id |
| -> Sort (cost=1.05..1.06 rows=1 width=126) |
| Sort Key: id |
| -> Seq Scan on products (cost=0.00..1.04 rows=1 width=126) |
| Filter: ((id > 1) AND (id < 3)) |
+----------------------------------------------------------------------+
(6 rows)
//如下部分的having条件里,因为带了聚合函数,这个聚合后的条件判断不能和前边的id>1 重写放到一起,因此,形成了两个filter。
postgres=# select * from products where id>1 group by id having max(id) <3 order by id;
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 2 | Product B | 5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having max(id) <3 order by id;
+----------------------------------------------------------------------+
| QUERY PLAN |
+----------------------------------------------------------------------+
| GroupAggregate (cost=1.05..1.07 rows=1 width=126) |
| Group Key: id |
| Filter: (max(id) < 3) |
| -> Sort (cost=1.05..1.05 rows=1 width=126) |
| Sort Key: id |
| -> Seq Scan on products (cost=0.00..1.04 rows=1 width=126) |
| Filter: (id > 1) |
+----------------------------------------------------------------------+
(7 rows)
2.Having子句不能使用列的别名
postgres=# select * from products;
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 1 | Product A | 10 |
| 3 | Product C | 8 |
| 2 | Product B | 5 |
+----+-----------+----------+
(3 rows)
postgres=# select count(*) con,quantity from products where id>1 group by quantity having con <3;
ERROR: column "con" does not exist
LINE 1: ...ty from products where id>1 group by quantity having con <3;
^
Having子句不能使用列的别名,原因在于PG执行SQL语句时的解析顺序是如下模式的,解析having子句时,select语句尚未解析,所以con还未被定义出来。
FROM->WHERE->GROUP BY->HAVING->SELECT->DISTINCT->ORDER BY->LIMIT
这种情况不能使用别名,只能再调用聚合函数。
select count(*) con,quantity from products where id>1 group by quantity having con <3;
改成
select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
改成使用聚合函数后,成功执行,同样的,having的条件加了聚合函数后,不能和where里的条件重写到一个filter。
postgres=# select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
+-----+----------+
| con | quantity |
+-----+----------+
| 1 | 8 |
| 1 | 5 |
+-----+----------+
(2 rows)
postgres=# explain select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
+--------------------------------------------------------------+
| QUERY PLAN |
+--------------------------------------------------------------+
| HashAggregate (cost=1.04..1.05 rows=1 width=12) |
| Group Key: quantity |
| Filter: (count(*) < 3) |
| -> Seq Scan on products (cost=0.00..1.04 rows=1 width=4) |
| Filter: (id > 1) |
+--------------------------------------------------------------+
(5 rows)
3.Having子句可以和子查询结合使用,以实现更复杂的筛选条件
^
postgres=# select * from products where id>1 group by id having max(id) >(select avg(id) from products);
+----+-----------+----------+
| id | name | quantity |
+----+-----------+----------+
| 3 | Product C | 8 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having max(id) >(select avg(id) from products);
+---------------------------------------------------------------------------------+
| QUERY PLAN |
+---------------------------------------------------------------------------------+
| HashAggregate (cost=2.09..2.11 rows=1 width=126) |
| Group Key: products.id |
| Filter: ((max(products.id))::numeric > $0) |
| InitPlan 1 (returns $0) |
| -> Aggregate (cost=1.04..1.05 rows=1 width=32) |
| -> Seq Scan on products products_1 (cost=0.00..1.03 rows=3 width=4) |
| -> Seq Scan on products (cost=0.00..1.04 rows=1 width=126) |
| Filter: (id > 1) |
+---------------------------------------------------------------------------------+
(8 rows)
二、WHERE
1.where子句不能使用聚合,类似的功能可以通过子查询实现。
子查询是一个独立的计算,它计算自己的聚合与外部查询中发生的事情分开。
//where子句不能使用聚合
postgres=# select * from t1 where id=max(id);
ERROR: aggregate functions are not allowed in WHERE
LINE 1: select * from t1 where id=max(id);
^
//可以使用类似的子查询
postgres=# select * from t1 where id=(select max(id) from t1);
+----+------------+
| id | new_column |
+----+------------+
| 1 | |
+----+------------+
(1 row)
三、总结
因此,WHERE子句不得包含聚合函数,同时HAVING子句必须放置于 GROUP BY分组子句后面。
大多数情况,HAVING子句总是包含聚合函数,很少使用一个HAVING不使用聚合的子句,这种情况使用相同的条件WHERE效果会更好,因为就算你依旧使用where+不使用聚合的HAVING,如果可能的话,优化器也会尝试重写,把这两个条件放到一个Filter。而且,使用 HAVING 子句可能会影响查询性能,尤其是在处理大数据集时。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




