暂无图片
暂无图片
2
暂无图片
暂无图片
暂无图片

PostgreSQL数据库WHERE和HAVING的区别

WHERE和HAVING在SQL查询中的作用差异:WHERE用于预筛选行,HAVING则在分组后过滤。
WHERE在计算组和聚合之前选择输入行(因此,它控制哪些行进入聚合计算),而HAVING在计算组和聚合(例如 COUNT、SUM、AVG)之后选择组行。

一、HAVING

HAVING子句指定组的搜索条件。HAVING子句通常与GROUP BY子句一起用于根据指定条件筛选组。

1.HAVING子句必须放置于 GROUP BY子句后面,ORDER BY子句前面。

image.png

//表基础数据如下
postgres=# select * from products;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  1 | Product A |       10 |
|  3 | Product C |        8 |
|  2 | Product B |        5 |
+----+-----------+----------+
(3 rows)

//简单的带where条件的
postgres=# select * from products where id>1;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  3 | Product C |        8 |
|  2 | Product B |        5 |
+----+-----------+----------+
(2 rows)
postgres=# explain select * from products where id>1;
+----------------------------------------------------------+
|                        QUERY PLAN                        |
+----------------------------------------------------------+
| Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|   Filter: (id > 1)                                       |
+----------------------------------------------------------+
(2 rows)

//带where,不进行分组,直接使用having的,会报错
postgres=# select * from products where id>1 having id <3;
ERROR:  column "products.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select * from products where id>1 having id <3;
               ^
//带where,进行分组,再使用having的,可以正常执行。
但是因为having后的条件未加聚合计算,所以进行了重写,把(id > 1) AND (id < 3)作为一个filter。
postgres=# select * from products where id>1 group by id having id <3;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  2 | Product B |        5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having id <3;
+----------------------------------------------------------------------+
|                              QUERY PLAN                              |
+----------------------------------------------------------------------+
| Group  (cost=1.05..1.06 rows=1 width=126)                            |
|   Group Key: id                                                      |
|   ->  Sort  (cost=1.05..1.06 rows=1 width=126)                       |
|         Sort Key: id                                                 |
|         ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|               Filter: ((id > 1) AND (id < 3))                        |
+----------------------------------------------------------------------+
(6 rows)

//这个和上边的比较类似,但是在having后,多了order by,order by也只能放在having的前边。
postgres=# select * from products where id>1 group by id having id <3 order by id;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  2 | Product B |        5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having id <3 order by id;
+----------------------------------------------------------------------+
|                              QUERY PLAN                              |
+----------------------------------------------------------------------+
| Group  (cost=1.05..1.06 rows=1 width=126)                            |
|   Group Key: id                                                      |
|   ->  Sort  (cost=1.05..1.06 rows=1 width=126)                       |
|         Sort Key: id                                                 |
|         ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|               Filter: ((id > 1) AND (id < 3))                        |
+----------------------------------------------------------------------+
(6 rows)

//如下部分的having条件里,因为带了聚合函数,这个聚合后的条件判断不能和前边的id>1 重写放到一起,因此,形成了两个filter。
postgres=# select * from products where id>1 group by id having max(id) <3 order by id;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  2 | Product B |        5 |
+----+-----------+----------+
(1 row)
postgres=# explain select * from products where id>1 group by id having max(id) <3 order by id;
+----------------------------------------------------------------------+
|                              QUERY PLAN                              |
+----------------------------------------------------------------------+
| GroupAggregate  (cost=1.05..1.07 rows=1 width=126)                   |
|   Group Key: id                                                      |
|   Filter: (max(id) < 3)                                              |
|   ->  Sort  (cost=1.05..1.05 rows=1 width=126)                       |
|         Sort Key: id                                                 |
|         ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126) |
|               Filter: (id > 1)                                       |
+----------------------------------------------------------------------+
(7 rows)

2.Having子句不能使用列的别名

postgres=# select * from products;
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  1 | Product A |       10 |
|  3 | Product C |        8 |
|  2 | Product B |        5 |
+----+-----------+----------+
(3 rows)

postgres=# select count(*) con,quantity from products where id>1 group by quantity having con <3;
ERROR:  column "con" does not exist
LINE 1: ...ty from products where id>1 group by quantity having con <3;
                                                                ^

Having子句不能使用列的别名,原因在于PG执行SQL语句时的解析顺序是如下模式的,解析having子句时,select语句尚未解析,所以con还未被定义出来。

FROM->WHERE->GROUP BY->HAVING->SELECT->DISTINCT->ORDER BY->LIMIT 

这种情况不能使用别名,只能再调用聚合函数。

select count(*) con,quantity from products where id>1 group by quantity having con <3;
改成
select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;

改成使用聚合函数后,成功执行,同样的,having的条件加了聚合函数后,不能和where里的条件重写到一个filter。

postgres=# select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
+-----+----------+
| con | quantity |
+-----+----------+
|   1 |        8 |
|   1 |        5 |
+-----+----------+
(2 rows)
postgres=# explain select count(*) con,quantity from products where id>1 group by quantity having count(*) <3;
+--------------------------------------------------------------+
|                          QUERY PLAN                          |
+--------------------------------------------------------------+
| HashAggregate  (cost=1.04..1.05 rows=1 width=12)             |
|   Group Key: quantity                                        |
|   Filter: (count(*) < 3)                                     |
|   ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=4) |
|         Filter: (id > 1)                                     |
+--------------------------------------------------------------+
(5 rows)

3.Having子句可以和子查询结合使用,以实现更复杂的筛选条件

                                                               ^
postgres=# select * from products where id>1 group by id having max(id) >(select avg(id) from products);
+----+-----------+----------+
| id |   name    | quantity |
+----+-----------+----------+
|  3 | Product C |        8 |
+----+-----------+----------+
(1 row)

postgres=# explain select * from products where id>1 group by id having max(id) >(select avg(id) from products);
+---------------------------------------------------------------------------------+
|                                   QUERY PLAN                                    |
+---------------------------------------------------------------------------------+
| HashAggregate  (cost=2.09..2.11 rows=1 width=126)                               |
|   Group Key: products.id                                                        |
|   Filter: ((max(products.id))::numeric > $0)                                    |
|   InitPlan 1 (returns $0)                                                       |
|     ->  Aggregate  (cost=1.04..1.05 rows=1 width=32)                            |
|           ->  Seq Scan on products products_1  (cost=0.00..1.03 rows=3 width=4) |
|   ->  Seq Scan on products  (cost=0.00..1.04 rows=1 width=126)                  |
|         Filter: (id > 1)                                                        |
+---------------------------------------------------------------------------------+
(8 rows)

二、WHERE

1.where子句不能使用聚合,类似的功能可以通过子查询实现。

子查询是一个独立的计算,它计算自己的聚合与外部查询中发生的事情分开。

//where子句不能使用聚合

postgres=# select * from t1 where id=max(id);
ERROR:  aggregate functions are not allowed in WHERE
LINE 1: select * from t1 where id=max(id);
                                  ^
//可以使用类似的子查询

postgres=# select * from t1 where id=(select max(id) from t1);
+----+------------+
| id | new_column |
+----+------------+
|  1 |            |
+----+------------+
(1 row)

三、总结

因此,WHERE子句不得包含聚合函数,同时HAVING子句必须放置于 GROUP BY分组子句后面。
大多数情况,HAVING子句总是包含聚合函数,很少使用一个HAVING不使用聚合的子句,这种情况使用相同的条件WHERE效果会更好,因为就算你依旧使用where+不使用聚合的HAVING,如果可能的话,优化器也会尝试重写,把这两个条件放到一个Filter。而且,使用 HAVING 子句可能会影响查询性能,尤其是在处理大数据集时。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论