浏览公众号上的文章,看到有篇文章写的挺好,结合自己遇到的问题整合下。《SQL面试题:WHERE和HAVING、ON有什么区别?》公众号文章地址:https://mp.weixin.qq.com/s/M_POMqBXao4zbV6z051kiQ
1、WHERE与HAVING的根本区别在于:
WHERE子句在GROUP BY分组和聚合函数之前对数据行进行过滤;
HAVING子句对GROUP BY分组和聚合函数之后的数据行进行过滤。
因此,WHERE子句中不能使用聚合函数,正确的方法是使用HAVING对聚合之后的结果进行过滤,另一方面,HAVING子句中不能使用除了分组字段和聚合函数之外的其他字段。
从性能的角度来说,HAVING子句中如果使用了分组字段作为过滤条件,应该替换成WHERE子句;因为WHERE可以在执行分组操作和计算聚合函数之前过滤掉不需要的数据,性能会更好。
2、WHERE 与 ON
当查询涉及多个表的关联时,我们既可以使用`WHERE`子句也可以使用`ON`子句指定连接条件和过滤条件。这两者之间的主要区别在于:
对于内连接(inner join)查询,WHERE和ON中的过滤条件等效;
对于外连接(outer join)查询,ON中的过滤条件在连接操作之前执行(我觉得是连接时执行,就是b表展示字段时,加上该过滤条件),WHERE中的过滤条件(逻辑上)在连接操作之后执行。
对于外连接时,可以用两个例子展示相关:
--结果返回1行--先执行 select * from emp a where a.id=13 选出 主表需要关联的行,再跟B表关联出结果--等价于 select * from(select * from emp a where a.id=13) a left join dept b on a.deptno=b.deptno
select * from emp a left join dept b on a.deptno=b.deptno where a.id=13

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 68 | 3 | 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 1 | 68 | 3 | 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID | EMP | 1 | 42 | 2 | 00:00:01 |
| * 3 | INDEX RANGE SCAN | EMP_ID | 1 | | 1 | 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 26 | 1 | 00:00:01 |
| * 5 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 | 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 3 - access("A"."ID"=13)
* 5 - access("A"."DEPTNO"="B"."DEPTNO"(+))
--结果返回15行,b表字段没有值,其过程是当 a.deptno=b.deptno 且 a.id=13 ,所以结果是15行 --左连接显示emp表的所有行,且表关联的结果需要显示b表的行时,需要满足 a.deptno=b.deptno 且 a.id=13
select * from emp a left join dept b on a.deptno=b.deptno and a.id=13

----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 1020 | 6 | 00:00:01 |
| * 1 | HASH JOIN OUTER | | 15 | 1020 | 6 | 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP | 15 | 630 | 3 | 00:00:01 |
| 3 | TABLE ACCESS FULL | DEPT | 4 | 104 | 3 | 00:00:01 |
----------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("A"."DEPTNO"="B"."DEPTNO"(+) AND "A"."ID"=CASE WHEN ("B"."DEPTNO"(+) IS NOT NULL) THEN 13 ELSE 13 END )
通过上面展示的两个例子可以看到其执行计划跟结果都是有区别的,所以在改写的时候需要注意表关联的条件中and的条件,建议在表连接时,and这样的过滤结合业务场景能放在where后就放where后,直观且方便理解。
3、where与On 带and的拓展:
3.1、b表的条件作为过滤条件作为临时表再进行表关联的方式比较,经测试以下两种方式的结果一致,且执行计划一致。sql如下:
select * from emp a left join dept b on a.deptno=b.deptno and b.id=3;select * from emp a left join (select * from dept b where b.id=3) b on a.deptno=b.deptno;----而把b.id=3放在where后,就不一样了,相当于两表关联后再得出的结果集再进行b.id=3过滤了,结果集会少----select * from emp a left join dept b on a.deptno=b.deptno where b.id=3;
3.2、a表的条件作为过滤条件作为临时表再进行表关联的方式比较
select * from emp a left join dept b on a.deptno=b.deptno and a.id=13; --展示15行结果是展示a表的所有行(或者是a表所有行加上满足a.deptno=b.deptno and a.id=13的其他行),本例是15行
select * from (select * from emp a where a.id=13) a left join dept b on a.deptno=b.deptno ; --展示1行结果是展示(select * from emp a where a.id=13)结果集与b表关联满足a.deptno=b.deptno的行,返回结果1行
以上在主表变动的情况下,结果集会不一样,且执行计划不一样。所以在sql改写时一定需要注意结果集。
4、where与On 带or的拓展:
看下下面的sql,该sql理解为a.deptno=b.deptno or a.id=3,当两表满足 a.deptno=b.deptno @1 或者 a.id=3时a表与b表关联 @2 ,并且@1存在的行@2结果集存在的话,需要去掉结果集@2重复的该行
select a.empno,a.deptno,b.deptno,a.id,b.id from emp a left join dept b on a.deptno=b.deptno or a.id=3; ---结果展示18行-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 540 | 48 | 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 15 | 540 | 48 | 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP | 15 | 150 | 3 | 00:00:01 |
| 3 | VIEW | | 1 | 26 | 3 | 00:00:01 |
| * 4 | TABLE ACCESS FULL | DEPT | 1 | 16 | 3 | 00:00:01 |
-----------------------------------------------------------------------
尝试改写常规or改写利用union all lnnvl()函数,此处有报错,告警日志竟然有ora 07445报错,我丢,执行个sql都能这样报错,断开与数据库连接,暂时不管这个报错;提下为啥会这样改写,因为根据select * from a where m =:1 or n =:2;这类sql该写的方式是
select * from a where m =:1
union all
select * from a where lnnvl(m =:1) and n =:2;ORA-07445: 出现异常错误: 核心转储 [kkqtutlGetFirstCol()+11] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFFFFFFFFFF] [PC:0x1443CF6E5] [UNABLE_TO_READ] []
select a.empno,a.deptno,b.deptno,a.id,b.id from emp a left join dept b on a.deptno=b.deptno
union all
select a.empno,a.deptno,b.deptno,a.id,b.id from emp a left join dept b on lnnvl(a.deptno=b.deptno) and a.id=3 ;其他尝试写法,核对该中写法结果集一致。此处不需要考虑deptno空值情况
select a.empno,a.deptno,b.deptno,a.id,b.id from emp a left join dept b on a.deptno=b.deptno
union all
select a.empno,a.deptno,b.deptno,a.id,b.id from emp a left join dept b on a.deptno!=b.deptno where a.id=3 ;5、left join on 后面带子查询(此例参考老虎刘老师公益课)
select count(*)
from t1
left join t2
on t1.object_id = t2.object_id
and t2.owner in (select username from t_users)------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 484069 | 01:36:49 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | NESTED LOOPS OUTER | | 1107012 | 14391156 | 484069 | 01:36:49 |
| 3 | TABLE ACCESS FULL | T1 | 11182 | 145366 | 40 | 00:00:01 |
| 4 | VIEW | | 99 | | 43 | 00:00:01 |
| * 5 | HASH JOIN SEMI | | 99 | 4653 | 43 | 00:00:01 |
| * 6 | TABLE ACCESS FULL | T2 | 99 | 2970 | 40 | 00:00:01 |
| 7 | TABLE ACCESS FULL | T_USERS | 34 | 578 | 3 | 00:00:01 |
------------------------------------------------------------------------------------
在12C之前建议改写
select count(*)
from t1
left join (select *
from t2
where t2.owner in (select username from t_users)) v_t2
on t1.object_id = v_t2.object_id;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 84 | 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
| * 2 | HASH JOIN RIGHT OUTER | | 11182 | 290732 | 84 | 00:00:02 |
| 3 | VIEW | | 9929 | 129077 | 43 | 00:00:01 |
| * 4 | HASH JOIN RIGHT SEMI | | 9929 | 466663 | 43 | 00:00:01 |
| 5 | TABLE ACCESS FULL | T_USERS | 34 | 578 | 3 | 00:00:01 |
| 6 | TABLE ACCESS FULL | T2 | 9929 | 297870 | 40 | 00:00:01 |
| 7 | TABLE ACCESS FULL | T1 | 11182 | 145366 | 40 | 00:00:01 |
--------------------------------------------------------------------------------
6、总结
综合从上述例子来看,对于where on and or 类还是需要认真验证结果是否符合业务逻辑,在sql改写时需要理解查询逻辑,验证好查询结果,
避免特殊情况下的不一样的结果集。在工作中还遇到过其他情况下,比如 or 带子查询的产生filter的情况,针对此类下次继续总结




