暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Union, Minus, and Intersect: Databases for Developers

原创 fizz 2022-12-06
243
  • Prerequisite SQL

    create table my_brick_collection (
      colour varchar2(10),
      shape  varchar2(10),
      weight integer
    );
    
    create table your_brick_collection (
      height integer,
      width  integer,
      depth  integer,
      colour varchar2(10),
      shape  varchar2(10)
    );
    
    insert into my_brick_collection values ( 'red', 'cube', 10 );
    insert into my_brick_collection values ( 'blue', 'cuboid', 8 );
    insert into my_brick_collection values ( 'green', 'pyramid', 20 );
    insert into my_brick_collection values ( 'green', 'pyramid', 20 );
    insert into my_brick_collection values ( null, 'cuboid', 20 );
    
    insert into your_brick_collection values ( 2, 2, 2, 'red', 'cube' );
    insert into your_brick_collection values ( 2, 2, 2, 'blue', 'cube' );
    insert into your_brick_collection values ( 2, 2, 8, null, 'cuboid' );
    
    commit;
  • Module1

    Introduction

    The set operators union, intersect, and minus allow you to combine many tables into one. This tutorial will use these two tables for the queries:

    select * from my_brick_collection;
    
    select * from your_brick_collection;


  • Module2

    Union

    The union operator combines two or more tables into a single result set. To use it, the tables must have the same number of columns with matching data types in each position.

    The brick collection tables have different columns. So combining these with select * leads to an error:

    select * from my_brick_collection
    union 
    select * from your_brick_collection;


    To resolve this, select the common columns. Here that's the colour and shape. So this query returns a list of all the ( colour, shape ) values in the tables:

    select colour, shape from my_brick_collection
    union 
    select colour, shape from your_brick_collection;


  • Module3

    Distinct

    There are two red cube rows in the tables, one in each table. But union only displays one!

    This is because union applies the distinct operator to your results. This discards duplicate rows. So your results only have one row for each set of column values.

    You can use distinct by placing it after select and before the column list. This squashes out duplicates. So you only get one row for each set of values in your columns.

    For example, there are two green pyramid rows in my collection. If you do a distinct *, you only get one copy of this brick:

    select distinct * from my_brick_collection;


    You can also use distinct on a subset of a table's columns. For example, to get one row for each shape in your collection, select "distinct shape":

    select distinct shape from your_brick_collection;


  • Module4

    Union All

    Often when combining tables you want to see all the rows. Including duplicates. Not the list of distinct values.

    You can do this by sticking all after union:

    select colour, shape from my_brick_collection
    union all 
    select colour, shape from your_brick_collection;


    A standard union is the same as the following:

    select distinct * from (
      select colour, shape from my_brick_collection
      union all 
      select colour, shape from your_brick_collection
    );


    The distinct operator adds an extra sorting step to your SQL. So in most cases you'll want to use union all. Save plain union for when you know you want to remove duplicate rows.

  • Module5

    Try It!

    Complete this query to return a list of all the colours in the two tables. Each colour must only appear once:

    select colour from my_brick_collection
    /*TODO*/
    select colour from your_brick_collection
    order by colour;
    


    The output of this query should be:

    COLOUR   
    blue     
    green    
    red      
    <null>

    Complete the following query to return a list of all the shapes in both tables. There must show one row for each row in the source tables:

    select shape from my_brick_collection
    /*TODO*/
    select shape from your_brick_collection
    order  by shape;
    


    This query should return the following rows:

    SHAPE       
    cube      
    cube      
    cube      
    cuboid    
    cuboid    
    cuboid    
    pyramid   
    pyramid 
  • Module6

    Set Difference

    The set difference operation returns all the rows in one table not in another. You can do this with not exists. For example:

    select colour, shape from your_brick_collection ybc
    where  not exists (
      select null from my_brick_collection mbc
      where  ybc.colour = mbc.colour
      and    ybc.shape = mbc.shape
    );
    


    If you're comparing many columns this is a lot of typing. And tricky to understand later.

    But there's a bigger issue. It handles nulls incorrectly! Because null = null => unknown, it returns rows where shape or colour is null. So the null-coloured cuboid is in the output. Even though this row exists in both tables!

    To fix this, test if the columns are equal or both are null:

    select colour, shape from your_brick_collection ybc
    where  not exists (
      select null from my_brick_collection mbc
      where  ( ybc.colour = mbc.colour or 
               ( ybc.colour is null and mbc.colour is null ) 
             )
      and    ( ybc.shape = mbc.shape or
               ( ybc.shape is null and mbc.shape is null ) 
             )
    );


  • Module7

    Minus

    Oracle Database includes an operator that implements set difference: minus

    This is easier to use than not exists. All you need to do is select the relevant columns from each table with minus between them. And set operators are one of the rare cases where the database considers null values to be equal.

    So the following query returns the same rows as not exists with null checking:

    select colour, shape from your_brick_collection
    minus
    select colour, shape from my_brick_collection;


    But there is another subtle difference. Like union, minus applies a distinct to the output. There are two green pyramids in my collection not in yours. But minus only returns one of these:

    select colour, shape from my_brick_collection
    minus
    select colour, shape from your_brick_collection


    To see both, you need to use not exists:

    select colour, shape from my_brick_collection mbc
    where  not exists (
      select null from your_brick_collection ybc
      where  ( ybc.colour = mbc.colour or ( ybc.colour is null and mbc.colour is null ) )
      and    ybc.shape = mbc.shape
    );

    Oracle Database 21c added the all option for minus. It also included except as a synonym for minus.

  • Module8

    Finding Common Values

    You can find values that are in both table with exists. As with not exists, do to this correctly you need to test for null in the subquery:

    select colour, shape from your_brick_collection ybc
    where  exists (
      select null from my_brick_collection mbc
      where  ( ybc.colour = mbc.colour or ( ybc.colour is null and mbc.colour is null ) )
      and    ybc.shape = mbc.shape
    );
    


  • Module9

    Intersect

    There is also an operator to find the common values: intersect

    You use this in the same way as union and minus: place it between a select from each table:

    select colour, shape from your_brick_collection
    intersect
    select colour, shape from my_brick_collection;


    As with minus, the database considers null values to be the same and applies a distinct operator to the results. Oracle Database 21c added the all clause to intersect as well as minus.

  • Module10

    Try It!

    Complete the following query to return a list of all the shapes in my collection not in yours:

    select shape from my_brick_collection
    /*TODO*/
    select shape from your_brick_collection;


    This should return the following row:

    SHAPE     
    pyramid  

    Complete the following query to return a list of all the colours that are in both tables:

    select colour from my_brick_collection
    /*TODO*/
    select colour from your_brick_collection
    order  by colour;


    This should return the following rows:

    COLOUR   
    blue     
    red      
    <null>   
  • Module11

    Finding the Difference Between Two Tables (Symmetric Difference)

    You can combine set operators to implement a classic use case:

    Comparing two tables, returning a list of all the values that only exist in one table.

    This is also known as the symmetric difference. There isn't a native operator that does this. But you can do it by:

    • Finding the rows in table one not in table two with minus
    • Finding the rows in table two not in table one with minus
    • Combining the output of these two operations with union (all)

    My collection has a blue cuboid and two green pyramids not in yours. And you have a blue cube I don't. So these should appear in the output. But when you chain the operators as described above:

    select colour, shape from your_brick_collection
    minus
    select colour, shape from my_brick_collection
    union all
    select colour, shape from my_brick_collection
    minus
    select colour, shape from your_brick_collection;


    You only get blue cuboid and green pyramid!

    This is because the set operators all have the same priority in Oracle Database. To fix this and do the minuses before union, you need parentheses. Place them around the operations that should happen first:

    select * from ( 
      select colour, shape from your_brick_collection
      minus
      select colour, shape from my_brick_collection
    ) union all (
      select colour, shape from my_brick_collection
      minus
      select colour, shape from your_brick_collection
    );


    Or, you could use the following method:

    • Combine all the rows from the two tables with union (all)
    • Find the values that exist in both tables with intersect
    • Minus the second query from the first

    As before, you need brackets to ensure correct order of processing. Which gives:

    select * from ( 
      select colour, shape from your_brick_collection
      union all
      select colour, shape from my_brick_collection
    ) minus (
      select colour, shape from my_brick_collection
      intersect
      select colour, shape from your_brick_collection
    );


  • Module12

    Symmetric Difference with Group By

    There are a couple of drawbacks to both the previous methods. Firstly you have to read all the rows from both tables twice. Secondly minus and intersect return distinct values. So you only see the values only in one table. Not all the rows.

    So if you get an extra red cube, you have one more than me. But the queries don't show this:

    insert into your_brick_collection values ( 4, 4, 4, 'red', 'cube' );
    
    select * from ( 
      select colour, shape from your_brick_collection
      minus
      select colour, shape from my_brick_collection
    ) union all (
      select colour, shape from my_brick_collection
      minus
      select colour, shape from your_brick_collection
    );


    Luckily there's an alternative which solves both of these issues.

    Check whether each table has the same number of rows for each set of values.

    You can do this by union alling the two tables together with a couple of extra columns. One to count the rows from the first table, the other for the second. By returning the values 1 or 0 you can sum these up to get the count.

    To see the different rows, return those where these sums are not equal in the having clause.

    This gives the following query:

    select colour, shape, sum ( your_bricks ), sum ( my_bricks ) 
    from (
      select colour, shape, 1 your_bricks, 0 my_bricks
      from   your_brick_collection
      union all
      select colour, shape, 0 your_bricks, 1 my_bricks 
      from   my_brick_collection
    )
    group  by colour, shape
    having sum ( your_bricks ) <> sum ( my_bricks );


    Using this method you only read the rows in each table once. So this can be notably faster than the classic way. And you can see how many rows have each set of values in each table.

    This leads to another benefit of this method: it enables you to see which table has more rows. You can do this by comparing the sum of bricks for each colour and shape. Then return the name of the table that has more. You can also get the number of extra rows. Do this by finding the absolute value of the difference between these two sums.

    For example:

    select colour, shape, 
           case
             when sum ( your_bricks ) < sum ( my_bricks ) then 'ME'
             when sum ( your_bricks ) > sum ( my_bricks ) then 'YOU'
             else 'EQUAL'
           end who_has_extra,
           abs ( sum ( your_bricks ) - sum ( my_bricks ) ) how_many
    from (
      select colour, shape, 1 your_bricks, 0 my_bricks
      from   your_brick_collection
      union all
      select colour, shape, 0 your_bricks, 1 my_bricks 
      from   my_brick_collection
    )
    group  by colour, shape;


「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论