暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Converting Rows to Columns and Back Again: Databases for Developers

原创 fizz 2022-12-08
280
  • Prerequisite SQL

    create table match_results (
      match_date       date,
      location         varchar2(20),
      home_team_name   varchar2(20),
      away_team_name   varchar2(20),
      home_team_points integer,
      away_team_points integer
    );
    
    insert into match_results values ( date'2018-01-01', 'Snowley', 'Underrated United', 'Terrible Town', 2, 0 );
    insert into match_results values ( date'2018-01-01', 'Coldgate', 'Average Athletic', 'Champions City', 1, 4 );
    insert into match_results values ( date'2018-02-01', 'Dorwall', 'Terrible Town', 'Average Athletic', 0, 1 );
    insert into match_results values ( date'2018-03-01', 'Coldgate', 'Average Athletic', 'Underrated United', 3, 3 );
    insert into match_results values ( date'2018-03-02', 'Newdell', 'Champions City', 'Terrible Town', 8, 0 );
    
    commit;
  • Module1

    Introduction

    This tutorial will show you how to pivot rows to columns and unpivot them back again. It uses this table of match results from a simple league:

    select * from match_results


  • Module2

    Manual Pivot

    You want to see how many games have been played in each location. You can do this with group by:

    select location, count (*)
    from   match_results
    group  by location;


    But you want to show the locations as columns instead of rows. To do this you need to pivot the results.

    You can do this by defining a column for each location. Each must only include rows where the location matches the column name.

    Count only considers non-null values. So you can do this by checking the location inside the count. If this equals the column name, return a constant. Otherwise null.

    You can do this with a case expression like:

    count ( case when location = '<location>' then 1 end ) <location>

    The complete SQL query is:

    select count ( case when location = 'Snowley' then 1 end ) snowley,
           count ( case when location = 'Coldgate' then 1 end ) coldgate, 
           count ( case when location = 'Dorwall' then 1 end ) dorwall,
           count ( case when location = 'Newdell' then 1 end ) newdell
    from   match_results;


  • Module3

    Pivot Clause

    A manual pivot is clunky. It's easy to make mistakes and hard to read when you have many columns.

    The pivot clause, introduced in Oracle Database 11g, allows you to do the same more easily.

    To use it, specify three things:

    1. The values to go in the location columns. You want a count, so this is count(*)
    2. The column containing values you want to become new columns. Here that's location
    3. The list of values to become columns. This is the names of the locations.

    Altogether this gives:

    with rws as (
      select location from match_results
    )
      select * from rws
      pivot (
        count(*) for location in (
          'Snowley', 'Coldgate', 'Dorwall', 'Newdell' 
        )
      );


    Now if you need more locations, add them to the IN list!

  • Module4

    Try it!

    Complete this query to show the locations as columns, with date of the last match played in each:

    Hint: to find the last date in a set, use max


    with rws as (
      select location, match_date from match_results
    )
      select * from rws
      pivot (
        
        for location in (
          'Snowley', 'Coldgate', 'Dorwall', 'Newdell' 
        )
      );


    The output of this query should be:

    'Snowley'     'Coldgate'    'Dorwall'     'Newdell'     
    01-JAN-2018   01-MAR-2018   01-FEB-2018   02-MAR-2018

    Complete this query to show the team names as columns, with the number of home games each has played:

    with rws as (
      select home_team_name from match_results
    )
      select * from rws
      pivot (
        count (*)
        for 
        in (
    
        )
      );


    The team names are:

    • Underrated United
    • Average Athletic
    • Terrible Town
    • Champions City

    And the output of this query should be:

    'Underrated United'   'Average Athletic'   'Terrible Town'   'Champions City'   
                        1                    2                 1                  1 
    
  • Module5

    Implicit Group By

    Any columns from the input table not in the pivot clause form an implicit group by. This can lead to the output having more rows than you expected.

    For example, if you place pivot after the from clause, all the table's columns are inputs to it. So you get one row for every match


    select * from match_results
    pivot (
      count(*) for location in (
        'Snowley', 'Coldgate', 'Dorwall', 'Newdell' 
      )
    );


    To avoid this, use an inline view or CTE. This selects only the columns you'll use in the pivot or want in the final output

  • Module6

    Expressions

    Sometimes you may want to manipulate the values you're pivoting by. For example, you may want to see the number of games played each month, showing the months as columns.

    So you need to convert the match dates to months. You can extract a month's abbreviation from a date using to_char with the format mask 'MON'.

    But if you do this in your pivot clause, you'll get an error:

    with rws as (
      select match_date from match_results
    )
      select * from rws
      pivot (
        count (*) for to_char ( match_date, 'MON' ) in (
          'JAN', 'FEB', 'MAR'
        )
      );


    To fix this, extract the month in an inline view or CTE. Alias the expression and use this alias in the pivot. For example:

    with rws as (
      select to_char ( match_date, 'MON' ) match_month 
      from   match_results
    )
      select * from rws
      pivot (
        count (*) for match_month in (
          'JAN', 'FEB', 'MAR'
        )
      );


    Often you want to change values in the columns you're pivoting on. This and pivot's implicit grouping means it's a good idea to pivot the output of a CTE. This makes it easy to select and manipulate the columns you want.

  • Module7

    Filtering Pivoted Rows

    You can filter the output of pivot using a where clause. This goes after the pivot.

    For example, say you want to show a table of the number of matches played each month in each location. You can do this with:

    with rws as (
      select location, to_char ( match_date, 'MON' ) match_month 
      from   match_results
    )
      select * from rws
      pivot (
        count (*) for match_month in (
          'JAN', 'FEB', 'MAR'
        )
      )


    If you want to restrict this to those locations which had at least one match in January, add the where clause "'JAN'" > 0 at the end:

    with rws as (
      select location, to_char ( match_date, 'MON' ) match_month 
      from   match_results
    )
      select * from rws
      pivot (
        count (*) for match_month in (
          'JAN', 'FEB', 'MAR'
        )
      )
      where  "'JAN'" > 0


  • Module8

    New Column Names

    By default, the names of the new columns are the values from the IN list, wrapped in quotes. This can make your SQL fiddly. To reference the new columns, you must use double AND single quotes.

    To simplify this, alias each value in the list:

    with rws as (
      select location, to_char ( match_date, 'MON' ) match_month 
      from   match_results
    )
      select * from rws
      pivot (
        count (*) for match_month in (
          'JAN' jan, 'FEB' feb, 'MAR' mar
        )
      );


    This makes it easier to filter the values from the previous list:

    with rws as (
      select location, to_char ( match_date, 'MON' ) match_month 
      from   match_results
    )
      select * from rws
      pivot (
        count (*) for match_month in (
          'JAN' jan, 'FEB' feb, 'MAR' mar
        )
      )
      where jan > 0;


  • Module9

    Try It!

    Complete the following query to show:

    • For each location, the number of games played on each day of the week
    • The three letter abbreviation of each day as columns
    • The column headings without quotes
    • Those locations that had one or more games played on Monday
    with rws as (
      select location,
             to_char ( match_date, 'DY' ) match_day
      from   match_results
    )
      select * from rws
      pivot (
        count (*) for match_day in (
          'MON' , 'TUE' , 'WED' , 'THU' ,
          'FRI' , 'SAT' , 'SUN' 
        )
      )
      where  mon
      order  by location 


    to_char ( match_date, 'DY' ) returns the three-letter abbreviation of the day for the date.

    The output of this query should be:

    LOCATION   MON   TUE   WED   THU   FRI   SAT   SUN   
    Coldgate       1     0     0     1     0     0     0 
    Snowley        1     0     0     0     0     0     0
  • Module10

    Pivoting Many Values

    You can pivot many values. For example, say for each month you want columns showing:

    • The number of matches played
    • The total points scored by the home team
    • The total points scored by the away team

    You can do this by adding:

    • count(*)
    • sum ( home_team_points )
    • sum ( away_team_points )

    To the first part of the pivot. Note you must alias all except one of these. The generated column names are the aliases appended to the IN list headings:

    with rws as (
      select location, to_char ( match_date, 'MON' ) match_month ,
             home_team_points, away_team_points
      from   match_results
    )
      select * from rws
      pivot (
        count (*) matches, 
        sum ( home_team_points ) home_points,
        sum ( away_team_points ) away_points
        for match_month in (
          'JAN' jan, 'FEB' feb, 'MAR' mar
        )
      );


    Each new aggregate will have a new column for every value in the IN list. So the total number of new columns is:

    Number of aggregate columns * Number of IN list values
  • Module11

    Try It!

    Complete the following query to show for each location:

    • A column counting the number of games played
    • The total number of points scored at each location ( home_team_points + away_team_points )
    • The count columns should have the matches suffix. The total points scored the suffix points
    with rws as (
      select location, home_team_points, away_team_points
      from   match_results
    )
      select * from rws
      pivot (
    
        for location in (
          'Snowley' snowley, 'Coldgate' coldgate, 
          'Dorwall' dorwall, 'Newdell' newdell
        )
      );


    The output of this query should be:

    SNOWLEY_MATCHES   SNOWLEY_POINTS   COLDGATE_MATCHES   COLDGATE_POINTS   DORWALL_MATCHES   DORWALL_POINTS   NEWDELL_MATCHES   NEWDELL_POINTS   
                    1                2                  2                11                 1                1                 1                8 
    
  • Module12

    Dynamic Pivoting

    Often you'll want to change the columns pivot adds. For example, in many leagues the teams change each season. So every year you'll need to change queries that pivot by location. It'd be handy if these changed dynamically.

    Sadly regular pivot doesn't allow you to do this. The values in the IN list are fixed. You can't use subqueries or variables to change them.

    Luckily you can have a dynamic IN list using XML pivoting. This enables you to pass values from a subquery. Or generate totals for every value in the pivot column using ANY:

    with rws as (
      select location
      from   match_results
    )
      select xmlserialize ( -- formats an XML document
               document location_xml as clob indent size=2
             ) location_xml
      from   rws
      pivot xml (
        count (*) matches for location in ( any )
      );


    But you get the pivoted values as a single XML document! So to extract the values, you need to manipulate the XML.

  • Module13

    Unpivoting

    Unpivoting is the process of taking columns and converting them to rows. For example, you may want to convert the home & away team names to a single team column.

    You can do a DIY unpivot using union all. This will query the source table once for each column you want to become a row.

    For example, to get a row for the home and away teams for each match, you need two queries like so:

    select match_date, location, 'HOME' home_or_away, home_team_name team
    from   match_results
    union  all
    select match_date, location, 'AWAY' home_or_away, away_team_name team
    from   match_results
    order  by match_date, location, home_or_away;


  • Module14

    Unpivot Clause

    Instead you could use the unpivot clause. This was also introduced in Oracle Database 11g and makes unpivoting easier.

    1. A new column storing the pivoted values
    2. Another new column stating the source column of these values
    3. The columns to become rows

    This gives the following statement:

    select match_date, location, home_or_away, team 
    from   match_results
    unpivot (
      team for home_or_away in ( 
        home_team_name as 'HOME', away_team_name as 'AWAY'
      )
    )
    order  by match_date, location, home_or_away;


  • Module15

    Try It!

    Complete the following query to unpivot the home and away points for each match:

    select match_date, location, home_or_away, points 
    from   match_results
    unpivot (
      
      home_or_away in ( 
        home_team_points as 'HOME', away_team_points as 'AWAY'
      )
    )
    order  by match_date, location, home_or_away;


    This should give this output:

    MATCH_DATE    LOCATION   HOME_OR_AWAY   POINTS   
    01-JAN-2018   Coldgate   AWAY                  4 
    01-JAN-2018   Coldgate   HOME                  1 
    01-JAN-2018   Snowley    AWAY                  0 
    01-JAN-2018   Snowley    HOME                  2 
    01-FEB-2018   Dorwall    AWAY                  1 
    01-FEB-2018   Dorwall    HOME                  0 
    01-MAR-2018   Coldgate   AWAY                  3 
    01-MAR-2018   Coldgate   HOME                  3 
    02-MAR-2018   Newdell    AWAY                  0 
    02-MAR-2018   Newdell    HOME                  8
  • Module16

    Combining Pivot and Unpivot

    You can combine pivot and unpivot a single statement. The output from the first becomes the input to the second.

    For example, say you want to produce a league table from the match results. For each team this will show the number of games they've won, drawn, and lost.

    To do this, you need several steps:

    First, compare the home and away team points to find out whether each won, lost or drew.

    Then, to count up the number of outcomes for each team, you need one column with all the names. To do this, you need to unpivot home and away, giving single column of team names.

    Finally, to get the league table, you need to pivot the number of wins, losses, and draws for each team.

    This gives a statement like:

    with rws as (
      select home_team_name, away_team_name, 
             case
               when home_team_points > away_team_points then 'WON'
               when home_team_points < away_team_points then 'LOST'
               else 'DRAW'
             end home_team_result,
             case
               when home_team_points < away_team_points then 'WON'
               when home_team_points > away_team_points then 'LOST'
               else 'DRAW'
             end away_team_result
      from   match_results
    )
      select team, w, d, l 
      from   rws
      unpivot (
        ( team, result ) for home_or_away in ( 
          ( home_team_name, home_team_result ) as 'HOME', 
          ( away_team_name, away_team_result ) as 'AWAY'
        )
      )
      pivot (
        count (*), min ( home_or_away ) dummy
        for result in (
          'WON' W, 'DRAW' D, 'LOST' L
        )
      )
      order  by w desc, d desc, l;
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论