Converting Rows to Columns and Back Again: Databases for Developers

原创 fizz 2022-12-08

280

Prerequisite SQL

create table match_results (
  match_date       date,
  location         varchar2(20),
  home_team_name   varchar2(20),
  away_team_name   varchar2(20),
  home_team_points integer,
  away_team_points integer
);

insert into match_results values ( date'2018-01-01', 'Snowley', 'Underrated United', 'Terrible Town', 2, 0 );
insert into match_results values ( date'2018-01-01', 'Coldgate', 'Average Athletic', 'Champions City', 1, 4 );
insert into match_results values ( date'2018-02-01', 'Dorwall', 'Terrible Town', 'Average Athletic', 0, 1 );
insert into match_results values ( date'2018-03-01', 'Coldgate', 'Average Athletic', 'Underrated United', 3, 3 );
insert into match_results values ( date'2018-03-02', 'Newdell', 'Champions City', 'Terrible Town', 8, 0 );

commit;

Module1
Introduction
This tutorial will show you how to pivot rows to columns and unpivot them back again. It uses this table of match results from a simple league:
```
select * from match_results
```
Module2
Manual Pivot
You want to see how many games have been played in each location. You can do this with group by:
```
select location, count (*)
from   match_results
group  by location;
```
But you want to show the locations as columns instead of rows. To do this you need to pivot the results.
You can do this by defining a column for each location. Each must only include rows where the location matches the column name.
Count only considers non-null values. So you can do this by checking the location inside the count. If this equals the column name, return a constant. Otherwise null.
You can do this with a case expression like:
```
count ( case when location = '<location>' then 1 end ) <location>
```
The complete SQL query is:
```
select count ( case when location = 'Snowley' then 1 end ) snowley,
       count ( case when location = 'Coldgate' then 1 end ) coldgate, 
       count ( case when location = 'Dorwall' then 1 end ) dorwall,
       count ( case when location = 'Newdell' then 1 end ) newdell
from   match_results;
```
Module3
Pivot Clause
A manual pivot is clunky. It's easy to make mistakes and hard to read when you have many columns.
The pivot clause, introduced in Oracle Database 11g, allows you to do the same more easily.
To use it, specify three things:
1. The values to go in the location columns. You want a count, so this is count(*)
2. The column containing values you want to become new columns. Here that's location
3. The list of values to become columns. This is the names of the locations.
Altogether this gives:
```
with rws as (
  select location from match_results
)
  select * from rws
  pivot (
    count(*) for location in (
      'Snowley', 'Coldgate', 'Dorwall', 'Newdell' 
    )
  );
```
Now if you need more locations, add them to the IN list!

Module4

Try it!

Complete this query to show the locations as columns, with date of the last match played in each:

Hint: to find the last date in a set, use max

with rws as (
  select location, match_date from match_results
)
  select * from rws
  pivot (
    
    for location in (
      'Snowley', 'Coldgate', 'Dorwall', 'Newdell' 
    )
  );

The output of this query should be:

'Snowley'     'Coldgate'    'Dorwall'     'Newdell'     
01-JAN-2018   01-MAR-2018   01-FEB-2018   02-MAR-2018

Complete this query to show the team names as columns, with the number of home games each has played:

with rws as (
  select home_team_name from match_results
)
  select * from rws
  pivot (
    count (*)
    for 
    in (

    )
  );

The team names are:

Underrated United
Average Athletic
Terrible Town
Champions City

And the output of this query should be:

'Underrated United'   'Average Athletic'   'Terrible Town'   'Champions City'   
                    1                    2                 1                  1

Module5
Implicit Group By
Any columns from the input table not in the pivot clause form an implicit group by. This can lead to the output having more rows than you expected.
For example, if you place pivot after the from clause, all the table's columns are inputs to it. So you get one row for every match
```
select * from match_results
pivot (
  count(*) for location in (
    'Snowley', 'Coldgate', 'Dorwall', 'Newdell' 
  )
);
```
To avoid this, use an inline view or CTE. This selects only the columns you'll use in the pivot or want in the final output
Module6
Expressions
Sometimes you may want to manipulate the values you're pivoting by. For example, you may want to see the number of games played each month, showing the months as columns.
So you need to convert the match dates to months. You can extract a month's abbreviation from a date using to_char with the format mask 'MON'.
But if you do this in your pivot clause, you'll get an error:
```
with rws as (
  select match_date from match_results
)
  select * from rws
  pivot (
    count (*) for to_char ( match_date, 'MON' ) in (
      'JAN', 'FEB', 'MAR'
    )
  );
```
To fix this, extract the month in an inline view or CTE. Alias the expression and use this alias in the pivot. For example:
```
with rws as (
  select to_char ( match_date, 'MON' ) match_month 
  from   match_results
)
  select * from rws
  pivot (
    count (*) for match_month in (
      'JAN', 'FEB', 'MAR'
    )
  );
```
Often you want to change values in the columns you're pivoting on. This and pivot's implicit grouping means it's a good idea to pivot the output of a CTE. This makes it easy to select and manipulate the columns you want.

Module7

Filtering Pivoted Rows

You can filter the output of pivot using a where clause. This goes after the pivot.

For example, say you want to show a table of the number of matches played each month in each location. You can do this with:

with rws as (
  select location, to_char ( match_date, 'MON' ) match_month 
  from   match_results
)
  select * from rws
  pivot (
    count (*) for match_month in (
      'JAN', 'FEB', 'MAR'
    )
  )

If you want to restrict this to those locations which had at least one match in January, add the where clause "'JAN'" > 0 at the end:

with rws as (
  select location, to_char ( match_date, 'MON' ) match_month 
  from   match_results
)
  select * from rws
  pivot (
    count (*) for match_month in (
      'JAN', 'FEB', 'MAR'
    )
  )
  where  "'JAN'" > 0

Module8

New Column Names

By default, the names of the new columns are the values from the IN list, wrapped in quotes. This can make your SQL fiddly. To reference the new columns, you must use double AND single quotes.

To simplify this, alias each value in the list:

with rws as (
  select location, to_char ( match_date, 'MON' ) match_month 
  from   match_results
)
  select * from rws
  pivot (
    count (*) for match_month in (
      'JAN' jan, 'FEB' feb, 'MAR' mar
    )
  );

This makes it easier to filter the values from the previous list:

with rws as (
  select location, to_char ( match_date, 'MON' ) match_month 
  from   match_results
)
  select * from rws
  pivot (
    count (*) for match_month in (
      'JAN' jan, 'FEB' feb, 'MAR' mar
    )
  )
  where jan > 0;

Module9

Try It!

Complete the following query to show:

For each location, the number of games played on each day of the week
The three letter abbreviation of each day as columns
The column headings without quotes
Those locations that had one or more games played on Monday

with rws as (
  select location,
         to_char ( match_date, 'DY' ) match_day
  from   match_results
)
  select * from rws
  pivot (
    count (*) for match_day in (
      'MON' , 'TUE' , 'WED' , 'THU' ,
      'FRI' , 'SAT' , 'SUN' 
    )
  )
  where  mon
  order  by location

to_char ( match_date, 'DY' ) returns the three-letter abbreviation of the day for the date.

The output of this query should be:

LOCATION   MON   TUE   WED   THU   FRI   SAT   SUN   
Coldgate       1     0     0     1     0     0     0 
Snowley        1     0     0     0     0     0     0

Module10
Pivoting Many Values
You can pivot many values. For example, say for each month you want columns showing:
- The number of matches played
- The total points scored by the home team
- The total points scored by the away team
You can do this by adding:
- count(*)
- sum ( home_team_points )
- sum ( away_team_points )
To the first part of the pivot. Note you must alias all except one of these. The generated column names are the aliases appended to the IN list headings:
```
with rws as (
  select location, to_char ( match_date, 'MON' ) match_month ,
         home_team_points, away_team_points
  from   match_results
)
  select * from rws
  pivot (
    count (*) matches, 
    sum ( home_team_points ) home_points,
    sum ( away_team_points ) away_points
    for match_month in (
      'JAN' jan, 'FEB' feb, 'MAR' mar
    )
  );
```
Each new aggregate will have a new column for every value in the IN list. So the total number of new columns is:
```
Number of aggregate columns * Number of IN list values
```

Module11

Try It!

Complete the following query to show for each location:

A column counting the number of games played
The total number of points scored at each location ( home_team_points + away_team_points )
The count columns should have the matches suffix. The total points scored the suffix points

with rws as (
  select location, home_team_points, away_team_points
  from   match_results
)
  select * from rws
  pivot (

    for location in (
      'Snowley' snowley, 'Coldgate' coldgate, 
      'Dorwall' dorwall, 'Newdell' newdell
    )
  );

The output of this query should be:

SNOWLEY_MATCHES   SNOWLEY_POINTS   COLDGATE_MATCHES   COLDGATE_POINTS   DORWALL_MATCHES   DORWALL_POINTS   NEWDELL_MATCHES   NEWDELL_POINTS   
                1                2                  2                11                 1                1                 1                8

Module12
Dynamic Pivoting
Often you'll want to change the columns pivot adds. For example, in many leagues the teams change each season. So every year you'll need to change queries that pivot by location. It'd be handy if these changed dynamically.
Sadly regular pivot doesn't allow you to do this. The values in the IN list are fixed. You can't use subqueries or variables to change them.
Luckily you can have a dynamic IN list using XML pivoting. This enables you to pass values from a subquery. Or generate totals for every value in the pivot column using ANY:
```
with rws as (
  select location
  from   match_results
)
  select xmlserialize ( -- formats an XML document
           document location_xml as clob indent size=2
         ) location_xml
  from   rws
  pivot xml (
    count (*) matches for location in ( any )
  );
```
But you get the pivoted values as a single XML document! So to extract the values, you need to manipulate the XML.
Module13
Unpivoting
Unpivoting is the process of taking columns and converting them to rows. For example, you may want to convert the home & away team names to a single team column.
You can do a DIY unpivot using union all. This will query the source table once for each column you want to become a row.
For example, to get a row for the home and away teams for each match, you need two queries like so:
```
select match_date, location, 'HOME' home_or_away, home_team_name team
from   match_results
union  all
select match_date, location, 'AWAY' home_or_away, away_team_name team
from   match_results
order  by match_date, location, home_or_away;
```
Module14
Unpivot Clause
Instead you could use the unpivot clause. This was also introduced in Oracle Database 11g and makes unpivoting easier.
1. A new column storing the pivoted values
2. Another new column stating the source column of these values
3. The columns to become rows
This gives the following statement:
```
select match_date, location, home_or_away, team 
from   match_results
unpivot (
  team for home_or_away in ( 
    home_team_name as 'HOME', away_team_name as 'AWAY'
  )
)
order  by match_date, location, home_or_away;
```

Module15

Try It!

Complete the following query to unpivot the home and away points for each match:

select match_date, location, home_or_away, points 
from   match_results
unpivot (
  
  home_or_away in ( 
    home_team_points as 'HOME', away_team_points as 'AWAY'
  )
)
order  by match_date, location, home_or_away;

This should give this output:

MATCH_DATE    LOCATION   HOME_OR_AWAY   POINTS   
01-JAN-2018   Coldgate   AWAY                  4 
01-JAN-2018   Coldgate   HOME                  1 
01-JAN-2018   Snowley    AWAY                  0 
01-JAN-2018   Snowley    HOME                  2 
01-FEB-2018   Dorwall    AWAY                  1 
01-FEB-2018   Dorwall    HOME                  0 
01-MAR-2018   Coldgate   AWAY                  3 
01-MAR-2018   Coldgate   HOME                  3 
02-MAR-2018   Newdell    AWAY                  0 
02-MAR-2018   Newdell    HOME                  8

Module16

Combining Pivot and Unpivot

You can combine pivot and unpivot a single statement. The output from the first becomes the input to the second.

For example, say you want to produce a league table from the match results. For each team this will show the number of games they've won, drawn, and lost.

To do this, you need several steps:

First, compare the home and away team points to find out whether each won, lost or drew.

Then, to count up the number of outcomes for each team, you need one column with all the names. To do this, you need to unpivot home and away, giving single column of team names.

Finally, to get the league table, you need to pivot the number of wins, losses, and draws for each team.

This gives a statement like:

with rws as (
  select home_team_name, away_team_name, 
         case
           when home_team_points > away_team_points then 'WON'
           when home_team_points < away_team_points then 'LOST'
           else 'DRAW'
         end home_team_result,
         case
           when home_team_points < away_team_points then 'WON'
           when home_team_points > away_team_points then 'LOST'
           else 'DRAW'
         end away_team_result
  from   match_results
)
  select team, w, d, l 
  from   rws
  unpivot (
    ( team, result ) for home_or_away in ( 
      ( home_team_name, home_team_result ) as 'HOME', 
      ( away_team_name, away_team_result ) as 'AWAY'
    )
  )
  pivot (
    count (*), min ( home_or_away ) dummy
    for result in (
      'WON' W, 'DRAW' D, 'LOST' L
    )
  )
  order  by w desc, d desc, l;

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

Converting Rows to Columns and Back Again: Databases for Developers

Prerequisite SQL

Introduction

Manual Pivot

Pivot Clause

Try it!

Implicit Group By

Expressions

Filtering Pivoted Rows

New Column Names

Try It!

Pivoting Many Values

Try It!

Dynamic Pivoting

Unpivoting

Unpivot Clause

Try It!

Combining Pivot and Unpivot

评论