Friday, November 23, 2007

Unpivoting Data

Unpivoting data is a very useful technique for normalizing denormalized tables. It refers to the process of transposing multiple columns across into rows in a single column. In other words, taking the data below:

sales_year  first_quarter  second_quarter   third_quarter   fourth_quarter

---------- ------------- --------------   -------------   --------------

2006        211203.50      381594.95        439187.00       503155.80

2007        231205.10      451101.25        601209.40       531907.30



and converting to look like this:

sales_year   sales_quarter  sales_amount

----------- ------------- ------------

2006        1              211203.50

2006        2              381594.95

2006        3              439187.00

2006        4              503155.80

2007        1              231205.10

2007        2              451101.25

2007        3              601209.40

2007        4              531907.30



Here are different methods to achieve that:

-- Create sample table

CREATE TABLE QuarterlySales (

 sales_year INT PRIMARY KEY,

 first_quarter DECIMAL(10, 2),

 second_quarter DECIMAL(10, 2),

 third_quarter DECIMAL(10, 2),

 fourth_quarter DECIMAL(10, 2));

 

-- Insert data

INSERT INTO QuarterlySales VALUES(2006, 211203.50, 381594.95, 439187.00, 503155.80);

INSERT INTO QuarterlySales VALUES(2007, 231205.10, 451101.25, 601209.40, 531907.30);

 

-- Method 1) Using UNION

SELECT sales_year,

       1 AS sales_quarter,

       first_quarter AS sales_amount

FROM QuarterlySales

UNION ALL

SELECT sales_year, 2, second_quarter

FROM QuarterlySales

UNION ALL

SELECT sales_year, 3, third_quarter

FROM QuarterlySales

UNION ALL

SELECT sales_year, 4, fourth_quarter

FROM QuarterlySales

ORDER BY sales_year, sales_quarter;

 

-- Method 2) Using cross join with table with numbers

SELECT sales_year,

       qtr AS sales_quarter,

       CASE qtr

        WHEN 1 THEN first_quarter

        WHEN 2 THEN second_quarter

        WHEN 3 THEN third_quarter

        WHEN 4 THEN fourth_quarter

       END AS sales_amount

FROM QuarterlySales

CROSS JOIN

(SELECT 1 UNION ALL

 SELECT 2 UNION ALL

 SELECT 3 UNION ALL

 SELECT 4) AS Quarters(qtr)

ORDER BY sales_year, sales_quarter;

 

-- Method 3) Using the UNPIVOT operator in SQL Server 2005

SELECT sales_year,

       CASE sales_quarter

        WHEN 'first_quarter' THEN 1

        WHEN 'second_quarter' THEN 2

        WHEN 'third_quarter' THEN 3

        WHEN 'fourth_quarter' THEN 4

       END AS sales_quarter,

       sales_amount   

FROM QuarterlySales

UNPIVOT

(sales_amount FOR

 sales_quarter IN

 (first_quarter, second_quarter,

  third_quarter, fourth_quarter)) AS U

ORDER BY sales_year, sales_quarter;



It is good no note that unpivoting multiple columns (like amount1, date1, amount2, date2 to amount and date columns) is not supported by the SQL Server 2005 UNPIVOT operator. This can be achieved only by using the first two methods.

Additional Resources:

Using PIVOT and UNPIVOT
http://msdn2.microsoft.com/en-us/library/ms177410.aspx

7 comments:

  1. Hi Plamen,
    I solved the problem without tally table (number table).
    I used GROUPING SET for copy some instance of data, then using ROW_NUMBER for sequence numbers then solving the unpivoting problem.

    SELECT sales_year, rec_id AS sales_quarter,
    CASE rec_id
    WHEN 1 THEN first_quarter
    WHEN 2 THEN second_quarter
    WHEN 3 THEN third_quarter
    WHEN 4 THEN fourth_quarter
    END AS sales_amount
    FROM
    (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY sales_year ORDER BY (SELECT NULL)) AS rec_id
    FROM QuarterlySales
    GROUP BY GROUPING SETS
    (
    (sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter),
    (sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter),
    (sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter),
    (sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter)
    )
    ) D

    ReplyDelete
  2. This is interesting approach Mohammad. To my opinion if you are already using SQL Server 2005/2008 features it is much shorter syntax and clear to understand by using the UNPIVOT operator.

    ReplyDelete
  3. Hi Plamen,
    I am back with a much clear version!
    What are you thinking about this, Do you think this version is so clear and simple?

    SELECT sales_year,
    RIGHT(COALESCE(CAST(first_quarter AS VARCHAR)+ '1',
    CAST(second_quarter AS VARCHAR) + '2',
    CAST(third_quarter AS VARCHAR) + '3',
    CAST(fourth_quarter AS VARCHAR) + '4'), 1) AS sales_quarter,
    COALESCE(first_quarter, second_quarter, third_quarter, fourth_quarter) AS sales_amount
    FROM QuarterlySales
    GROUP BY GROUPING SETS
    (
    (sales_year, first_quarter),
    (sales_year, second_quarter),
    (sales_year, third_quarter),
    (sales_year, fourth_quarter)
    )
    ORDER BY sales_year, sales_quarter;

    The result sets
    /*
    sales_year sales_quarter sales_amount
    ----------- ------------- -------------
    2006 ------ 1 ----------- 211203.50
    2006 ------ 2 ----------- 381594.95
    2006 ------ 3 ----------- 439187.00
    2006 ------ 4 ----------- 503155.80
    2007 ------ 1 ----------- 231205.10
    2007 ------ 2 ----------- 451101.25
    2007 ------ 3 ----------- 601209.40
    2007 ------ 4 ----------- 531907.30
    */

    ReplyDelete
  4. Yes, this version is simplified, great use of grouping sets!

    ReplyDelete
  5. Hi Plamen,
    If you use 2005/2008 version it seems the best is to avoiding UNPIVOT table operator, and using this method:
    SELECT sales_year,
    sales_quarter,
    sales_amount
    FROM QuarterlySales
    CROSS APPLY
    (SELECT 1, first_quarter
    UNION ALL
    SELECT 2, second_quarter
    UNION ALL
    SELECT 3, third_quarter
    UNION ALL
    SELECT 4, fourth_quarter) D(sales_quarter, sales_amount);

    ReplyDelete
  6. Wouldn't running the query with many CAST operations over a large set affect its performance quite a bit? RE:
    SELECT sales_year,
    RIGHT(COALESCE(CAST(first_quarter AS VARCHAR)+ '1',
    CAST(second_quarter AS VARCHAR) + '2',
    CAST(third_quarter AS VARCHAR) + '3',
    CAST(fourth_quarter AS VARCHAR) + '4'), 1) AS sales_quarter,
    COALESCE(first_quarter, second_quarter, third_quarter, fourth_quarter) AS sales_amount
    FROM QuarterlySales
    GROUP BY GROUPING SETS
    (
    (sales_year, first_quarter),
    (sales_year, second_quarter),
    (sales_year, third_quarter),
    (sales_year, fourth_quarter)
    )
    ORDER BY sales_year, sales_quarter;

    ReplyDelete
  7. Hi Vladimir,

    Yes, the CAST function will have impact on performance on a very large set. Difficult to say how much without testing with real data.

    ReplyDelete