Friday, November 23, 2007

Unpivoting Data

Unpivoting data is a very useful technique for normalizing denormalized tables. It refers to the process of transposing multiple columns across into rows in a single column. In other words, taking the data below:

sales_year  first_quarter  second_quarter   third_quarter   fourth_quarter

---------- ------------- --------------   -------------   --------------

2006        211203.50      381594.95        439187.00       503155.80

2007        231205.10      451101.25        601209.40       531907.30



and converting to look like this:

sales_year   sales_quarter  sales_amount

----------- ------------- ------------

2006        1              211203.50

2006        2              381594.95

2006        3              439187.00

2006        4              503155.80

2007        1              231205.10

2007        2              451101.25

2007        3              601209.40

2007        4              531907.30



Here are different methods to achieve that:

-- Create sample table

CREATE TABLE QuarterlySales (

 sales_year INT PRIMARY KEY,

 first_quarter DECIMAL(10, 2),

 second_quarter DECIMAL(10, 2),

 third_quarter DECIMAL(10, 2),

 fourth_quarter DECIMAL(10, 2));

 

-- Insert data

INSERT INTO QuarterlySales VALUES(2006, 211203.50, 381594.95, 439187.00, 503155.80);

INSERT INTO QuarterlySales VALUES(2007, 231205.10, 451101.25, 601209.40, 531907.30);

 

-- Method 1) Using UNION

SELECT sales_year,

       1 AS sales_quarter,

       first_quarter AS sales_amount

FROM QuarterlySales

UNION ALL

SELECT sales_year, 2, second_quarter

FROM QuarterlySales

UNION ALL

SELECT sales_year, 3, third_quarter

FROM QuarterlySales

UNION ALL

SELECT sales_year, 4, fourth_quarter

FROM QuarterlySales

ORDER BY sales_year, sales_quarter;

 

-- Method 2) Using cross join with table with numbers

SELECT sales_year,

       qtr AS sales_quarter,

       CASE qtr

        WHEN 1 THEN first_quarter

        WHEN 2 THEN second_quarter

        WHEN 3 THEN third_quarter

        WHEN 4 THEN fourth_quarter

       END AS sales_amount

FROM QuarterlySales

CROSS JOIN

(SELECT 1 UNION ALL

 SELECT 2 UNION ALL

 SELECT 3 UNION ALL

 SELECT 4) AS Quarters(qtr)

ORDER BY sales_year, sales_quarter;

 

-- Method 3) Using the UNPIVOT operator in SQL Server 2005

SELECT sales_year,

       CASE sales_quarter

        WHEN 'first_quarter' THEN 1

        WHEN 'second_quarter' THEN 2

        WHEN 'third_quarter' THEN 3

        WHEN 'fourth_quarter' THEN 4

       END AS sales_quarter,

       sales_amount   

FROM QuarterlySales

UNPIVOT

(sales_amount FOR

 sales_quarter IN

 (first_quarter, second_quarter,

  third_quarter, fourth_quarter)) AS U

ORDER BY sales_year, sales_quarter;



It is good no note that unpivoting multiple columns (like amount1, date1, amount2, date2 to amount and date columns) is not supported by the SQL Server 2005 UNPIVOT operator. This can be achieved only by using the first two methods.

Additional Resources:

Using PIVOT and UNPIVOT
http://msdn2.microsoft.com/en-us/library/ms177410.aspx

8 comments:

Mohammad said...

Hi Plamen,
I solved the problem without tally table (number table).
I used GROUPING SET for copy some instance of data, then using ROW_NUMBER for sequence numbers then solving the unpivoting problem.

SELECT sales_year, rec_id AS sales_quarter,
CASE rec_id
WHEN 1 THEN first_quarter
WHEN 2 THEN second_quarter
WHEN 3 THEN third_quarter
WHEN 4 THEN fourth_quarter
END AS sales_amount
FROM
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY sales_year ORDER BY (SELECT NULL)) AS rec_id
FROM QuarterlySales
GROUP BY GROUPING SETS
(
(sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter),
(sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter),
(sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter),
(sales_year, first_quarter, second_quarter, third_quarter, fourth_quarter)
)
) D

Plamen Ratchev said...

This is interesting approach Mohammad. To my opinion if you are already using SQL Server 2005/2008 features it is much shorter syntax and clear to understand by using the UNPIVOT operator.

Mohammad said...

Hi Plamen,
I am back with a much clear version!
What are you thinking about this, Do you think this version is so clear and simple?

SELECT sales_year,
RIGHT(COALESCE(CAST(first_quarter AS VARCHAR)+ '1',
CAST(second_quarter AS VARCHAR) + '2',
CAST(third_quarter AS VARCHAR) + '3',
CAST(fourth_quarter AS VARCHAR) + '4'), 1) AS sales_quarter,
COALESCE(first_quarter, second_quarter, third_quarter, fourth_quarter) AS sales_amount
FROM QuarterlySales
GROUP BY GROUPING SETS
(
(sales_year, first_quarter),
(sales_year, second_quarter),
(sales_year, third_quarter),
(sales_year, fourth_quarter)
)
ORDER BY sales_year, sales_quarter;

The result sets
/*
sales_year sales_quarter sales_amount
----------- ------------- -------------
2006 ------ 1 ----------- 211203.50
2006 ------ 2 ----------- 381594.95
2006 ------ 3 ----------- 439187.00
2006 ------ 4 ----------- 503155.80
2007 ------ 1 ----------- 231205.10
2007 ------ 2 ----------- 451101.25
2007 ------ 3 ----------- 601209.40
2007 ------ 4 ----------- 531907.30
*/

Plamen Ratchev said...

Yes, this version is simplified, great use of grouping sets!

Mohammad said...

Thank you!

Mohammad said...

Hi Plamen,
If you use 2005/2008 version it seems the best is to avoiding UNPIVOT table operator, and using this method:
SELECT sales_year,
sales_quarter,
sales_amount
FROM QuarterlySales
CROSS APPLY
(SELECT 1, first_quarter
UNION ALL
SELECT 2, second_quarter
UNION ALL
SELECT 3, third_quarter
UNION ALL
SELECT 4, fourth_quarter) D(sales_quarter, sales_amount);

Vladimir said...

Wouldn't running the query with many CAST operations over a large set affect its performance quite a bit? RE:
SELECT sales_year,
RIGHT(COALESCE(CAST(first_quarter AS VARCHAR)+ '1',
CAST(second_quarter AS VARCHAR) + '2',
CAST(third_quarter AS VARCHAR) + '3',
CAST(fourth_quarter AS VARCHAR) + '4'), 1) AS sales_quarter,
COALESCE(first_quarter, second_quarter, third_quarter, fourth_quarter) AS sales_amount
FROM QuarterlySales
GROUP BY GROUPING SETS
(
(sales_year, first_quarter),
(sales_year, second_quarter),
(sales_year, third_quarter),
(sales_year, fourth_quarter)
)
ORDER BY sales_year, sales_quarter;

Plamen Ratchev said...

Hi Vladimir,

Yes, the CAST function will have impact on performance on a very large set. Difficult to say how much without testing with real data.