Saturday, January 31, 2009

Pivoting on Multiple Columns

Pivoting data on more than one column is not a common request. But there are times when it is a very useful technique for reporting purposes. Here is one example to demonstrate different methods to handle that.

The goal is to report product data by pivoting columns with value and quantity for each product based on year. Here is the script to create the table and insert sample data.

CREATE TABLE Products (

product VARCHAR(30),

market_year INT,

value INT,

quantity INT,

CONSTRAINT pk_products

PRIMARY KEY (product, market_year));
INSERT INTO Products VALUES('Corn', 2003, 100, 20);
INSERT INTO Products VALUES('Corn', 2004, 200, 25);

INSERT INTO Products VALUES('Corn', 2005, 150, 30);

INSERT INTO Products VALUES('Corn', 2006, 150, 10);

The request is to produce the following output:
product v2003 q2003 v2004 q2004 v2005 q2005 v2006 q2006

------- ----- ----- ----- ----- ----- ----- ----- -----

Corn 100 20 200 25 150 30 150 10

As usual, the first suspect for pivoting solution is the CASE function. Using CASE expressions is the most flexible method to manipulate data for pivoting. There is not much difference when pivoting a single or multiple columns. Here is the solution with CASE:

SELECT product,

SUM(CASE WHEN market_year = 2003 THEN value ELSE 0 END) AS v2003,

SUM(CASE WHEN market_year = 2003 THEN quantity ELSE 0 END) AS q2003,

SUM(CASE WHEN market_year = 2004 THEN value ELSE 0 END) AS v2004,

SUM(CASE WHEN market_year = 2004 THEN quantity ELSE 0 END) AS q2004,

SUM(CASE WHEN market_year = 2005 THEN value ELSE 0 END) AS v2005,

SUM(CASE WHEN market_year = 2005 THEN quantity ELSE 0 END) AS q2005,

SUM(CASE WHEN market_year = 2006 THEN value ELSE 0 END) AS v2006,

SUM(CASE WHEN market_year = 2006 THEN quantity ELSE 0 END) AS q2006

FROM Products

GROUP BY product;

Next, let's look at the PIVOT operator that was added in SQL Server 2005. The PIVOT operator has a few limitations, and one of them is that it supports pivoting only on a single column. However, T-SQL allows to specify multiple PIVOT operators in the FROM clause, that way providing a solution for pivoting on multiple columns. There is one catch: the second PIVOT consumes the temporary result set output of the first PIVOT operator. Because of that the year column is not available (since it is used as pivoting column in the first PIVOT) and there is a need to define a virtual column that replicates the values for the year. Also, since the first PIVOT operator uses the year values as column names, the new virtual column subtracts 2000 from the year value to generate different column names for the second PIVOT operator. Here is the query using two PIVOT operators:

SELECT product,

MAX([2003]) AS v2003,

MAX([3]) AS q2003,

MAX([2004]) AS v2004,

MAX([4]) AS q2004,

MAX([2005]) AS v2005,

MAX([5]) AS q2005,

MAX([2006]) AS v2006,

MAX([6]) AS q2006

FROM (SELECT product, market_year,

market_year - 2000 AS market_year2,

SUM(value) AS value,

SUM(quantity) AS quantity

FROM Products

GROUP BY product, market_year) AS T

PIVOT

(SUM(value) FOR market_year IN

([2003], [2004], [2005], [2006])) AS P1

PIVOT

(SUM(quantity) FOR market_year2 IN

([3], [4], [5], [6])) AS P2

GROUP BY product;

In this particular case there is more elegant solution using a single PIVOT operator. With little math the value and the quantity can be combined into single column and then after the pivoting split back to two separate columns. Here is the solution using a single PIVOT operator:

SELECT product,

[2003] / 1000 AS v2003,

[2003] % 1000 AS q2003,

[2004] / 1000 AS v2004,

[2004] % 1000 AS q2004,

[2005] / 1000 AS v2005,

[2005] % 1000 AS q2005,

[2006] / 1000 AS v2006,

[2006] % 1000 AS q2006

FROM (SELECT product, market_year,

value * 1000 + quantity AS value

FROM Products) AS T

PIVOT

(SUM(value) FOR market_year IN

([2003], [2004], [2005], [2006])) AS P;

13 comments:

nessik said...

Thanks!
Good, but not dynamic :)

Plamen Ratchev said...

Hi nessik,

You can vote here https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127071 to implement dynamic pivoting in SQL Server. Until then, this may work: http://pratchev.blogspot.com/2008/12/dynamic-pivoting.html

samdc said...

Thank you for taking time to write this. It saved me tons effort and time!

Brad Schulz said...

Hi Plamen...

Here's another approach (sorry about the formatting):

select
  P1.product,v2003,q2003,v2004,q2004,v2005,q2005,v2006,q2006
from
  (select product, pivotkey='v'+convert(char,market_year), value
   from Products) I
   pivot (sum(value) for pivotkey in ([v2003],[v2004],[v2005],[v2006])) P1
full join
  (select product, pivotkey='q'+convert(char,market_year), quantity
   from Products) I
   pivot (sum(quantity) for pivotkey in ([q2003],[q2004],[q2005],[q2006])) P2
on P1.product=P2.product;

Brad Schulz said...

Follow-up to my last comment...

No need for a FULL JOIN. I don't know how that got in there. A regular INNER JOIN would do.

Plamen Ratchev said...

Hi Brad,
Yes, this method works and I have used that in the past. But in essence my goal was to accomplish the task without multiple subqueries, which drags down performance. I am convinced the real solution is to enhance the PIVOT operator to allow pivoting on multiple columns. Until then, using CASE expressions is my choice.

Brad Schulz said...

Yet another approach just popped into my head (though its cost is about twice the previous approach I posted):

select *
from
  (select product, pivotkey='v'+convert(char,market_year), pivotamt=value
   from Products
   union all
   select product, pivotkey='q'+convert(char,market_year), pivotamt=quantity
   from Products) I
pivot
  (sum(pivotamt) for pivotkey in (v2003,q2003,v2004,q2004,v2005,q2005,v2006,q2006)) P

anurenj said...

thankyou thankou..so .soo.sooo much.....its an excellent piece of logic that i ever seen in my coding life...

anurenj said...

thankyou soooooooooooo much.its the excellent piece of code that i ever seen in my coding life....great great great

Paddy said...

Genious! You're a lifesaver!

Matt Poland said...

I'm not a SQL expert by any means so feel free to tear this query apart but I've found that this works cleanly well when you have a lot of pivots to pull off. The idea here is to use a CTE for each pivot and simply join them together afterwards.

;
WITH SourceProduct AS
(
SELECT DISTINCT product AS ProductName
FROM Products
),
ValueTotals AS
(
SELECT
product,
COALESCE(SUM(PivotData.[2003]), 0) AS Year2003Value,
COALESCE(SUM(PivotData.[2004]), 0) AS Year2004Value,
COALESCE(SUM(PivotData.[2005]), 0) AS Year2005Value,
COALESCE(SUM(PivotData.[2006]), 0) AS Year2006Value
FROM Products
INNER JOIN SourceProduct ON Products.product = SourceProduct.ProductName
PIVOT
(
SUM(value)
FOR market_year IN ([2003], [2004], [2005], [2006])
) AS PivotData
GROUP BY product

),
QuantityTotals AS
(
SELECT
product,
COALESCE(SUM(PivotData.[2003]), 0) AS Year2003Quantity,
COALESCE(SUM(PivotData.[2004]), 0) AS Year2004Quantity,
COALESCE(SUM(PivotData.[2005]), 0) AS Year2005Quantity,
COALESCE(SUM(PivotData.[2006]), 0) AS Year2006Quantity
FROM Products
INNER JOIN SourceProduct ON Products.product = SourceProduct.ProductName
PIVOT
(
SUM(quantity)
FOR market_year IN ([2003], [2004], [2005], [2006])
) AS PivotData
GROUP BY product

)
SELECT
ProductName,
Year2003Value,
Year2004Value,
Year2005Value,
Year2006Value,
Year2003Quantity,
Year2004Quantity,
Year2005Quantity,
Year2006Quantity
FROM SourceProduct
INNER JOIN ValueTotals ON SourceProduct.ProductName = ValueTotals.product
INNER JOIN QuantityTotals ON SourceProduct.ProductName = QuantityTotals.product

Plamen Ratchev said...

Hi Matt,

This is the same approach that Brad posted earlier in his comment. Yes, it works, but it is very inefficient, especially if you have to pivot on many columns.

Matt Poland said...

I just came to that same realization and your point is enforced from my perspective. The ability to pivot on multiple columns in a single query is definitely a desired feature and your SUM(CASE) approach seems the preferable method.