Column Aggregates

We all know how to use the aggregate functions MIN, MAX, COUNT, etc. to calculate aggregates across rows. For example, using MAX we can determine the maximum value for group of rows. But what if we want to see the maximum value for each row across columns? There is no aggregate function that accepts list of columns…

Let's look at example to illustrate this. Here is sample table with four data columns:

CREATE TABLE Foo (

 keycol INT NOT NULL PRIMARY KEY,

 col1 INT NOT NULL,

 col2 INT NOT NULL,

 col3 INT NOT NULL,

 col4 INT NOT NULL);

 

INSERT INTO Foo VALUES(1, 5, 0, 1, 10); 

INSERT INTO Foo VALUES(2, 0, 0, 3, 1);

INSERT INTO Foo VALUES(3, 0, 0, 0, 0);

INSERT INTO Foo VALUES(4, 9, 1, 22, 8);

INSERT INTO Foo VALUES(5, 8, 8, 8, 8);

Our goal is to calculate the maximum value for the four data columns for each row. With two or three columns it may be an easy task using a CASE expression. But adding more columns will make a very long and complex CASE expression. Is there an easier way?

One solution that provides a shortcut is utilizing the SQL Server capabilities to generate an XML result and then using XQuery to find the max value. Here is the query:

-- max across columns with XQuery

SELECT c.query('keycol').value('.', 'INT') AS keycol,

       c.value('local-name(./*[data(.)=

                max(../*[not(local-name(.)="keycol") ])][1])', 

                'VARCHAR(20)') AS max_col,

       c.value('max(*[not(local-name(.)="keycol")])', 'FLOAT') AS max_val

FROM (SELECT keycol, col1, col2, col3, col4

      FROM Foo

      FOR XML PATH, TYPE) AS T(x)

CROSS APPLY x.nodes('row') AS R(c);

The tricky part here is the filtering of the key column from the evaluation for max. Here are the results:

keycol  max_col  max_val

------- -------- --------

1       col4     10

2       col3     3

3       col1     0

4       col3     22

5       col1     8

How about counting the number of none zero values across columns? Here is a solution for that using very similar query:

-- count non-zero columns with XQuery

SELECT c.query('keycol').value('.', 'INT') AS keycol,

       c.value('count(*[not(local-name(.)="keycol") 

                and not(.=0)])', 

               'INT')  AS cnt_non_zero

FROM (SELECT keycol, col1, col2, col3, col4

      FROM Foo

      FOR XML PATH, TYPE) AS T(x)

CROSS APPLY x.nodes('row') AS R(c);

And the results:

keycol  cnt_non_zero

------- ------------

1       3

2       2

3       0

4       4

5       4

This shows the power of XQuery to solve this problem in untraditional way. The catch? Look at the execution plan of the queries...

More efficient method to solve the first problem is to use unpivoting. Here is solution using the UNPIVOT operator (the same can be done with ANSI query using cross join with numbers table):

-- max across columns with UNPIVOT

SELECT keycol, col AS max_col, val AS max_val

FROM (

SELECT keycol, val, col, 

       ROW_NUMBER() OVER(PARTITION BY keycol ORDER BY val DESC, col) AS rk

FROM Foo

UNPIVOT

(val FOR col IN (col1, col2, col3, col4)) AS U) AS T

WHERE rk = 1;

And very similar solution using unpivoting to solve the second problem:

-- count non-zero columns with UNPIVOT

SELECT keycol, COUNT(NULLIF(val, 0)) AS cnt_non_zero

FROM Foo

UNPIVOT

(val FOR col IN (col1, col2, col3, col4)) AS U

GROUP BY keycol;

Labels: t-sql programming, unpivot, xml