Wednesday, March 12, 2008

Updates with CTE

Performing updates on columns based on values from another table is a very common need. Using the ANSI UPDATE normally requires multiple subqueries, which can be very inefficient especially if multiple filters have to be applied. The Microsoft specific UPDATE with JOIN is one solution. However, common table expressions provide a very elegant alternative, which has the same efficient plan as UPDATE with JOIN, but is much easier to read and maintain. The sample below demonstrates how to perform update based on another table using join and CTE. Using MERGE in SQL Server 2008 will make it even better.

CREATE TABLE Products (

 sku CHAR(5) NOT NULL PRIMARY KEY,

 product_desc VARCHAR(35) NOT NULL,

 price DECIMAL(12, 2) NOT NULL DEFAULT 0);

 

CREATE TABLE ProductUpdates (

 sku CHAR(5) NOT NULL PRIMARY KEY,

 product_desc VARCHAR(35) NOT NULL,

 price DECIMAL(12, 2) NOT NULL DEFAULT 0,

 effective_date DATETIME NOT NULL);

 

INSERT INTO Products VALUES ('CHS01', 'Child seat', 25.50);

INSERT INTO Products VALUES ('CUP03', 'Water cup', 5.25);

INSERT INTO Products VALUES ('HOL01', 'Cup holder', 3.50);

 

INSERT INTO ProductUpdates VALUES ('CHS01', 'Child seat with cushion', 26.95, '20080301');

INSERT INTO ProductUpdates VALUES ('CUP03', 'Water cup with handle', 6.25, '20080405');

 

-- Update all current product descriptions and prices

-- with updates that have effective date past today (March 12, 2008)

WITH Updates

AS

(SELECT P.product_desc,

        P.price,

        U.product_desc AS new_product_desc,

        U.price AS new_price

 FROM Products AS P

 JOIN ProductUpdates AS U

   ON P.sku = U.sku

 WHERE U.effective_date < CURRENT_TIMESTAMP)

UPDATE Updates

SET product_desc = new_product_desc,

    price = new_price;

 

SELECT sku, product_desc, price

FROM Products;

 

-- Results

sku   product_desc            price

----- ----------------------- -----

CHS01 Child seat with cushion 26.95

CUP03 Water cup                5.25

HOL01 Cup holder               3.50       

7 comments:

Ram said...

hi,

your algorithms are really cool. can you think of a simple way to solve this problem (i'm using aliases since i can't give the business context)-

i have a table (boyname, boyheight) and another table (girlname, girlheight). the mandate is that i need to start matching from the shortest girl to the tallest girl, and for each girl taken in this order, the match is the shortest available boy who is taller than her.

the simplest algo would be -
1. girls {left join} boys with constraint as girl shorter than boy
2. take [girl, height] into cursor in ascending order of height
3. for each cursor, get the boy with min height from the joined table - delete this boy from all other girls and delete the cursor girl from all other boys
4. go to next cursor and repeat

i've done this using cursors, while loops and even semi-batched whiles, but i'm not convinced that's the best that can be done. surely there must be *some* way of doing this without using loops? i tried fitting this around recursive CTE's, but I'm obviously not as used to them yet.

Plamen Ratchev said...

Hi Ram,
This is a problem that is currently best solved using a cursor. You could use a recursive CTE but performance will not be good on a large result set. Here is example of how it can be solved using a recursive CTE.

WITH Ranked
AS (
SELECT boyname, boyheight, girlname, girlheight,
DENSE_RANK() OVER(ORDER BY girlheight, girlname) AS girl_rk,
DENSE_RANK() OVER(ORDER BY boyheight, boyname) AS boy_rk,
ROW_NUMBER() OVER(PARTITION BY girlname ORDER BY boyheight,
boyname) AS boy_girl_rk
FROM Girls AS G
LEFT JOIN Boys AS B
ON G.girlheight < B.boyheight),
Match
AS (
SELECT boyname, boyheight,
girlname, girlheight,
girl_rk, boy_rk, boy_girl_rk
FROM Ranked
WHERE girl_rk = 1
AND boy_girl_rk = 1
UNION ALL
SELECT R.boyname, R.boyheight,
R.girlname, R.girlheight,
R.girl_rk, R.boy_rk, R.boy_girl_rk
FROM Ranked AS R
JOIN Match AS M
ON R.girl_rk = M.girl_rk + 1
AND (R.boyname IS NULL
OR (R.boy_rk > M.boy_rk
AND R.boy_girl_rk =
(SELECT T.boy_girl_rk
FROM (SELECT R2.boy_girl_rk,
ROW_NUMBER() OVER(ORDER BY R2.boy_girl_rk) AS rk
FROM Ranked AS R2
WHERE R2.girl_rk = R.girl_rk
AND R2.boy_rk > M.boy_rk) AS T
WHERE rk = 1))))
SELECT girlname, girlheight, boyname, boyheight
FROM Match
OPTION (MAXRECURSION 0);

Ram said...

Hi Plamen,

That was totally cool - took me some time to wrap my head around the logic.

I removed the NULL check since I don't need unmatched entries, and that brought down exec time for a 2K-row test run from 00:02:25 to 00:00:18 --

WITH Ranked
AS (
SELECT boyname, boyheight, girlname, girlheight,
DENSE_RANK() OVER(ORDER BY girlheight, girlname) AS girl_rk,
DENSE_RANK() OVER(ORDER BY boyheight, boyname) AS boy_rk,
ROW_NUMBER() OVER(PARTITION BY girlname ORDER BY boyheight, boyname) AS boy_girl_rk
FROM Girls AS G
JOIN Boys AS B
ON G.girlheight < B.boyheight),
Match
AS (
SELECT boyname, boyheight, girlname, girlheight, girl_rk, boy_rk, boy_girl_rk
FROM Ranked
WHERE girl_rk = 1
AND boy_girl_rk = 1
UNION ALL
SELECT R.boyname, R.boyheight, R.girlname, R.girlheight, R.girl_rk, R.boy_rk, R.boy_girl_rk
FROM Ranked AS R
JOIN Match AS M
ON R.girl_rk = M.girl_rk + 1
AND R.boy_rk > M.boy_rk
AND R.boy_girl_rk =
(SELECT T.boy_girl_rk
FROM
(
SELECT R2.boy_girl_rk,
ROW_NUMBER() OVER(ORDER BY R2.boy_girl_rk) AS rk
FROM Ranked AS R2
WHERE R2.girl_rk = R.girl_rk
AND R2.boy_rk > M.boy_rk
) AS T
WHERE rk = 1
)
)
SELECT girlname, girlheight, boyname, boyheight
FROM Match
OPTION (MAXRECURSION 0);

I also tried rewriting with APPLY/ another CTE, and they each yielded 00:00:09 --

APPLY ->

WITH Ranked
AS (
SELECT boyname, boyheight, girlname, girlheight,
DENSE_RANK() OVER(ORDER BY girlheight, girlname) AS girl_rk,
DENSE_RANK() OVER(ORDER BY boyheight, boyname) AS boy_rk,
ROW_NUMBER() OVER(PARTITION BY girlname ORDER BY boyheight, boyname) AS boy_girl_rk
FROM Girls AS G
JOIN Boys AS B
ON G.girlheight < B.boyheight),
Match
AS (
SELECT boyname, boyheight, girlname, girlheight, girl_rk, boy_rk, boy_girl_rk
FROM Ranked
WHERE girl_rk = 1
AND boy_girl_rk = 1
UNION ALL
SELECT R.boyname, R.boyheight, R.girlname, R.girlheight, R.girl_rk, R.boy_rk, R.boy_girl_rk
FROM Ranked AS R
JOIN Match AS M
ON R.girl_rk = M.girl_rk + 1
AND R.boy_rk > M.boy_rk
CROSS APPLY
(
SELECT R2.boy_girl_rk,
ROW_NUMBER() OVER(ORDER BY R2.boy_girl_rk) AS rk
FROM Ranked AS R2
WHERE R2.girl_rk = R.girl_rk
AND R2.boy_rk > M.boy_rk
) AS T
where T.boy_girl_rk = R.boy_girl_rk
and T.rk = 1
)
SELECT girlname, girlheight, boyname, boyheight
FROM Match
OPTION (MAXRECURSION 0);

Another CTE ->

WITH Ranked
AS (
SELECT boyname, boyheight, girlname, girlheight,
DENSE_RANK() OVER(ORDER BY girlheight, girlname) AS girl_rk,
DENSE_RANK() OVER(ORDER BY boyheight, boyname) AS boy_rk,
ROW_NUMBER() OVER(PARTITION BY girlname ORDER BY boyheight, boyname) AS boy_girl_rk
FROM Girls AS G
JOIN Boys AS B
ON G.girlheight < B.boyheight),
MinRanked
AS (
SELECT MIN(R4.boy_girl_rk) as boy_girl_rk, R4.girl_rk, R5.boy_rk
FROM Ranked R4
JOIN Ranked R5
ON R4.girl_rk = R5.girl_rk + 1
AND R4.boy_rk > R5.boy_rk
GROUP BY R4.girl_rk, R5.boy_rk
),
Match
AS (
SELECT boyname, boyheight, girlname, girlheight, girl_rk, boy_rk, boy_girl_rk
FROM Ranked
WHERE girl_rk = 1
AND boy_girl_rk = 1
UNION ALL
SELECT R.boyname, R.boyheight, R.girlname, R.girlheight, R.girl_rk, R.boy_rk, R.boy_girl_rk
FROM Ranked AS R
JOIN Match AS M
ON R.girl_rk = M.girl_rk + 1
AND R.boy_rk > M.boy_rk
JOIN MinRanked AS R3
ON R3.girl_rk = R.girl_rk
AND R3.boy_rk = M.boy_rk
AND R3.boy_girl_rk = R.boy_girl_rk
)
SELECT girlname, girlheight, boyname, boyheight
FROM Match
OPTION (MAXRECURSION 0);

You're right though- cursors are quicker still and I'll probably stick to them for the stored procedure I was working on, but this logic was wonderful to read nevertheless. Thanks!

Dardan said...

Hello!

Maybe I'm posting too late.
Lately i've come across your blog and I like your posts allot!

Think I have another solution for Girls/Boys. What do you think?

WITH Matches AS
(
SELECT g.[Girl_Name]
,g.[Girl_Height]
,b.[Boy_Name]
,b.[Boy_Height]
,MIN([Boy_Height]) OVER(PARTITION BY g.[Girl_Name]) [Min_Boy_Height]
FROM Girls g
LEFT JOIN Boys b
ON b.[Boy_Height] > g.[Girl_Height]
)
SELECT m.Girl_Name
,m.Girl_Height
,m.Boy_Name
,m.Boy_Height
FROM Matches m
WHERE m.[Boy_Height] = m.[Min_Boy_Height]

Plamen Ratchev said...

Hi Dardan,

Your solution will not produce the desired results because it returns multiple matches and does not eliminate boys that are already matched.

Dardan said...

What about this one...

WITH Matches AS
(
SELECT g.[Girl_Name]
,g.[Girl_Height]
,b.[Boy_Name]
,b.[Boy_Height]
,DENSE_RANK() OVER(ORDER BY g.[Girl_Height] ,g.[Girl_Name]) gdr
,DENSE_RANK() OVER(ORDER BY b.[Boy_Height] ,b.[Boy_Name]) bdr
FROM Girls g
LEFT JOIN Boys b
ON b.[Boy_Height] > g.[Girl_Height]
)
SELECT [Girl_Name]
,[Girl_Height]
,[Boy_Name]
,[Boy_Height]
FROM Matches
WHERE gdr = bdr

Plamen Ratchev said...

Hi Dardan,

This solution does not return all matches and the return matches are incorrect. Here is sample data to try it:

CREATE TABLE Boys (
boyname VARCHAR(35) NOT NULL PRIMARY KEY,
boyheight DECIMAL(2, 1));

CREATE TABLE Girls (
girlname VARCHAR(35) NOT NULL PRIMARY KEY,
girlheight DECIMAL(2, 1));

INSERT INTO Boys (boyname, boyheight) VALUES('Joe', 5.5);
INSERT INTO Boys (boyname, boyheight) VALUES('Jim', 5.2);
INSERT INTO Boys (boyname, boyheight) VALUES('Jeff', 6.1);
INSERT INTO Boys (boyname, boyheight) VALUES('John', 5.8);
INSERT INTO Boys (boyname, boyheight) VALUES('Greg', 5.4);
INSERT INTO Boys (boyname, boyheight) VALUES('Tito', 5.4);
INSERT INTO Boys (boyname, boyheight) VALUES('Gary', 5.5);
INSERT INTO Boys (boyname, boyheight) VALUES('Koko', 5.7);
INSERT INTO Boys (boyname, boyheight) VALUES('Rando', 6.0);
INSERT INTO Boys (boyname, boyheight) VALUES('Gordo', 6.4);
INSERT INTO Girls (girlname, girlheight) VALUES('Jane', 5.3);
INSERT INTO Girls (girlname, girlheight) VALUES('Jody', 5.4);
INSERT INTO Girls (girlname, girlheight) VALUES('Mary', 5.4);
INSERT INTO Girls (girlname, girlheight) VALUES('Jill', 5.4);
INSERT INTO Girls (girlname, girlheight) VALUES('Julie', 5.6);
INSERT INTO Girls (girlname, girlheight) VALUES('Sarah', 6.6);
INSERT INTO Girls (girlname, girlheight) VALUES('Jina', 5.6);
INSERT INTO Girls (girlname, girlheight) VALUES('Rite', 5.9);
INSERT INTO Girls (girlname, girlheight) VALUES('Lola', 6.3);

With this sample data the correct matches are:

girlname girlheight boyname boyheight
--------- ----------- -------- ---------
Jane 5.3 Greg 5.4
Jill 5.4 Gary 5.5
Jody 5.4 Joe 5.5
Mary 5.4 Koko 5.7
Jina 5.6 John 5.8
Julie 5.6 Rando 6.0
Rite 5.9 Jeff 6.1
Lola 6.3 Gordo 6.4
Sarah 6.6 NULL NULL