Tuesday, March 25, 2008

UPSERT and More with MERGE

One very common requirement is to merge source data to a target table. For example, merge the daily changes to accounts data from branch office table to central accounts table. On SQL Server version 2005 and prior this was accomplished using separate INSERT and UPDATE statements. This involved checking if key column exists to perform update and insert if not, or attempt an update first and then if not rows were affected perform insert. Not to mention if we have to check if account is missing from the branch office data and needs to be deleted from the central accounts table. That way we end up with multiple (sometimes complex) statements to implement one transaction, accessing both source and target tables multiple times.

SQL Server 2008 offers a lot more elegant way using the MERGE statement (MERGE is supported by ANSI SQL). It makes data merging very simple and elegant, as well as efficient.

Here is the base syntax for MERGE:

[ WITH <common_table_expression> [,...n] ]

MERGE

[ TOP ( expression ) [ PERCENT ] ]

[ INTO ] target_table [ [ AS ] table_alias ]

        [ WITH ( <merge_hint> ) ]

USING <table_source>

ON <search_condition>

[ WHEN MATCHED [ AND <search_condition> ]

        THEN <merge_matched> ]

[ WHEN [TARGET] NOT MATCHED [ AND <search_condition> ]

        THEN <merge_not_matched> ]

[ WHEN SOURCE NOT MATCHED [ AND <search_condition> ]

        THEN <merge_matched> ]

<output_clause>

[ OPTION ( <query_hint> [ ,...n ] ) ]   

;



Below is the initial setup for the source and target tables.

-- Static table as target for merging data

CREATE TABLE CentralOfficeAccounts (

 account_nbr INT PRIMARY KEY,

 company_name VARCHAR(35),

 primary_contact VARCHAR(35),

 contact_phone VARCHAR(12));

 

-- Dynamic table with daily updates to be merged

CREATE TABLE BranchOfficeAccounts (

 account_nbr INT PRIMARY KEY,

 company_name VARCHAR(35),

 primary_contact VARCHAR(35),

 contact_phone VARCHAR(12));

 

-- Sample centrall office static data

INSERT INTO CentralOfficeAccounts

VALUES (1, 'Bayside Motors', 'Jim Smith', '902-203-1234'),

       (2, 'Dallas Industries', 'Joe Doe', '301-663-9134'),

       (3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053');

 

-- Daily updated branch office data

INSERT INTO BranchOfficeAccounts

VALUES (2, 'Dallas Industries, Inc.', 'Rick Gross', '301-663-9134'), -- changed

       (3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053'),    -- same

       (4, 'Auto Insurance Co.', 'Chris Jefferson', '313-601-6201'); -- new



Here is how MERGE can be used to perform an update on existing accounts and insert of new accounts.

-- Update existing and add missing

MERGE INTO CentralOfficeAccounts AS C    -- Target

USING BranchOfficeAccounts AS B          -- Source

   ON C.account_nbr = B.account_nbr

WHEN MATCHED THEN                        -- On match update

    UPDATE SET C.company_name = B.company_name,

                C.primary_contact = B.primary_contact,

                C.contact_phone = B.contact_phone

WHEN NOT MATCHED THEN                    -- Add missing

    INSERT (account_nbr, company_name, primary_contact, contact_phone)

    VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone);

 

SELECT account_nbr, company_name, primary_contact, contact_phone

FROM CentralOfficeAccounts;

 

/*

 

-- Results:

 

account_nbr company_name             primary_contact contact_phone

----------- ------------------------ ---------------- -------------

1           Bayside Motors           Jim Smith        902-203-1234

2           Dallas Industries, Inc. Rick Gross       301-663-9134

3           Sky Computer Systems     Jane Brown       201-943-6053

4           Auto Insurance Co.       Chris Jefferson 313-601-6201

 

*/



Next step is to enhance the statement including predicates to check and update only accounts that have changed. The only difference here is the additional conditions in the MATCHED clause, results will be the same as before.

-- Update existing that changed and add missing

-- Use of predicates

MERGE INTO CentralOfficeAccounts AS C   -- Target

USING BranchOfficeAccounts AS B         -- Source

   ON C.account_nbr = B.account_nbr

WHEN MATCHED                            -- On match update

 AND (C.company_name <> B.company_name  -- Additional search conditions

   OR C.primary_contact <> B.primary_contact

   OR C.contact_phone <> B.contact_phone) THEN                       

    UPDATE SET C.company_name = B.company_name,

                C.primary_contact = B.primary_contact,

                C.contact_phone = B.contact_phone

WHEN NOT MATCHED THEN                   -- Add missing

    INSERT (account_nbr, company_name, primary_contact, contact_phone)

    VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone);



Going one step further, let's add the option to delete accounts that do not exist in the source table from the target table.

--Update existing that changed and add missing, delete missing in source

MERGE INTO CentralOfficeAccounts AS C   -- Target

USING BranchOfficeAccounts AS B         -- Source

   ON C.account_nbr = B.account_nbr

WHEN MATCHED                            -- On match update

 AND (C.company_name <> B.company_name  -- Additional search conditions

   OR C.primary_contact <> B.primary_contact

   OR C.contact_phone <> B.contact_phone) THEN                       

    UPDATE SET C.company_name = B.company_name,

                C.primary_contact = B.primary_contact,

                C.contact_phone = B.contact_phone

WHEN NOT MATCHED THEN                   -- Add missing

    INSERT (account_nbr, company_name, primary_contact, contact_phone)

    VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone)

WHEN SOURCE NOT MATCHED THEN            -- Delete missing from source

    DELETE;

 

SELECT account_nbr, company_name, primary_contact, contact_phone

FROM CentralOfficeAccounts;

 

/*

 

-- Results:

 

account_nbr company_name             primary_contact contact_phone

----------- ------------------------ ---------------- -------------

2           Dallas Industries, Inc. Rick Gross       301-663-9134

3           Sky Computer Systems     Jane Brown       201-943-6053

4           Auto Insurance Co.       Chris Jefferson 313-601-6201

 

*/



There is a lot more to MERGE, but this just shows the tremendous power it provides for merging data.

Note:

This code has been tested with SQL Server 2008 CTP 6 (February 2008). As of SQL Server 2008 Release Candidate 0 (June 2008) the clause "WHEN SOURCE NOT MATCHED" has been replaced with "WHEN NOT MATCHED BY SOURCE". This makes the wording clearer. It is good to note that this clause of MERGE is not standard.

References:

MERGE (Transact-SQL)
http://technet.microsoft.com/en-us/library/bb510625(SQL.100).aspx

3 comments:

Mohammad said...

Should we add predicates in MATCHED clause or we can add them in ON clause (in improved version of statement)? or is any difference between them?

Plamen Ratchev said...

There is big difference between predicates in the ON clause and predicates in the MATCHED clause. The predicates in the ON clause define what is matched and what is not (that is which WHEN section will be executed). The predicates on the MATCHED clause simply filter the rows that will be affected by the action statement in the clause. Also, you can add predicates in the source/target tables (if you use derived tables or CTEs), which in effect will limit the sets before they are matched. Using the correct predicate placement depends on what you are trying to accomplish.

Anonymous said...

i was not getting the answer when i was using "when source not matched then delete;"

atlast i found your blog and found the answer and now i know that i have to use "WHEN NOT MATCHED BY SOURCE"

thanks