Keeping History Data in SQL Server

Very often there is the need to archive data on daily basis. Just had that question today and here is trimmed down solution based on a recent project.

The task is to keep history data on daily basis of changed data only (that is only rows that have changed for the last day). The scenario is a Loan table with loan number and loan amount. If there is change in any of the columns in the table, that change needs to be inserted into a history table. Further, the history table should be able to provide a convenient way to query the latest data (most recent updated row) for reporting purposes.

Here is the code to that solution:

-- Create sample tables.

 

CREATE TABLE Loans (

  loan_nbr INTEGER NOT NULL PRIMARY KEY,

  loan_amount DECIMAL(12, 2) DEFAULT 0.0 NOT NULL);

 

CREATE TABLE LoansHistory (

  loan_nbr INTEGER NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL,

  change_date DATETIME

              DEFAULT CAST(DATEDIFF(day, 0, CURRENT_TIMESTAMP) AS DATETIME)

              NOT NULL,

  PRIMARY KEY (loan_nbr, change_date),

  archive_date DATETIME NULL,

  CHECK (change_date < archive_date));

 

GO

 

-- Sample data.

 

INSERT INTO Loans VALUES (1, 100.00);

INSERT INTO Loans VALUES (2, 150.00);

INSERT INTO Loans VALUES (3, 120.00);

INSERT INTO Loans VALUES (4, 160.00);

 

INSERT INTO LoansHistory VALUES (1, 100.00, '20070501', NULL);

INSERT INTO LoansHistory VALUES (2, 95.00, '20070503', NULL);

INSERT INTO LoansHistory VALUES (3, 80.00, '20070506', '20070508');

INSERT INTO LoansHistory VALUES (3, 85.00, '20070508', NULL);

 

GO

 

-- View to use for latest loans.

 

CREATE VIEW CurrentLoans (loan_nbr, loan_amount)

AS

SELECT loan_nbr, loan_amount

FROM LoansHistory

WHERE archive_date IS NULL;

 

GO

 

-- Insert changed rows.

 

-- SQL Server 2005.

 

INSERT INTO LoansHistory (loan_nbr, loan_amount)

SELECT loan_nbr, loan_amount

FROM Loans

EXCEPT

SELECT loan_nbr, loan_amount

FROM CurrentLoans;

 

-- SQL Server 2000.

 

INSERT INTO LoansHistory (loan_nbr, loan_amount)

SELECT loan_nbr, loan_amount

FROM ( SELECT 'Live' AS Source,

              loan_nbr,

              loan_amount 

       FROM Loans 

       UNION ALL

       SELECT 'History' AS Source,

              loan_nbr,

              loan_amount 

       FROM CurrentLoans ) AS L

GROUP BY loan_nbr, loan_amount

HAVING COUNT(*) = 1

   AND MIN(Source) = 'Live';

 

-- Update archive date for old history.

-- Runs in the same transaction as the insert of new history.

 

UPDATE LoansHistory

SET archive_date = CAST(DATEDIFF(day, 0, CURRENT_TIMESTAMP) AS DATETIME)

WHERE archive_date IS NULL

  AND EXISTS (SELECT *

              FROM LoansHistory AS H

              WHERE H.loan_nbr = LoansHistory.loan_nbr

                AND H.change_date > LoansHistory.change_date);

 

GO

 

-- Use the view for all reporting purposes.

 

SELECT loan_nbr, loan_amount

FROM CurrentLoans

ORDER BY loan_nbr;

 

-- Loans as of 2007-05-08.

 

SELECT loan_nbr, loan_amount

FROM LoansHistory

WHERE change_date <= '20070508'

  AND COALESCE(archive_date, CURRENT_TIMESTAMP) > '20070508';



The process is fairly simple with keeping two dates: change date for when the last change occurred (and that is the date when the row was inserted in history), and archive date which indicates that this row is archived and there is another more recent change. If the archive data is NULL it indicates that this is the most recent change. To simplify the querying process both dates have time portion set to midnight.

Labels: , ,