Here are a couple examples to demonstrate using aggregate window functions with the OVER clause.
CREATE TABLE Loans (
loan_nbr INT NOT NULL PRIMARY KEY,
loan_date DATETIME NOT NULL,
loan_amount DECIMAL(12, 2) NOT NULL
DEFAULT 0.0,
customer_nbr INT NOT NULL,
loan_type CHAR(1) NOT NULL
DEFAULT 'P'
CHECK (loan_type IN ('P', -- personal
'B')) -- business
);
INSERT INTO Loans
VALUES (1, '20080801', 12000.00, 2, 'P'),
(2, '20080805', 15700.00, 1, 'B'),
(3, '20080610', 12000.00, 3, 'P'),
(4, '20080401', 5000.00, 1, 'P'),
(5, '20080715', 25000.00, 4, 'B'),
(6, '20080610', 25000.00, 5, 'P'),
(7, '20080501', 1000.00, 6, 'P'),
(8, '20080810', 6000.00, 7, 'B'),
(9, '20080815', 2000.00, 8, 'B'),
(10, '20080815', 1000.00, 9, 'P'),
(11, '20080715', 5500.00, 10, 'P'),
(12, '20080615', 1000.00, 11, 'B'),
(13, '20080820', 6000.00, 12, 'P'),
(14, '20080510', 28000.00, 6, 'B'),
(15, '20080815', 2000.00, 10, 'P'),
(16, '20080810', 1500.00, 8, 'P'),
(17, '20080817', 10000.00, 10, 'B'),
(18, '20080816', 2500.00, 9, 'P');
-- Aggregate window functions with the OVER clause
SELECT loan_nbr, loan_type, loan_amount, customer_nbr,
SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS total_customer_loans,
AVG(loan_amount) OVER(PARTITION BY customer_nbr) AS avg_customer_loan_amt,
MAX(loan_amount) OVER(PARTITION BY customer_nbr) AS max_customer_loan_amt,
MIN(loan_amount) OVER(PARTITION BY customer_nbr) AS min_customer_loan_amt,
COUNT(*) OVER(PARTITION BY customer_nbr) AS count_customer_loans
FROM Loans;
-- Calculate percent for current loan based on total customer loans
-- and total of all loans
SELECT loan_nbr, loan_type, loan_amount, customer_nbr,
loan_amount /
SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS percent_of_customer_loans,
loan_amount /
SUM(loan_amount) OVER() AS percent_of_all_loans
FROM Loans;
-- Get customers (and all their loans) with more than 2 loans
SELECT customer_nbr, loan_nbr, loan_amount, cnt
FROM (SELECT customer_nbr, loan_nbr, loan_amount,
COUNT(*) OVER(PARTITION BY customer_nbr) AS cnt
FROM Loans) AS L
WHERE cnt >= 2;
Resources:
OVER Clause
http://msdn.microsoft.com/en-us/library/ms189461.aspx
This is a great article!
ReplyDeleteI have, however, another question (and maybe it is a material for a whole new post). How would you write a query which will return an aggregate data for every 3 consequent records per loan type? For example: in your table you have loans 1,3,4 as personal, then 6,7,10 and so on. The question is how to show an output which shows the average loan amount per every group of 3 consequent loans, according to their type. (The query should display the data of the entire table, records grouped by 3, and if the last group has less than 3, then still show the average of 2 or just one record.)
This is very easy to accomplish. Here is a query to demonstrate:
ReplyDeleteSELECT loan_type, grp, AVG(loan_amount) AS avg_grp
FROM (
SELECT loan_nbr, loan_type, loan_amount, customer_nbr,
(ROW_NUMBER() OVER(PARTITION BY loan_type ORDER BY loan_nbr) - 1) / 3 AS grp
FROM Loans) AS L
GROUP BY loan_type, grp;
Essentially you use ROW_NUMBER partitioned by loan type and by integer division by 3 you create the groups of loans.
What is the advantage of this method (aggregate window function with OVER() clause) against the following method? performance? less complexity? shorter code?
ReplyDelete-- Aggregate window functions with the OVER clause
SELECT loan_nbr, loan_type, loan_amount, customer_nbr,
SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS total_customer_loans,
AVG(loan_amount) OVER(PARTITION BY customer_nbr) AS avg_customer_loan_amt,
MAX(loan_amount) OVER(PARTITION BY customer_nbr) AS max_customer_loan_amt,
MIN(loan_amount) OVER(PARTITION BY customer_nbr) AS min_customer_loan_amt,
COUNT(*) OVER(PARTITION BY customer_nbr) AS count_customer_loans
FROM Loans
-- without aggregate window function
select loan_nbr, loan_type, loan_amount, V.customer_nbr,
D.total_customer_loans,
D.avg_customer_loan_amt,
D.max_customer_loan_amt,
D.min_customer_loan_amt,
D.count_customer_loans
from
(
select customer_nbr,
SUM(loan_amount) as total_customer_loans,
AVG(loan_amount) AS avg_customer_loan_amt,
MAX(loan_amount) AS max_customer_loan_amt,
MIN(loan_amount) AS min_customer_loan_amt,
COUNT(*) AS count_customer_loans
from loans
group by customer_nbr
)D
join
loans v
on d.customer_nbr=v.customer_nbr
Thanks.
Mohammad,
ReplyDeleteThe benefits of using the OVER clause are all of that: simple, more efficient, shorter and maintainable code. I do not see a reason to use a join with derived table with aggregates unless you are using SQL Server 2000.
>>
ReplyDeletemore efficient
<<
More efficient?! But I do not see the difference between two previous queries.
See:
/*
-- Aggregate window functions with the OVER clause
Table 'Worktable'. Scan count 3, logical reads 85
Table 'Loans'. Scan count 1, logical reads 2
-- without aggregate window function
Table 'Worktable'. Scan count 3, logical reads 85
Table 'Loans'. Scan count 1, logical reads 2
*/
This is because the query optimizer is smart and generates almost the same execution plan in this case. But with different data, multiple tables, indexes, statistics, etc., it can differ.
ReplyDelete