Monday, October 20, 2008

Predicates in SQL

One of the most confusing topics for developers coming from traditional languages like C, C++, Java, C#, and VB to SQL, is the evaluation of predicates. In most languages predicate logic is evaluated left to right. SQL is a different story. It follows the concept of all-at-once operations. This means all logical predicates will be evaluated at the same time. In practice, the SQL engine is free to process the logical expressions in any order it finds appropriate and cost effective to retrieve the data.

Below is example to illustrate this behavior. Given a table with account policies, where a policy may be represented only as numeric values for certain accounts, and a mix of alpha-numeric characters for other accounts. Knowing that account 2 has policies that can be represented only as numeric values, the goal is to retrieve policies that are greater than 50000.

An attempt to add a first predicate to filter on account 2 and then a second one to select policies greater than 50000 will fail. Even if the query is formed with a derived table to extract first all policies for account 2 and then the outer query filters on the policy, it will not work. The derived table is expanded in the main query plan and a single query plan is produced. Then the query engine is free to push up and down the predicates in the plan as it finds efficient.

CREATE TABLE AccountPolicies (

 account_nbr INT,

 policy VARCHAR(20),

 PRIMARY KEY (account_nbr, policy));

 

INSERT INTO AccountPolicies VALUES(1, 'P1000234');

INSERT INTO AccountPolicies VALUES(1, 'P1020256');

INSERT INTO AccountPolicies VALUES(2, '1001');

INSERT INTO AccountPolicies VALUES(2, '5002');

INSERT INTO AccountPolicies VALUES(2, '50001');

INSERT INTO AccountPolicies VALUES(2, '50005');

INSERT INTO AccountPolicies VALUES(2, 'P50005');

 

-- Second predicate may be evaluated first

-- resulting in conversion error

SELECT account_nbr, policy

FROM AccountPolicies

WHERE account_nbr = 2

  AND CAST(policy AS INT) > 50000;

 

-- The derived table is expended in the

-- query plan and predicates can be pushed

-- up or down in the plan

SELECT account_nbr, policy

FROM (SELECT account_nbr, policy

      FROM AccountPolicies

      WHERE account_nbr = 2) AS P

WHERE CAST(policy AS INT) > 50000;



One way to solve this is to use a CASE expression to force the desired evaluation because WHEN clauses in CASE are evaluated in order. This solution not only demonstrates how to handle the evaluation process, but adds verification that only policies with numeric values are casted (as usual business rules change and later account 2 may be allowed to create policies with alpha-numeric characters).

-- Enforce sequence of evaluating conditions

-- and cast only valid values

SELECT account_nbr, policy

FROM AccountPolicies

WHERE CASE

        WHEN account_nbr = 2

        THEN CASE

               WHEN policy NOT LIKE '%[^0-9]%'

               THEN CASE

                      WHEN CAST(policy AS INT) > 50000

                      THEN 'True'

                      ELSE 'False'

                    END

               ELSE 'False'

            END            

        ELSE 'False'

      END = 'True';

9 comments:

Anonymous said...

Is the following code incorrect

;WITH CTETest AS
(
SELECT account_nbr, policy FROM AccountPolicies WHERE account_nbr = 2
)

SELECT account_nbr, policy
FROM CTETest
WHERE CAST(policy AS INT) > 50000;

Plamen Ratchev said...

Yes, this code the same problem as the derived table example. Here in a similar manner the CTE is expanded in the main query plan and a single query plan is produced. Then the optimizer may chose to evaluate first the predicate CAST(policy AS INT) > 50000 which will result in error.

Vijay Chand said...

Sir,

The only reason why you are getting error is because of this --> INSERT INTO AccountPolicies VALUES(2, 'P50005');

The account_nbr=2 but policy is alphnumeric. Explicit cast of alphanumeric to numeric is less forgiving than the implicit cast that sql engine performs.

Also, try using both the queries without the explicit Cast(policy as int) function as sql has an implicit cast function.

On broader perspective, there is a logic to the way sql engine processes predicates. This logic is determinate logic. Without this data retrieval would be a hell-of-a-task.

Cheers!

Vijay Chand

Plamen Ratchev said...

Hello Vijay,

The INSERT statement is correct. This is the whole point, to illustrate how the SQL Server engine is treating predicates.

Mohammad said...

Hi,
>>
WHEN clauses in CASE are evaluated in order
<<

Is IF statement same with CASE for evaluating? Or optimizer make a decision for trying which Boolean statement first evaluate (in terms of cost-based)

Plamen Ratchev said...

IF is a control of flow statement and not an expression like CASE. As such it cannot be used in queries and it is not part of the execution plan, so evaluation is different.

Kevin Fairchild said...

Wouldn't it be more readable to make it "WHEN policy LIKE '%[0-9]%'" rather than "WHEN policy NOT LIKE '%[^0-9]%'"?

Plamen Ratchev said...

Kevin,

The double negative guarantees that only numeric characters are allowed (in a way saying the value is not like not a number). If you use the LIKE predicate as you posted then it allows non numeric characters.

Anonymous said...

SELECT account_nbr, policy FROM AccountPolicies
WHERE account_nbr = 2
and ISNUMERIC(policy) = 1
AND policy > 50000;