QUOTED_IDENTIFIER vs. PERSISTED column

Sometimes you are working on an issue for days and when you finally find a “solution” to the issue then you start wondering how that works in the background. Because you do not really understand that, you starts becoming frustrated and asking yourself: is it really something I can consider as a permanent solution to my problem?! Or is it a temporary one and I will need to return to that in the future?

Recently I was working on a performance degradation problem and the lesson I have learnt: always use QUOTED_IDENTIFIER ON if you run queries against a table which has PERSISTED column. Otherwise, the fact that the table has persisted column might be ignored by SQL Server while evaluating execution plans.

Persisted database column is a computed column physically stored in the table but specified via a deterministic expression calculated over the values of other database columns in the same table.

The following T-SQL script creates a table where TotalIncome is a persisted column and calculated as UnitPrice*UnitsSold.

CREATE TABLE [dbo].[Sales] (
[ProductName] [NVARCHAR] (256) NOT NULL,
[UnitPrice] [int] NOT NULL,
[UnitsSold] [int] NOT NULL,
[TotalIncome] AS UnitPrice * UnitsSold PERSISTED
) ON [PRIMARY]

GO

The T-SQL script you can find at the end of this post creates one million record in the table by generating products, price and units randomly. If you launch SQL Server Management Studio, you connect to the database and execute the following query against the Sales table

SET QUOTED_IDENTIFIER ON
GO

SELECT COUNT(*) FROM Sales WHERE TotalIncome > 800000

then the actual execution plan SQL Server used is the one you have originally expected: computed column TotalIncome is used as predicate while performing table scan.

quoted_on_predicate.png

However, if QUOTED_IDENTIFIER is OFF from some reason then actual execution plan looks like this: it ignores the persisted column and includes a scalar operation instead to evaluate the expression the computed column is based on. Of course, this useless calculation has performance implication (0.2 secs in our test environment; SQL Server 2012, Windows Server 2012, 2.3 GHz Intel CPU, 2 GB RAM).

quoted_off_predicate.png

You might just wondering how the hell QUOTED_IDENTIFIER (which specifies whether quoted identifiers are supported or not; and how string literals are delimited) has any relationship to the fact that tables have persisted columns or not … Honestly, I have no idea …

But, the rule of thumb is the following: always use QUOTED_IDENTIFIER ON if you run queries against tables which have persisted columns. Depending on your scenario you might need to set that either on batch, database connection, database object or database level. The important thing is to have effective QUOTED_IDENTIFIER ON when execution goes to the level where SQL Server query optimizer comes into the scene to build up execution plans for the query.

Script to generate random data

DECLARE @MinPrice INTEGER = 10
DECLARE @MaxPrice INTEGER = 10000

DECLARE @MinUnits INTEGER = 1
DECLARE @MaxUnits INTEGER = 100

DECLARE @MaxNumOfRecords INTEGER = 1000000
DECLARE @NumOfRecords INTEGER = 0
DECLARE @Price INTEGER = 0

WHILE @NumOfRecords < @MaxNumOfRecords
BEGIN
SET @Price = CAST((@MaxPrice - @MinPrice + 1) * RAND() AS integer) + @MinPrice

INSERT INTO Sales(ProductName, UnitPrice, UnitsSold) VALUES(
'MyProduct' + CAST(@Price AS [NVARCHAR] (256)), @Price,
CAST((@MaxUnits - @MinUnits + 1) * RAND() AS integer) + @MinUnits)

SET @NumOfRecords = @NumOfRecords + 1
END