万圣节问题（Halloween Protection）

万圣节效应指的是结果集中数据移动位置并因此被改变多次。这个效应不同于双读，因为它是有数据修改驱动的，而不是读取查询。要执行一个更行，数据必须先被读取。执行这个要使用两个游标，一个用于读取，另一个用于写入。如果数据在所有的数据读入之前被写入游标更行，那么就有可能出现某行移动位置，并再次被读取，从而再次被更新。理论上，这会永远持续下去。使用索引读取数据，该索引的键值（key）会被查询更新，这是万圣节效应的一个例子。万圣节效应显然非常让人不悦，幸好SQL Server的存储引擎使其免受该效应。要确保可用于写入的数据全部被读取，SQL Server需要向计划注入一个阻塞操作符，如spool。

下面是找到的一篇关于如何防止万圣节效应的文章：http://blogs.msdn.com/b/craigfr/archive/2008/02/27/halloween-protection.aspx。

还有这篇： https://www.simple-talk.com/sql/learn-sql-server/operator-of-the-week---spools,-eager-spool/

In a prior post, I introduced the notion that update plans consist of two parts: a read cursor that identifies the rows to be updated and a write cursor that actually performs the updates. Logically speaking, SQL Server must execute the read cursor and write cursor of an update plan in two separate steps or phases. To put it another way, the actual update of rows must not affect the selection of which rows to update. This problem of ensuring that the write cursor of an update plan does not affect the read cursor is known as "Halloween protection" as it wasdiscovered by IBM researchers more than 30 years ago on Halloween.

One simple solution to the Halloween problem is to physically separate the read and write cursors of an update plan using a blocking operator such as an eager spool or sort. Inserting a blocking operator between the two halves of an update plan ensures that the read cursor runs in its entirety and generates all rows that it will generate before the write cursor begins executing or modifying any rows. Unfortunately, inserting a blocking operator such as an eager spool into the update plan requires copying all rows output by the read cursor. This copying can be quite expensive. Fortunately, in many cases, SQL Server can determine that the write cursor will not affect the read cursor and does not need to add a blocking operator at all.

Let's look at an example:

CREATE TABLE T (PK INT, A INT)
CREATE UNIQUE CLUSTERED INDEX TPK ON T(PK)
CREATE INDEX TA ON T(A)

INSERT T VALUES (1, 1)
INSERT T VALUES (2, 2)
INSERT T VALUES (3, 3)

UPDATE T SET A = A + 10

Here is the plan for the update statement:

|--Clustered Index Update(OBJECT:([T].[TPK]), OBJECT:([T].[TA]), SET:([T].[A] = [Expr1003]))
       |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
            |--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [Expr1004] THEN (1) ELSE (0) END))
                 |--Compute Scalar(DEFINE:([Expr1003]=[T].[A]+(10), [Expr1004]=CASE WHEN [T].[A] = ([T].[A]+(10)) THEN (1) ELSE (0) END))
                      |--Top(ROWCOUNT est 0)
                           |--Clustered Index Scan(OBJECT:([T].[TPK]))

In this plan, the clustered index scan is the read cursor while the clustered index update is the write cursor. Notice that this plan has no blocking operators. Because we are modifying column A which is not part of the clustered index key, SQL Server knows that rows in the clustered index will not move due to this update and, thus, knows that there is no need to separate the scan and update operators. Now let's use a hint to force SQL Server to scan the non-clustered index on column A:

UPDATE T SET A = A + 10 FROM T WITH (INDEX(TA))

Here is the new update plan:

|--Clustered Index Update(OBJECT:([T].[TPK]), OBJECT:([T].[TA]), SET:([T].[A] = [Expr1003]))
       |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
            |--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [Expr1004] THEN (1) ELSE (0) END))
                 |--Top(ROWCOUNT est 0)
                      |--Compute Scalar(DEFINE:([Expr1003]=[T].[A]+(10), [Expr1004]=CASE WHEN [T].[A] = ([T].[A]+(10)) THEN (1) ELSE (0) END))
                           |--Table Spool
                                |--Index Scan(OBJECT:([T].[TA]), ORDERED FORWARD)

Notice that this plan does include a blocking operator - specifically an eager table spool. (The text plan as generated by SHOWPLAN_TEXT does not show that the spool is eager, but SHOWPLAN_ALL as well as the graphical and XML plans do indicate that the spool is eager.) This time SQL Server recognizes that updating column A could cause rows to move within the index on column A which could cause the scan to return these rows more than once which would in turn lead to the plan updating the same rows more than once. The spool ensures that we get the correct result by saving a copy of the scan output before updating any rows.

There is no way to get SQL Server to omit the spool from the above plan as this would lead to incorrect results. However, we can simulate what would happen by using dynamic cursors. The following batch creates a dynamic cursor to scan index TA and then updates each row before fetching the next row. Because updates are immediately visible to dynamic cursors, this batch yields the same result as the above plan would yield if we could remove the spool. Note that I am not suggesting that anyone should implement an update this way and, if anything, the following example nicely illustrates one of the pitfalls of dynamic cursors.

DECLARE @PK INT
DECLARE C CURSOR DYNAMIC SCROLL_LOCKS FOR SELECT PK FROM T WITH (INDEX(TA))
OPEN C
WHILE 0=0
BEGIN
      FETCH NEXT FROM C INTO @PK
       IF @@FETCH_STATUS <> 0
           BREAK
       UPDATE T SET A = A + 10 WHERE PK = @PK
   END
CLOSE C
DEALLOCATE C

If you execute this batch, it will enter an infinite loop as it repeatedly scans, updates, and then again scans the same three rows. On the other hand, the batch terminates properly if we change the index hint to use the clustered index (where no spool is required) or if we use a static cursor (which makes a copy of the table just like the spool):

DECLARE C CURSOR DYNAMIC SCROLL_LOCKS FOR SELECT PK FROM T WITH (INDEX(TPK))
DECLARE C CURSOR STATIC FOR SELECT PK FROM T WITH (INDEX(TA))

SQL Server can use any blocking operator, not just a spool, to provide Halloween protection. Normally, if an update plan requires Halloween protection, SQL Server adds a spool because the spool is the cheapest blocking operator. However, if an update plan already includes another blocking operator, SQL Server will not also add a spool. For example, if we update column PK which has a unique clustered index, we get a sort as a by-product ofmaintaining the unique index. Since this plan already has a sort, we do not also get a spool operator.

UPDATE T SET PK = PK + 10 FROM T WITH (INDEX(TPK))

|--Index Update(OBJECT:([T].[TA]), SET:([PK1015] = [T].[PK],[A1016] = [T].[A]))
       |--Split
            |--Clustered Index Update(OBJECT:([T].[TPK]), SET:([T].[PK] = [T].[PK],[T].[A] = [T].[A]))
                 |--Collapse(GROUP BY:([T].[PK]))
                      |--Sort(ORDER BY:([T].[PK] ASC, [Act1014] ASC))
                           |--Split
                                |--Top(ROWCOUNT est 0)
                                     |--Compute Scalar(DEFINE:([Expr1003]=[T].[PK]+(10)))
                                          |--Clustered Index Scan(OBJECT:([T].[TPK]))

As I mentioned above, adding a spool is not free and does increase the cost of an update plan. If we insert enough rows into the table, we can measure this effect. For example, I loaded the table with 100,000 rows as follows.

TRUNCATE TABLE T
SET NOCOUNT ON
DECLARE @I INT
SET @I = 0
WHILE @I < 100000
    BEGIN
        INSERT T VALUES (@I, @I)
        SET @I = @I + 1
    END
SET NOCOUNT OFF

I then used SET STATISTICS TIME ON to measure the time to run the first two update statements above. I ran each statement twice and reported the second time to ensure that the buffer pool was warmed up.

UPDATE T SET A = A + 10

SQL Server Execution Times:
CPU time = 3046 ms, elapsed time = 3517 ms.

UPDATE T SET A = A + 10 FROM T WITH (INDEX(TA))

SQL Server Execution Times:
CPU time = 4391 ms, elapsed time = 4666 ms.

As you can see, the same update took over 30% longer in elapsed time and over 40% longer in CPU time. I ran this experiment on a Pentium Xeon 2.2 GHz workstation with 2 GB of RAM, Windows Server 2003 SP2, and SQL Server 2005 SP2. Other system configurations may yield different results.

Finally, although I've used update statements for all of the examples in this post, some insert and delete statements also require Halloween protection, but I'll save that topic for a future post.

不要停下前进的脚步，哪怕每天进步一点点！