SQL Server 2012 ColumnStore Index Test

The main goal is to compare a columnstore index with an ordinary (rowstore) index:

/********************
    Prepare the data
******************/
select *  into ColumnStoreTest from northwind..orders

declare @i int 
set @i = 12
while(@i > 0)
begin
    insert into ColumnStoreTest
        select * from ColumnStoreTest
        union all
        select * from ColumnStoreTest
    set @i = @i-1
end

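--Incidentally, SELECT ... INTO also copies the IDENTITY property (OrderID is an identity column in Northwind's Orders), so for convenience I removed the identity from ColumnStoreTest before running the loop; otherwise the INSERT ... SELECT * in the loop fails.

With @i = 12 the data volume may be too large; adjust the value to suit your machine.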
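Why @i = 12 gets so large: each pass of the loop inserts two copies of the current table (the UNION ALL). The table therefore triples per pass rather than doubling, giving n·3^k rows after k passes. Northwind's Orders table has 830 rows, so @i = 12 would produce 830 × 3^12 rows.

The sketch below checks that arithmetic. It replays the same INSERT ... SELECT ... UNION ALL SELECT pattern in SQLite, which stands in for SQL Server only to count rows. The three-column table and the helper names `rows_after` and `simulate` are hypothetical simplifications, not part of the original test.

```python
import sqlite3


def rows_after(n_start: int, iterations: int) -> int:
    """Row count after `iterations` passes of the loop: each pass triples the table."""
    return n_start * 3 ** iterations


def simulate(n_start: int, iterations: int) -> int:
    """Replay the loop body in an in-memory SQLite database and count the rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for ColumnStoreTest
    conn.execute("CREATE TABLE t (OrderID INTEGER, CustomerID TEXT, Freight REAL)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(i, "C%02d" % (i % 89), 1.0) for i in range(n_start)],
    )
    for _ in range(iterations):
        # Same shape as the T-SQL loop: two copies of the current rows per pass
        conn.execute("INSERT INTO t SELECT * FROM t UNION ALL SELECT * FROM t")
    (count,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print(simulate(10, 3))       # 270
    print(rows_after(830, 12))   # 441096030
```

At roughly 441 million rows, a smaller @i is a sensible starting point on a test machine.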

/**************************
Create the columnstore index
************************/

create index idx_CustomerID on ColumnStoreTest(CustomerID,Freight)


create columnstore index csidx_CustomerID on ColumnStoreTest(CustomerID,Freight)

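Here are the results for `select CustomerID,sum(Freight) from ColumnStoreTest group by CustomerID` using the first index (idx_CustomerID):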

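SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 5 ms.

(89 row(s) affected)
Table 'ColumnStoreTest'. Scan count 5, logical reads 7352, physical reads 0, read-ahead reads 32, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(6 row(s) affected)

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 1529 ms, elapsed time = 544 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.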

The execution plan is nothing special, just an ordinary index scan:
   select CustomerID,sum(Freight) from ColumnStoreTest group by CustomerID
  |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END))
       |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[ColumnStoreTest].[CustomerID]) DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007])))
            |--Parallelism(Gather Streams, ORDER BY:([Northwind].[dbo].[ColumnStoreTest].[CustomerID] ASC))
                 |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[ColumnStoreTest].[CustomerID]) DEFINE:([partialagg1005]=COUNT_BIG([Northwind].[dbo].[ColumnStoreTest].[Freight]), [partialagg1007]=SUM([Northwind].[dbo].[ColumnStoreTest].[Freight])))
                      |--Index Scan(OBJECT:([Northwind].[dbo].[ColumnStoreTest].[idx_CustomerID]), ORDERED FORWARD)
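The plan computes the aggregate in two phases:

- Each parallel thread computes COUNT_BIG(Freight) and SUM(Freight) per CustomerID (partialagg1005 and partialagg1007).
- After Gather Streams, a second Stream Aggregate sums the partial results (globalagg1006 and globalagg1008).
- The Compute Scalar returns NULL when the count is 0. That matches SUM's result for a group with no non-NULL values.

Below is a minimal Python sketch of that partial/global pattern. It makes these assumptions:

- Splitting rows with `rows[i::workers]` is an arbitrary stand-in for how rows reach threads.
- The sort order that Stream Aggregate depends on is not modeled.
- The function names are hypothetical.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Per-thread phase: COUNT_BIG(Freight) and SUM(Freight) per CustomerID."""
    acc = defaultdict(lambda: [0, 0])  # CustomerID -> [non-NULL Freight count, running sum]
    for customer_id, freight in rows:
        slot = acc[customer_id]  # the group exists even if every Freight is NULL
        if freight is not None:
            slot[0] += 1
            slot[1] += freight
    return acc


def global_aggregate(partials):
    """Global phase: add up the partial counts and sums, then apply the Compute Scalar."""
    total = defaultdict(lambda: [0, 0])
    for part in partials:
        for key, (cnt, s) in part.items():
            total[key][0] += cnt
            total[key][1] += s
    # CASE WHEN count = 0 THEN NULL ELSE sum END
    return {key: (None if cnt == 0 else s) for key, (cnt, s) in sorted(total.items())}


def parallel_sum(rows, workers=4):
    """Hypothetical driver: partition the rows, aggregate each part, then merge."""
    partitions = [rows[i::workers] for i in range(workers)]
    return global_aggregate(partial_aggregate(p) for p in partitions)


if __name__ == "__main__":
    sample = [("VINET", 32), ("TOMSP", 11), ("VINET", 10), ("HANAR", None)]
    print(parallel_sum(sample))  # {'HANAR': None, 'TOMSP': 11, 'VINET': 42}
```

The same two-phase split appears in the columnstore plan, where a Hash Match performs the partial aggregate.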

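And here are the results using the columnstore index (csidx_CustomerID):

SQL Server parse and compile time: 
   CPU time = 16 ms, elapsed time = 93 ms.

(89 row(s) affected)
Table 'ColumnStoreTest'. Scan count 4, logical reads 34, physical reads 2, read-ahead reads 18, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(7 row(s) affected)

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 63 ms, elapsed time = 281 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.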

   select CustomerID,sum(Freight) from ColumnStoreTest group by CustomerID
  |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END))
       |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[ColumnStoreTest].[CustomerID]) DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007])))
            |--Sort(ORDER BY:([Northwind].[dbo].[ColumnStoreTest].[CustomerID] ASC))
                 |--Parallelism(Gather Streams)
                      |--Hash Match(Partial Aggregate, HASH:([Northwind].[dbo].[ColumnStoreTest].[CustomerID]), RESIDUAL:([Northwind].[dbo].[ColumnStoreTest].[CustomerID] = [Northwind].[dbo].[ColumnStoreTest].[CustomerID]) DEFINE:([partialagg1005]=COUNT_BIG([Northwind].[dbo].[ColumnStoreTest].[Freight]), [partialagg1007]=SUM([Northwind].[dbo].[ColumnStoreTest].[Freight])))
                           |--Index Scan(OBJECT:([Northwind].[dbo].[ColumnStoreTest].[csidx_CustomerID]))

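Comparing the two results, the columnstore index needs far fewer logical reads than the ordinary index (34 versus 7352). CPU time also drops from 1529 ms to 63 ms. This is exactly where columnstore indexes shine.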
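idx_CustomerID already contains only the two queried columns. So here the gap is not about the columnstore skipping unneeded columns. A plausible main factor is compression: SQL Server stores a columnstore index column by column in compressed segments.

The toy sketch below shows the kind of effect involved for a low-cardinality column like CustomerID. It dictionary-encodes the values, then run-length-encodes the ids. This is an illustration only. It is not SQL Server's actual segment format, and which encodings SQL Server picks for this table is not modeled. The helper names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary = {}
    ids = []
    for v in values:
        # New values get the next id; repeated values reuse their existing id
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeats into [id, run_length] pairs."""
    runs = []
    for v in ids:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


if __name__ == "__main__":
    column = sorted(["VINET", "TOMSP", "HANAR"] * 1000)  # 3000 values, 3 distinct
    dictionary, ids = dictionary_encode(column)
    print(len(dictionary), run_length_encode(ids))  # 3 [[0, 1000], [1, 1000], [2, 1000]]
```

Here 3000 values collapse into 3 dictionary entries plus 3 runs. Real data is rarely this tidy, but low-cardinality columns usually compress very well.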

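But for an ordinary `select * from ... where ...` query, does the columnstore index still have an advantage?

And like Oracle's bitmap indexes, does it also have an advantage for OR predicates?

With the columnstore index, the execution plan shows no advantage at all. Since most readers know nonclustered indexes well, I won't post the nonclustered index plan for this case.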
select * from ColumnStoreTest where customerid = 'VINET' or customerid = 'TOMSP'
  |--Parallelism(Gather Streams)
       |--Table Scan(OBJECT:([Northwind].[dbo].[ColumnStoreTest]), WHERE:([Northwind].[dbo].[ColumnStoreTest].[CustomerID]=N'TOMSP' OR [Northwind].[dbo].[ColumnStoreTest].[CustomerID]=N'VINET'))

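It has already fallen back to a table scan, so there is not much more to say.

The example above shows the plan for a low-selectivity predicate.

What happens with a highly selective predicate? The query is `select * from ColumnStoreTest where orderid = 10248`. It runs first with the columnstore index csidx_orderID, then with the nonclustered index idx_orderid: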

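SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 16 ms, elapsed time = 28 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
Table 'ColumnStoreTest'. Scan count 1, logical reads 12, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(4 row(s) affected)

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 86 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.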
   SELECT * FROM [ColumnStoreTest] WHERE [orderid]=@1
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))
       |--Index Scan(OBJECT:([Northwind].[dbo].[ColumnStoreTest].[csidx_orderID]),  WHERE:([Northwind].[dbo].[ColumnStoreTest].[OrderID]=(10248)))
       |--RID Lookup(OBJECT:([Northwind].[dbo].[ColumnStoreTest]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

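SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 9 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
Table 'ColumnStoreTest'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(4 row(s) affected)

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 92 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.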
   SELECT * FROM [ColumnStoreTest] WHERE [orderid]=@1
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))
       |--Index Seek(OBJECT:([Northwind].[dbo].[ColumnStoreTest].[idx_orderid]), SEEK:([Northwind].[dbo].[ColumnStoreTest].[OrderID]=(10248)) ORDERED FORWARD)
       |--RID Lookup(OBJECT:([Northwind].[dbo].[ColumnStoreTest]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

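csidx_orderid is the columnstore index.

idx_orderid is the nonclustered index.

Comparing the logical reads (12 versus 3) shows that the traditional index has the advantage for highly selective predicates. The columnstore plan can only do an Index Scan with a WHERE filter, while the nonclustered index gets an Index Seek.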
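The difference between the two plans is seek versus scan. The toy Python model below makes that concrete. A B-tree-style lookup reads one root page and one leaf page. An index that can only be scanned must read and filter every page.

The model ignores compression, segment metadata and the RID Lookup, so its page counts are not SQL Server's numbers. It is a sketch of the shape of the difference under these simplifying assumptions. The two-level tree and the names `build_leaf_pages`, `seek` and `scan` are hypothetical.

```python
import bisect


def build_leaf_pages(keys, rows_per_page):
    """Sort the keys and cut them into fixed-size 'pages'."""
    ordered = sorted(keys)
    return [ordered[i:i + rows_per_page] for i in range(0, len(ordered), rows_per_page)]


def seek(pages, key):
    """Two-level lookup: read the root page, then at most one leaf page."""
    root = [page[0] for page in pages]  # first key of each leaf; assumed to fit on one page
    reads = 1
    pos = bisect.bisect_right(root, key) - 1
    if pos < 0:
        return False, reads  # key is smaller than every key in the index
    reads += 1
    return key in pages[pos], reads


def scan(pages, key):
    """No seek available: every page is read and filtered."""
    found = False
    for page in pages:
        if key in page:
            found = True
    return found, len(pages)


if __name__ == "__main__":
    pages = build_leaf_pages(range(10248, 10248 + 10000), 100)
    print(seek(pages, 10248))  # (True, 2)
    print(scan(pages, 10248))  # (True, 100)
```

The seek cost grows with tree depth while the scan cost grows with table size. That is why a single-row lookup favors the nonclustered index.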

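As for OR, in theory the columnstore index should have the advantage over a nonclustered index.

That is because I believe columnstore indexes work on the same principle as bitmap indexes.

If you are not familiar with bitmap indexes, see the relevant chapters of "Expert Oracle Database Architecture".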
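Bitmap indexes handle OR well because of how they are built. A bitmap index keeps one bitmap per distinct value, with one bit per row. `col = 'VINET' or col = 'TOMSP'` therefore becomes a single bitwise OR of two bitmaps, followed by reading off the set bits.

The sketch below models an Oracle-style bitmap index using Python ints as bitsets. Whether SQL Server 2012's columnstore evaluates OR this way is only the author's conjecture, and the OR test earlier in this post produced a table scan. The function names are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int used as a bitset over row positions."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_matching_any(index, *values):
    """Evaluate `col = v1 OR col = v2 ...` by OR-ing bitmaps, then list the set bits."""
    combined = 0
    for v in values:
        combined |= index.get(v, 0)  # values not in the index contribute no rows
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]


if __name__ == "__main__":
    col = ["VINET", "TOMSP", "HANAR", "VINET", "TOMSP"]
    idx = build_bitmap_index(col)
    print(rows_matching_any(idx, "VINET", "TOMSP"))  # [0, 1, 3, 4]
```

AND works the same way with `&`. This is why bitmap indexes combine multiple predicates cheaply.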