使用4中不同的方式找出连续三年获奖的人

其实这个问题就是数据中的“island”孤岛数据的问题,只是要求连续长度是3。首先避开这个问题不谈,看看下来的数字:

1 2 3 5 7 8 9 11 13 18 22 27 28 29 30 40 100 102 107 108 109

现在有如下需求:找出连续的数字开始和结束的位置。

很显然,比划比划这个结果就是:

1    3
7    9
27    30
107    109

那么现在寻找一种方式,把这个结果用sql查出来。首先造数据:

;WITH Num(N) AS(
	SELECT 1 UNION ALL
	SELECT 2 UNION ALL
	SELECT 3 UNION ALL
	SELECT 5 UNION ALL
	SELECT 7 UNION ALL
	SELECT 8 UNION ALL
	SELECT 9 UNION ALL
	SELECT 11 UNION ALL
	SELECT 13 UNION ALL
	SELECT 18 UNION ALL
	SELECT 22 UNION ALL
	SELECT 27 UNION ALL
	SELECT 28 UNION ALL
	SELECT 29 UNION ALL
	SELECT 30 UNION ALL
	SELECT 40 UNION ALL
	SELECT 100 UNION ALL
	SELECT 102 UNION ALL
	SELECT 107 UNION ALL
	SELECT 108 UNION ALL
	SELECT 109)

先把这些数字当中连续的边界找出来(1,3,5,7,9,11,13,18,22,27,30,40,100,102,107,109)。查找的条件如下:不存在比当前小的数或者比当前大的数。如下:

cte1(N) AS(		
	SELECT N
	FROM Num n1
	WHERE NOT EXISTS(
			SELECT 'x'
			FROM Num n2
			WHERE n2.N = n1.N - 1)
		OR
		NOT EXISTS(
			SELECT 'x'
			FROM Num n3
			WHERE n3.N = n1.N + 1))

既然找到了边界,接下来我对两个边界进行配对,只不过这对值的第二值是比第一边界值大但是最小的那个。

SELECT N, (
		SELECT MIN(N)
		FROM cte1 c2
		WHERE c2.N > c1.N
		) Big
	FROM cte1 c1

配对成功,接下了我只需要找存在配对当中的行就可以了。如下:

SELECT *, Big - N  + 1 AS count
FROM(
	SELECT N, (
		SELECT MIN(N)
		FROM cte1 c2
		WHERE c2.N > c1.N
		) Big
	FROM cte1 c1)
 AS dr1
WHERE EXISTS(
	SELECT 'x'
	FROM Num c1
	WHERE c1.N > dr1.N AND c1.N < dr1.Big
)

这样就得到想要的结果了。现在回过来看看“连续3年获奖的人”,用一样的原理可以得到第一种方式(最慢的方式)。

首先造点需要的数据:

;WITH Prize(Name, Year) AS(
	SELECT 'Kim', '2001' UNION ALL
	SELECT 'Tim', '2002' UNION ALL
	SELECT 'Tang', '2002' UNION ALL
	SELECT 'Jack', '2001' UNION ALL
	SELECT 'Juicy', '2001' UNION ALL
	SELECT 'Peff', '2002' UNION All
	SELECT 'Juicy', '2002' UNION ALL
	SELECT 'Kim', '2002' UNION ALL
	SELECT 'Juicy', '2003' UNION ALL
	SELECT 'Peff', '2003' UNION ALL
	SELECT 'Kim', '2003' UNION ALL
	SELECT 'Tim', '2004' UNION ALL
	SELECT 'Juicy', '2004' UNION ALL
	SELECT 'Jack', '2005' UNION ALL
	SELECT 'Eric', '2005' UNION ALL
	SELECT 'Tim', '2005' UNION ALL
	SELECT 'Eric', '2006' UNION ALL
	SELECT 'Peff', '2006' UNION ALL
	SELECT 'Juicy', '2007' UNION ALL
	SELECT 'Eric', '2007' UNION ALL
	SELECT 'Tang', '2007' UNION ALL
	SELECT 'Tang', '2008' UNION ALL
	SELECT 'Peff', '2008' UNION ALL
	SELECT 'Kim', '2008' UNION ALL
	SELECT 'Jack', '2009' UNION ALL
	SELECT 'Tang', '2009' UNION ALL
	SELECT 'Kim', '2009')
第一种方式为:
,cte1 AS(
SELECT * 
FROM Prize p1
WHERE NOT EXISTS(
	SELECT 'x'
	FROM Prize p2
	WHERE p1.Name = p2.Name AND p2.Year = p1.Year - 1) 
	OR
	NOT EXISTS(
	SELECT 'x'
	FROM Prize p3
	WHERE p3.Name = p1.Name AND p3.Year = p1.Year + 1)
)

SELECT *
FROM (
	SELECT Name, Year, (
		SELECT MIN(Year)
		FROM cte1 c2
		WHERE c2.Name = c1.Name AND c2.Year > c1.Year
	) AS Bigger 
	FROM cte1 c1
	--ORDER BY Name, Year
) AS Dr1
WHERE EXISTS(
	SELECT 'x'
	FROM Prize p
	WHERE p.Name = Dr1.Name AND p.Year > Dr1.Year AND p.Year < Dr1.Bigger
) 
AND CAST(Bigger AS INT) - CAST(Year AS INT) + 1 >= 3
ORDER BY Name, Year
第二种方式,凡是连续三年获奖的人,那么year上一定存在三个连续的值,那么我就用两个exists条件去查询,如下:
	SELECT DISTINCT Name
	FROM Prize p1
	WHERE EXISTS(
		SELECT 'x'
		FROM Prize p2
		WHERE p2.Name = p1.Name AND p2.Year = p1.Year + 1)
		AND EXISTS(
		SELECT 'x'
		FROM Prize p3
		WHERE p3.Name = p1.Name AND p3.Year = p1.Year + 2)
第三种方式,这种方式从《Microsoft SQL Server 2005技术内幕:T-SQL查询》中需来,博客园有篇文章对此总结,不再赘述。
SELECT DISTINCT Name
FROM (
	SELECT  *, (SELECT MIN(p2.Year)
		FROM Prize p2
		WHERE p2.Name = p.Name and p2.Year >= p.Year AND
			NOT EXISTS(
				SElECT 'x'
				FROM Prize p3
				WHERE p3.Name = p2.Name AND p3.Year = p2.Year + 1)) Big
	FROM Prize p
	) AS Dr
WHERE CAST(Dr.Big AS INT) - CAST(Dr.Year AS INT) + 1 >= 3
第四种方式,如果我按照‘年’对表进行旋转,会得到什么结果?首先试试:
	SELECT * 
	FROM Prize P PIVOT(
		COUNT(Year) FOR Year IN( [2001], [2002], [2003], [2004], [2005], [2006], [2007], [2008], [2009])
		) AS PD

那么我可以得到结果:

_thumb3

既然有了个结果,接下来就好处理了,我把三个连续的年列组成一组,凡是都为1的一定是3年都获奖的人。如下:
SELECT Name
FROM (
	SELECT * 
	FROM Prize P PIVOT(
		COUNT(Year) FOR Year IN( [2001], [2002], [2003], [2004], [2005], [2006], [2007], [2008], [2009])
		) AS PD
) AS D
WHERE (D.[2001] = 1 AND D.[2002] = 1 AND D.[2003] = 1) OR
	  (D.[2002] = 1 AND D.[2003] = 1 AND D.[2004] = 1) OR
	  (D.[2003] = 1 AND D.[2004] = 1 AND D.[2005] = 1) OR
	  (D.[2004] = 1 AND D.[2005] = 1 AND D.[2006] = 1) OR
	  (D.[2005] = 1 AND D.[2006] = 1 AND D.[2007] = 1) OR
	  (D.[2006] = 1 AND D.[2007] = 1 AND D.[2008] = 1) OR
	  (D.[2007] = 1 AND D.[2008] = 1 AND D.[2009] = 1)
这种方式是非常快的,比以上任何方式都要快,但是有它的局限性,以上的[2001]……[2009]全部都是hard的。如过表中的年很多,需要在IN后面加上年份,然后在WHERE条件
中进行连续3年的组合,如果表中有n个年份,那么where条件中一种有n-2个条件。有时为了效率,多写点代码是值得的。
原文地址:https://www.cnblogs.com/fgynew/p/1671575.html