MySQL实战总结

一、求每个分组内前五条记录

这是今天美团优选数据研发岗的其中一道面试题记录。

有一张student_subject表。记录了每个学生的课程的分数

name subject score
张三 语文 84
李四 数学 28
王五 英语 67
张三 数学 96
李四 语文 14
王五 语文 90
... ... ...

(1) 求出每门课程的最高的分数

这个解决办法很简单, 直接分组取分组内最大值即可。

SELECT subject, max(score) AS max_score
FROM student_subject
GROUP BY subject
ORDER BY subject;

(2) 求出每门课程分数的前五名

子查询

额外增加一列表示分组内的排名。求分组内排名可以采用子查询的方式。查询条件为类型相同的, 找score大于当前值的记录总数, 在其基础上加一就是排名

SELECT t3.*
FROM (
	SELECT t2.*
		, (
		-- 查询比当前score(t2.score)(子查询, 将此t2.score看做常量)更大的记录数, 
		-- 在此基础上加一就是其排名
			SELECT count(*) + 1
			FROM student_subject t1
			WHERE t1.subject = t2.subject
				AND t1.score > t2.score
		) AS top
	FROM student_subject t2
) t3
WHERE top <= 5
-- 按科目、分数顺序排序, 
ORDER BY t3.subject, top;

还可以换一种写法如下, 但是依旧是子查询

SELECT t2.*
FROM student_subject t2
WHERE (
	SELECT count(*) + 1
	FROM student_subject t1
	WHERE t1.subject = t2.subject
		AND t1.score > t2.score
) <= 5
ORDER BY subject;

还可以修改为下一种写法

SELECT t2.*
FROM student_subject t2
WHERE EXISTS (
	SELECT count(*) + 1
	FROM student_subject t1
	WHERE t1.subject = t2.subject
		AND t1.score > t2.score
	
	HAVING count(*) + 1 <= 5
)
ORDER BY subject;

上面这三种写法基本相同。

左连接

SELECT t3.subject, t3.name, t3.score
FROM (
	SELECT t1.subject, t1.name, t1.score
	FROM student_subject t1
		LEFT JOIN student_subject t2
    	-- 左连接的条件是科目相同, 并且别人的分数不低于自己的分数(=可以让自己与自己左连接)
		ON t1.subject = t2.subject
			AND t1.score <= t2.score
) t3
-- 对每个科目的每个人做group by, count(*) 就是这个人的排名
GROUP BY t3.subject, t3.name, t3.score
-- 把排名前三的筛选出来
HAVING count(t3.name) <= 3
ORDER BY t3.subject, count(t3.name);

HIVE_SQL

Hive 里可以使用分析函数(窗口函数)

SELECT subject, name, score
FROM (
	SELECT subject, name, score, ROW_NUMBER() OVER (PARTITION BY settop ORDER BY score DESC) AS rn
	FROM student_subject
)
WHERE rn <= 5
ORDER BY subject;

二、多分组组内排序第一

这是面试字节抖音数仓团队的一道面试题。

有一张成绩表如下:

id u_id exam_id course_id score
1 20211013001 1 1 56
2 20211013001 2 1 64
3 20211013002 1 1 54
4 20211013001 1 2 87
5 20211013002 2 1 88
6 20211013001 2 2 55
7 20211013002 1 2 45
8 20211013002 2 2 99

求每个学生每一门课程的最高成绩。要求使用窗口函数实现。实现起来非常简单。如下:

SELECT t1.u_id, t1.course_id, t1.score
FROM (
	SELECT u_id, course_id, score, 
	ROW_NUMBER() OVER (PARTITION BY u_id, course_id ORDER BY score DESC) AS rt
	FROM `tb_score`
) t1
WHERE t1.rt = 1;

但是注意需要在外层再套一层SELECT, 不能直接一层SELECT ... WHERE rt = 1;

原文地址:https://www.cnblogs.com/chenrj97/p/14583547.html