SQL queries: notes from learning and practice

https://blog.rjmetrics.com/2008/10/28/correlated-subqueries-in-mysql/

http://www.mysqltutorial.org/mysql-subquery/

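SQL is fundamental, and very important, knowledge for relational databases. Backend frameworks such as Laravel provide an ORM that hides many simple SQL queries behind data-model classes. Most of the time we therefore don't need to care about SQL details and can build a backend application very quickly. As soon as the relationships get more complex, though, we still have to fall back on SQL. The Chinese New Year of the Rooster has just passed, and this post is a fresh start. I will keep recording the points and learning resources I find important while studying SQL. The notes are a memo for myself, and I hope they also help anyone who finds them.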

SQL subselects and correlated subqueries

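A subquery is a statement that is enclosed in parentheses and embedded in another SQL statement. The statement containing the subquery is usually called the outer query, and the subquery itself is called the inner query.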

A subquery is a nested query whose results can be used in another query via a relational operator or an aggregate function.

Rules for subqueries:

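1. A subquery must be enclosed in parentheses ().

2. A subquery can itself contain another subquery. In principle there is no limit on the nesting depth, although each database imposes its own practical limits.

3. The outer query may expect (reference) a single value or a list of values from the subquery. In that case the subquery can have only one expression or column name in its select list. In other words, a subquery used in a WHERE comparison can have only one column in its SELECT clause. MySQL row subqueries such as (a, b) IN (SELECT x, y ...) are the exception.

4. ORDER BY in a subquery generally has no effect on the final result. Some databases, SQL Server for example, reject it unless TOP or OFFSET is also given. MySQL accepts it but may ignore it unless it is combined with LIMIT.

5. Subqueries can be used in the WHERE, HAVING, FROM, and SELECT clauses.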

select t1.* from table1 t1 where t1.id not in (select t2.id from table2 t2): a non-correlated subquery

http://www.geeksengine.com/database/subquery/return-single-value.php

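In most cases a subselect can easily be rewritten as a JOIN, or the other way around. Often, though, a subquery is easier to follow than the join version and matches the logic of the question more closely. Keywords such as IN and ANY in particular make a statement easier to understand and to break down. As an example, consider the query for the following question:

"List all customers in the state of NJ."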

select Name from Customers where CustomerID = ANY ( select CustomerID from AddressBook where state = 'NJ')

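In this example, the part in parentheses is the subquery.

This is called a non-correlated subquery because you can run the inner SELECT on its own and get a meaningful, correct result set. Here, the isolated subquery returns the IDs of the customers located in NJ.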
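The following small sketch checks both the "runs on its own" property and the JOIN equivalence mentioned above. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. The Customers/AddressBook rows are made up for illustration. SQLite does not support = ANY (...), so the sketch uses the equivalent IN (...).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE AddressBook (CustomerID INTEGER, state TEXT);
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO AddressBook VALUES (1, 'NJ'), (2, 'NY'), (3, 'NJ');
""")

# A non-correlated subquery is a complete query by itself.
inner_ids = conn.execute(
    "SELECT CustomerID FROM AddressBook WHERE state = 'NJ' ORDER BY CustomerID"
).fetchall()

# SQLite has no '= ANY (...)', so use the equivalent 'IN (...)'.
via_subquery = conn.execute("""
    SELECT Name FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM AddressBook WHERE state = 'NJ')
    ORDER BY Name
""").fetchall()

# The same question as a JOIN; DISTINCT guards against several NJ addresses per customer.
via_join = conn.execute("""
    SELECT DISTINCT c.Name FROM Customers c
    JOIN AddressBook a ON a.CustomerID = c.CustomerID
    WHERE a.state = 'NJ'
    ORDER BY c.Name
""").fetchall()

print(inner_ids)                 # [(1,), (3,)]
print(via_subquery)              # [('Alice',), ('Carol',)]
print(via_join == via_subquery)  # True
```

The IN form never duplicates outer rows. The join form needs DISTINCT as soon as one customer can match several address rows.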

A correlated subquery, by contrast, contains references to values from the outer query, so it cannot be executed independently of the outer query.

A typical example:

select * from t1 where column1 = ANY ( SELECT column1 from t2 where t2.column2 = t1.column2)

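Note that table t1 does not appear in the subquery's FROM clause, yet the subquery's WHERE clause references it. t1 exists only in the outer query, so running the subquery on its own fails with an error because t1 cannot be found.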
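The claim that a correlated subquery cannot run on its own is easy to check. This sketch again uses Python's sqlite3 with hypothetical toy data and IN in place of ANY. It runs the t1/t2 query above, then runs the inner query alone and catches the error that SQLite raises for the unresolved t1 reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER, column2 TEXT);
    CREATE TABLE t2 (column1 INTEGER, column2 TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'a'), (2, 'x'), (3, 'c');
""")

# Inside the outer query, t1.column2 is taken from the current outer row.
correlated = conn.execute("""
    SELECT * FROM t1
    WHERE column1 IN (SELECT column1 FROM t2 WHERE t2.column2 = t1.column2)
    ORDER BY column1
""").fetchall()
print(correlated)  # [(1, 'a'), (3, 'c')]

# Run alone, the subquery has no t1 in scope, so SQLite refuses to prepare it.
try:
    conn.execute("SELECT column1 FROM t2 WHERE t2.column2 = t1.column2")
    standalone_error = None
except sqlite3.OperationalError as exc:
    standalone_error = str(exc)
print(standalone_error)  # e.g. "no such column: t1.column2"
```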

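Unlike their non-correlated counterparts, correlated subqueries are not allowed in the FROM clause. The MySQL reference manual does not even mention this key rule, and it significantly narrows where correlated subqueries can be used. This describes MySQL at the time of the 2008 article linked above. MySQL 8.0.14 and later accept correlated derived tables in FROM when they are marked with the LATERAL keyword.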

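So if a correlated subquery cannot be used in a query's FROM clause, where can it be used?

As the example shows, a correlated subquery can be used in the WHERE clause. Just like a non-correlated subquery, it can also be used in the HAVING clause or the SELECT clause.

What do these clauses have in common? The answer is simple: they are applied after the data has been pulled. They either restrict which rows are selected or change which columns are displayed. In other words, if you removed the subquery entirely, you would still get a logically valid result, just without the subquery's part of the logic.

For example, consider how a correlated subquery could affect the results of each of the following two clauses.

First, the WHERE clause: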

select Name from dogs where age>=5

Without the WHERE clause, the query still runs correctly and returns the name of every dog. The WHERE clause merely removes the rows where age is less than 5.

Similarly, the SELECT clause:

select Name, age*7 as humanage from dogs where age >=5

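Even without age*7, this query returns the same rows as the query above. Adding the expression to the SELECT clause attaches an extra piece of information to each row of the final result set.

What do the two examples have in common? In both cases, the WHERE clause and the SELECT expression (age*7) operate on the result that the rest of the query returns without them. They can only limit rows (WHERE) or add columns (SELECT). Without exception, the expressions in WHERE and SELECT are evaluated once for each row of the result set that the query would return without them.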
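You can check the "WHERE only removes rows, SELECT only adds columns" observation by running the dogs queries with and without those parts. The dogs rows below are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dogs (Name TEXT, age INTEGER);
    INSERT INTO dogs VALUES ('Rex', 7), ('Fido', 3), ('Buddy', 5);
""")

base = conn.execute("SELECT Name FROM dogs ORDER BY Name").fetchall()
filtered = conn.execute(
    "SELECT Name FROM dogs WHERE age >= 5 ORDER BY Name").fetchall()
with_expr = conn.execute(
    "SELECT Name, age * 7 AS humanage FROM dogs WHERE age >= 5 ORDER BY Name"
).fetchall()

# WHERE only removes rows: the filtered result is a subset of the unfiltered one.
print(set(filtered) <= set(base))                # True
# The SELECT expression only adds a column: same rows, one extra value per row.
print([(n,) for n, _ in with_expr] == filtered)  # True
print(with_expr)                                 # [('Buddy', 35), ('Rex', 49)]
```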

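Now we can ask:

What happens if we replace the WHERE comparison or the SELECT expression with a correlated subquery? The same principle applies: the subquery is evaluated for each row of the result set that the query returns once the subquery is removed. For each row, the subquery can therefore reference the value of any column of any table in the outer FROM clause. Consider the following correlated subquery: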

select Name from dogs d where (select max(HaircutDate) from haircuts h where h.Name = d.Name) < '2008-09-01'

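This query returns the name of every dog that has not had a haircut since September 1, 2008. When the query runs, the subquery in the WHERE clause is evaluated once for every row produced by the rest of the query. If the dogs table has 20 rows, MySQL conceptually runs the subquery 20 times, each time substituting that row's dog name for d.Name. This describes the logical semantics; the optimizer is free to execute the query differently, for example by rewriting it as a join. Dogs with no haircut records at all are not returned: for them max(HaircutDate) is NULL, and a comparison with NULL is never true.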
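The sketch below runs the haircut query in SQLite and then reproduces it with an explicit Python loop. The loop executes the inner query once per dog, which is the conceptual evaluation model described above. The data is hypothetical. The loop only illustrates the semantics and is not how MySQL's optimizer actually executes the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dogs (Name TEXT, age INTEGER);
    CREATE TABLE haircuts (Name TEXT, HaircutDate TEXT);
    INSERT INTO dogs VALUES ('Rex', 7), ('Fido', 3), ('Buddy', 5);
    INSERT INTO haircuts VALUES
        ('Rex', '2008-06-10'), ('Rex', '2008-08-15'),
        ('Fido', '2008-09-20'), ('Buddy', '2008-03-01');
""")

sql_result = conn.execute("""
    SELECT Name FROM dogs d
    WHERE (SELECT max(HaircutDate) FROM haircuts h WHERE h.Name = d.Name) < '2008-09-01'
    ORDER BY Name
""").fetchall()

# Conceptual model: evaluate the inner query once per outer row,
# substituting that row's name for d.Name.
subquery_runs = 0
manual = []
for (name,) in conn.execute("SELECT Name FROM dogs").fetchall():
    subquery_runs += 1
    (last,) = conn.execute(
        "SELECT max(HaircutDate) FROM haircuts WHERE Name = ?", (name,)
    ).fetchone()
    # max() is None (NULL) for a dog without haircuts; such a dog is skipped, as in SQL.
    if last is not None and last < '2008-09-01':
        manual.append((name,))
manual.sort()

print(sql_result)            # [('Buddy',), ('Rex',)]
print(manual == sql_result)  # True
print(subquery_runs)         # 3, one per row in dogs
```

The dates are stored as ISO-8601 strings, so plain string comparison orders them correctly.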

The next query uses a subquery in the SELECT clause to get each dog's most recent haircut date:

select Name, (select max(HaircutDate) from haircuts h where h.Name = d.Name) as LastHaircut from dogs d
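Here is the SELECT-clause version, run in SQLite against hypothetical data that includes a dog with no haircut records. It shows that the scalar subquery yields NULL (None in Python) when nothing matches, so every dog still appears in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dogs (Name TEXT);
    CREATE TABLE haircuts (Name TEXT, HaircutDate TEXT);
    INSERT INTO dogs VALUES ('Rex'), ('Fido'), ('Buddy');
    INSERT INTO haircuts VALUES
        ('Rex', '2008-06-10'), ('Rex', '2008-08-15'), ('Fido', '2008-09-20');
""")

# One scalar value per outer row; Buddy has no haircuts, so the subquery gives NULL.
rows = conn.execute("""
    SELECT Name,
           (SELECT max(HaircutDate) FROM haircuts h WHERE h.Name = d.Name) AS LastHaircut
    FROM dogs d
    ORDER BY Name
""").fetchall()
print(rows)  # [('Buddy', None), ('Fido', '2008-09-20'), ('Rex', '2008-08-15')]
```

If a scalar subquery returns more than one row, MySQL raises an error ("Subquery returns more than 1 row"), whereas SQLite silently uses the first row. An aggregate such as max() keeps the subquery single-valued on both.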

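Like ordinary subqueries in WHERE and SELECT clauses, a correlated subquery is designed to return a scalar value for each outer row (not a table of results), or sometimes a single row of values. Subquery-related keywords such as IN, ANY, SOME, and EXISTS turn the subquery's result into a true/false test, which makes them a natural fit for subqueries in the WHERE clause.

You can also use correlated subqueries in the WHERE clause of UPDATE and DELETE statements to narrow down which rows the statement affects.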
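Here is a sketch of correlated subqueries in UPDATE and DELETE, using SQLite through Python's sqlite3. The needs_grooming column, the 'Ghost' haircut row, and the cutoff date are all hypothetical. Inside each subquery, dogs.Name or haircuts.Name refers to the row of the target table currently being considered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dogs (Name TEXT, needs_grooming INTEGER DEFAULT 0);
    CREATE TABLE haircuts (Name TEXT, HaircutDate TEXT);
    INSERT INTO dogs (Name) VALUES ('Rex'), ('Fido'), ('Buddy');
    INSERT INTO haircuts VALUES
        ('Rex', '2008-08-15'), ('Fido', '2008-09-20'), ('Ghost', '2008-01-01');
""")

# UPDATE: flag every dog with no haircut on or after the cutoff date.
conn.execute("""
    UPDATE dogs SET needs_grooming = 1
    WHERE NOT EXISTS (
        SELECT 1 FROM haircuts h
        WHERE h.Name = dogs.Name AND h.HaircutDate >= '2008-09-01'
    )
""")

# DELETE: drop haircut records whose dog is not in the dogs table.
conn.execute("""
    DELETE FROM haircuts
    WHERE NOT EXISTS (SELECT 1 FROM dogs d WHERE d.Name = haircuts.Name)
""")
conn.commit()

flagged = conn.execute(
    "SELECT Name FROM dogs WHERE needs_grooming = 1 ORDER BY Name").fetchall()
remaining = conn.execute("SELECT Name FROM haircuts ORDER BY Name").fetchall()
print(flagged)    # [('Buddy',), ('Rex',)]
print(remaining)  # [('Fido',), ('Rex',)]
```

One MySQL-specific caveat: MySQL rejects an UPDATE or DELETE whose subquery selects FROM the very table being modified (error 1093). Both subqueries here read a different table, so they avoid that restriction.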

How the following correlated query is executed:

select distinct a.ProductID, 
       a.UnitPrice as Max_unit_price_sold
from order_details as a
where a.UnitPrice = 
(
    select max(UnitPrice)
    from order_details as b
    where a.ProductID = b.ProductID
)
order by a.ProductID;

1. The outer query passes a value for ProductID to the subquery. This happens in the subquery's WHERE clause [ where a.ProductID = b.ProductID ].

2. The subquery uses this passed-in ProductID value to look up the max unit price for this product [ select max(UnitPrice) from order_details ].

3. When the subquery finds the max unit price for the product, it returns that value to the outer query.

4. The outer query then uses this max unit price in its WHERE clause to match the unit price in the order_details table for this product [ where a.UnitPrice = ].

5. When a matching row is found, the query engine temporarily holds it in memory. A match is guaranteed because the outer query and the subquery use the same table, order_details.

6. The query engine then moves on to the next row in order_details and repeats steps 1 to 5 for it.

7. When all rows in order_details have been evaluated, the engine removes duplicates (DISTINCT), sorts the result, and returns it.
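To check the walkthrough, the sketch below runs the order_details query on a few hypothetical rows in SQLite and compares it with a plain GROUP BY. Product 10 is deliberately sold twice at its maximum price, which is why the original query needs DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (OrderID INTEGER, ProductID INTEGER, UnitPrice REAL);
    INSERT INTO order_details VALUES
        (1, 10, 14.0), (2, 10, 18.0), (3, 10, 18.0),
        (1, 20, 9.5),  (4, 20, 7.0),
        (5, 30, 30.0);
""")

# The correlated query from the text: keep rows whose price equals their product's max.
correlated = conn.execute("""
    SELECT DISTINCT a.ProductID, a.UnitPrice AS Max_unit_price_sold
    FROM order_details AS a
    WHERE a.UnitPrice = (
        SELECT max(UnitPrice) FROM order_details AS b
        WHERE a.ProductID = b.ProductID
    )
    ORDER BY a.ProductID
""").fetchall()

# The same answer computed with a single aggregation pass.
grouped = conn.execute("""
    SELECT ProductID, max(UnitPrice) FROM order_details
    GROUP BY ProductID ORDER BY ProductID
""").fetchall()

print(correlated)             # [(10, 18.0), (20, 9.5), (30, 30.0)]
print(correlated == grouped)  # True
```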

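MySQL query optimization best practices

1. Fetch only the rows the application needs (use a WHERE clause).

2. Fetch only the columns the application needs; avoid SELECT *.

3. Avoid fetching the same data repeatedly; keep data that is used many times in the application's cache.

4. Sort with the database's ORDER BY rather than in the application.

5. Break large DELETE, UPDATE, and INSERT queries into several smaller ones.

6. Use appropriate data types for all columns; smaller columns are generally faster.

7. The MySQL query cache is case sensitive. It matches query text byte for byte, so SELECT and select are cached separately. The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.

8. Index the columns used in WHERE clauses, and use EXPLAIN to check whether the corresponding index is actually used.

9. Index the columns used in joins.

10. Table order does not matter in an INNER JOIN; the result is the same, and the optimizer chooses the join order itself.

11. Use the LIMIT clause to implement pagination.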
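Items 5 and 11 can be sketched together. The code below uses Python's sqlite3 with a hypothetical logs table. fetch_page implements LIMIT/OFFSET pagination, and delete_in_batches removes rows in small chunks instead of one big DELETE. Page size and batch size are arbitrary. The batched delete uses an id IN (... LIMIT n) subquery because SQLite's DELETE ... LIMIT requires a non-default compile option. In MySQL the usual form is DELETE FROM logs ORDER BY id LIMIT 10, since MySQL does not accept LIMIT inside an IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)",
                 [(f"m{i}",) for i in range(1, 26)])  # ids 1..25
conn.commit()

def fetch_page(page, page_size=10):
    """Return one page (1-based); ORDER BY makes page contents deterministic."""
    return conn.execute(
        "SELECT id FROM logs ORDER BY id LIMIT ? OFFSET ?",
        (page_size, (page - 1) * page_size),
    ).fetchall()

page3 = [r[0] for r in fetch_page(3)]
print(page3)  # [21, 22, 23, 24, 25]

def delete_in_batches(batch_size=10):
    """Delete all rows in small transactions; return the number of non-empty batches."""
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN (SELECT id FROM logs ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # keep each transaction (and its locks) short
        if cur.rowcount == 0:
            break
        batches += 1
    return batches

batches = delete_in_batches()
left = conn.execute("SELECT count(*) FROM logs").fetchone()[0]
print(batches, left)  # 3 0
```

Large OFFSET values still make the database read and discard the skipped rows. For deep pages, a condition such as WHERE id > last_seen_id combined with LIMIT is often faster.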

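MySQL query execution order:

Logically, the clauses of a SELECT are processed in this order: FROM (including JOINs), WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, LIMIT. The optimizer may execute the query physically in a different way. This logical order still explains, for example, why WHERE cannot use a column alias defined in SELECT while ORDER BY can.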

SQL JOIN:

SQL JOIN combines columns from two or more tables in a single result set.

Join types: inner join, outer join (left, right, full), cross join, self join, natural join

An inner join returns rows when there is at least one match in both tables.

Avoid ambiguity by giving the tables aliases:

select t1.*, t2.* from table1 t1 inner join table2 t2 on t1.id = t2.id
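You can check the aliasing advice directly. This sketch uses SQLite via Python's sqlite3 and hypothetical table1/table2 rows that both have id and name columns. The qualified query works, while an unqualified name is rejected as ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# The aliases t1/t2 say which table's 'name' we mean; only matching ids survive.
rows = conn.execute("""
    SELECT t1.id, t1.name, t2.name
    FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()
print(rows)  # [(2, 'b', 'x'), (3, 'c', 'y')]

# Without a qualifier, 'name' could come from either table, so SQLite rejects it.
try:
    conn.execute("SELECT name FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id")
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True
print(ambiguous)  # True
```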

A left outer join returns all rows from the left table together with the matching rows from the right table. If there is no matching row in the right table, the right table's columns are NULL.

A right outer join returns all rows from the right table together with the matching rows from the left table. If there is no matching row in the left table, the left table's columns are NULL.
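Here is a sketch of left and right outer joins on the same hypothetical tables. Python's bundled SQLite supports RIGHT JOIN only from SQLite 3.39.0 on. The right join is therefore written as the equivalent LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Every table1 row is kept; id 1 has no partner, so t2.id is NULL (None).
left = conn.execute("""
    SELECT t1.id, t2.id FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()
print(left)   # [(1, None), (2, 2), (3, 3)]

# table1 RIGHT JOIN table2 is the same as table2 LEFT JOIN table1.
right = conn.execute("""
    SELECT t1.id, t2.id FROM table2 t2
    LEFT OUTER JOIN table1 t1 ON t1.id = t2.id
    ORDER BY t2.id
""").fetchall()
print(right)  # [(2, 2), (3, 3), (None, 4)]
```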

Full outer join: MySQL does not support FULL OUTER JOIN, so we have to emulate it by combining a LEFT OUTER JOIN and a RIGHT OUTER JOIN with UNION.
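Here is the UNION emulation of FULL OUTER JOIN, sketched on the same hypothetical tables. In MySQL the second half would normally be written as table1 RIGHT JOIN table2. Here it is the swapped LEFT JOIN so that it also runs on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Left-join rows UNION right-join rows; UNION drops the matched rows
# that appear in both halves.
full = conn.execute("""
    SELECT t1.id, t2.id FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id
    UNION
    SELECT t1.id, t2.id FROM table2 t2 LEFT JOIN table1 t1 ON t1.id = t2.id
    ORDER BY 1, 2
""").fetchall()
print(full)  # [(None, 4), (1, None), (2, 2), (3, 3)]
```

UNION also collapses rows that are genuinely duplicated in the data. If that matters, use UNION ALL and keep only the unmatched rows in the second half (WHERE t1.id IS NULL).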

Joins vs. subqueries

Joins can include any column from the joined tables in the SELECT clause.

Joins are often easy to read and intuitive.

A subquery can pass aggregate values to the main query, for example as a derived table in the FROM clause:

select fm.title, cat.name, dt.countofcategory
from film fm
inner join film_category fc on fc.film_id = fm.film_id
inner join category cat on cat.category_id = fc.category_id
inner join (
    select count(*) as countofcategory, fc.category_id
    from film_category fc
    group by fc.category_id
) dt on dt.category_id = fc.category_id
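The following sketch runs the derived-table query on tiny hypothetical versions of the film, category, and film_category tables (the names match MySQL's Sakila sample database). It shows the aggregate flowing back into the main query: each film row carries the number of films in its category. SQLite stands in for MySQL, and the inner alias is renamed fc2 only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    INSERT INTO film VALUES (1, 'ALIEN'), (2, 'BLADE'), (3, 'CLUE');
    INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy');
    INSERT INTO film_category VALUES (1, 1), (2, 1), (3, 2);
""")

# The derived table dt computes one count per category; the join attaches it to each film.
rows = conn.execute("""
    SELECT fm.title, cat.name, dt.countofcategory
    FROM film fm
    INNER JOIN film_category fc ON fc.film_id = fm.film_id
    INNER JOIN category cat ON cat.category_id = fc.category_id
    INNER JOIN (
        SELECT count(*) AS countofcategory, fc2.category_id
        FROM film_category fc2
        GROUP BY fc2.category_id
    ) dt ON dt.category_id = fc.category_id
    ORDER BY fm.title
""").fetchall()
print(rows)  # [('ALIEN', 'Action', 2), ('BLADE', 'Action', 2), ('CLUE', 'Comedy', 1)]
```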