SQL刷题（简单）

题目

Easy

加粗的题目里面含有follow up question!
斜体的题目是系列套题，可以一起做！
代码整理全部在附件里（文章最后）

Actors and Directors Who Cooperated At Least Three Times
Ads Performance
Article Views I
Average Selling Price
Biggest Single Number
Combine Two Tables
Consecutive Available Seats
Customer Order Frequency
Customer Placing the Largest Number of Orders
Customers Who Never Order
Delete Duplicate Emails
Duplicate Emails
Employee Bonus
Employees Earning More Than Their Managers
Find Customer Referee
Find Users With Valid E-Mails
Find the Team Size
Fix Product Name Format
Friend Requests I: Overall Acceptance Rate
Unique Orders and Customers Per Month
Friendly Movies Streamed Last Month
Game Play Analysis I
Game Play Analysis II
Group Sold Products By The Date
Immediate Food Delivery I
List the Products Ordered in a Period
Not Boring Movies
Project Employees II
Customer Who Visited but Did Not Make Any Transactions
Queries Quality and Percentage
Reformat Department Table
Rising Temperature
Sales Analysis I
Sales Analysis II
Sales Analysis III
Sales Person
Second Highest Salary
Shortest Distance in a Line
Students and Examinations
Swap Salary
Top Travellers
Triangle Judgement
Unique Orders and Customers Per Month
User Activity for the Past 30 Days I
User Activity for the Past 30 Days II
Warehouse Manager
Weather Type in Each Country

前言

这篇文章主要帮助大家开启对 SQL 的了解与学习，总结了 LeetCode 中的 Database 中 SQL（简单）里面代表性较高且值得去反复做的 47 道题，希望能对大家入门 SQL 有所帮助！

Now, Let’s get started!

1. Actors and Directors Who Cooperated At Least Three Times

注意到，这里的table: ActorDirector 包含三列，其中timestamp是主键。

我们需要找到所有的配对（actor_id, director_id）其中演员和导演合作过至少三次。

所以，我们要求actor_id和director_id配对的数量不小于三，
利用count函数我们有：

select actor_id, director_id, count(*) as counts
from actordirector
group by actor_id, director_id

这一步的结果如下，我们对应每个演员、导演并且计算出他们合作的次数，分别为3，2，2。

{"headers": ["actor_id", "director_id", "c"], "values": [[1, 1, 3], [1, 2, 2], [2, 1, 2]]}

如果不清楚group by可以看这里。

还有，我们可以用group by 1,2来代替group by actor_id, director_id，其实1和2代表的就是select前两列的意思。

select actor_id, director_id, count(*) as counts
from actordirector
group by 1,2

接下来，我们只需在这个基础上找到所有的演员和导演满足counts >= 3即可。

最终答案如下：

select actor_id, director_id from
(select actor_id, director_id, count(*) as counts
from actordirector
group by 1,2) tmp
where counts >= 3

这里使用了十分常见的subquery，其中注意要在table后面加上别名（alias），这里用tmp表示

2. Ads Performance

注意，Ads表里面ad_id和user_id是主键，我们需要按照给出的公式算出CTR。

所以根据每个ad_id，我们可以计算出其对应的Clicked和Viewed的个数，在利用公式可求得CTR。

select ad_id,
# case when类似于if else语句，表示当action = 'Clicked'记为1，否则记为0,注意一定后面加上end！用sum函数计算每个不同的ad_id对应的Clicked的总数，同理用其除以后面的式子，得到ctr，并用round函数保留两位小数
	round(100*sum(case when action = 'Clicked' then 1 else 0 end) / sum(case when action = 'Clicked' or action = 'Viewed' then 1 else 0 end),2) as ctr
from ads
group by ad_id # 对应每个不同的ad_id
order by ctr desc, ad_id asc; # 排序

结果如下：

{"headers": ["ad_id", "ctr"], "values": [[1, 66.67], [3, 50.00], [2, 33.33], [5, null]]}

我们发现，当ad_id等于5，ctr为null而不是0.00。所以我们需要对这种情况做些调整，用ifnull函数把null转换成0。

最终答案如下：

# case when类似于if else语句，表示当action = 'Clicked'记为1，否则记为0,注意一定后面加上end！
# 用sum函数计算每个不同的ad_id对应的Clicked的总数，同理用其除以后面的式子，得到ctr，并用round函数保留两位小数，且用ifnull函数处理缺失值。
select ad_id,
	ifnull(round(100*sum(case when action = 'Clicked' then 1 else 0 end) / sum(case when action = 'Clicked' or action = 'Viewed' then 1 else 0 end),2),0) as ctr
from ads
group by ad_id # 对应每个不同的ad_id
order by ctr desc, ad_id asc; # 排序

3. Article Views I

Table: Views

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+

注意这里没有主键，说明可能会含有重复的行，每一行表示某个观看者在某个日期观看了某个作者写的某篇文章。

我们要找出所有的观看了至少一篇自己的文章的作者，并且按照他们的id的升序排序。

Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+---+----+----+---------------+

Result table:
+------+
| id   |
+------+
| 4    |
| 7    |
+------+

正确答案如下：

select distinct author_id as 'id'
from views
where author_id = viewer_id
order by 1

# 注意
# 1. distinct 函数筛选出了唯一的auther_id，因为可能有重复的行存在
# 2. order by 1 = order by author_id

4. Average Selling Price

Table: Prices

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| start_date    | date    |
| end_date      | date    |
| price         | int     |
+---------------+---------+

这里的 product_id、start_date和 end_date 都是主键，
每一行表示了在开始日期到结束日期内的每个 product_id 的价格，
并且对于一个 product_id，不会有两个相交的时间区间存在。

Table: UnitsSold

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| purchase_date | date    |
| units         | int     |
+---------------+---------+

这里没有主键，所以有重复的行，每一行表示对于每个卖出的产品的 product_id、purchase_date 和 units。

我们需要找到每个产品的平均售卖价格。

例如：

Prices table:
+------------+------------+------------+--------+
| product_id | start_date | end_date   | price  |
+------------+------------+------------+--------+
| 1          | 2019-02-17 | 2019-02-28 | 5      |
| 1          | 2019-03-01 | 2019-03-22 | 20     |
| 2          | 2019-02-01 | 2019-02-20 | 15     |
| 2          | 2019-02-21 | 2019-03-31 | 30     |
+------------+------------+------------+--------+
 
UnitsSold table:
+------------+---------------+-------+
| product_id | purchase_date | units |
+------------+---------------+-------+
| 1          | 2019-02-25    | 100   |
| 1          | 2019-03-01    | 15    |
| 2          | 2019-02-10    | 200   |
| 2          | 2019-03-22    | 30    |
+------------+---------------+-------+

Result table:
+------------+---------------+
| product_id | average_price |
+------------+---------------+
| 1          | 6.96          |
| 2          | 16.96         |
+------------+---------------+
Average selling price = Total Price of Product / Number of products sold.
Average selling price for product `1 = ((100 * 5) + (15 * 20)) / 115 = 6.96`
Average selling price for product `2 = ((200 * 15) + (30 * 30)) / 230 = 16.96`

正确答案如下：

select p.product_id, round(sum(price*units)/sum(units),2) as average_price
from prices as p
join unitssold as u
using (product_id)
where u.purchase_date between p.start_date and p.end_date
group by p.product_id

# 注意
# 1. 对于table要加上alias，并且对于公共的product_id要加上对应的table名
# 2. using (product_id) 等价于 on p.product_id = u.product_id
# 3. 要保证purchase_date在start_date和end_date中间，可能会被忽略，并且这个条件要在group by之前，因为在aggregate之前我们要筛选出满足时间范围的行，在进行计算！

5. Biggest Single Number

Table my_numbers contains many numbers in column num including duplicated ones.
Can you write a SQL query to find the biggest number, which only appears once.

+---+
|num|
+---+
| 8 |
| 8 |
| 3 |
| 3 |
| 1 |
| 4 |
| 5 |
| 6 |

For the sample data above, your query should return the following result:

+---+
|num|
+---+
| 6 |

Note:
If there is no such number, just output null.

那么我们要找的就是num里面只出现一次的最大的数

可能你会这么写

select max(num) as 'num'
from my_number
group by num
having count(*) = 1

但这样会返回错误的值，如下：

{"headers": ["num"], "values": [[1], [4], [5], [6]]}

我们是找出了出现次数为1的数字但是max函数貌似不起作用，因为这里的max(num)针对的是num，相当于对每一个group by的数字都取了最大值，也就是本身，所以无效

正确答案如下：

select max(num) as num
from
(select
    num
from
    my_numbers
group by num
having count(num) = 1) as tmp
# 使用subquery对所有出现次数为1的数字取最大值

或者

select max(num) as num
from
(select
    num,count(*) as counts
from
    my_numbers
group by num) as tmp
where counts = 1

6. Combine Two Tables

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| PersonId    | int     |
| FirstName   | varchar |
| LastName    | varchar |
+-------------+---------+

PersonId is the primary key column for this table.

Table: Address

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| AddressId   | int     |
| PersonId    | int     |
| City        | varchar |
| State       | varchar |
+-------------+---------+

AddressId is the primary key column for this table.

Write a SQL query for a report that provides the following information for each person in the Person table, regardless if there is an address for each of those people:

FirstName, LastName, City, State

这个题看似很简单，只需要把他们合并起来就可以了，但是要注意inner join和left join的区别（right join其实和left join是一回事）。

一图流！

这里题目要求我们找出每个人的各种信息，无论这个人是不是有地址，也就是哪怕没有地址，也要用null表示，那自然要用left join。

正确答案如下：

select FirstName, LastName, City, State
from Person left join Address
on Person.PersonId = Address.PersonId
# using (PersonId)

结果如下：

{"headers": ["FirstName", "LastName", "City", "State"], "values": [["Allen", "Wang", null, null]]}

7. Consecutive Available Seats

Several friends at a cinema ticket office would like to reserve consecutive available seats.
Can you help to query all the consecutive available seats order by the seat_id using the following cinema table?

seat_id	free
1	1
2	0
3	1
4	1
5	1

Your query should return the following result for the sample case above.

seat_id
3
4
5

Note:

The seat_id is an auto increment int, and free is bool (‘1’ means free, and ‘0’ means occupied.).
Consecutive available seats are more than 2(inclusive) seats consecutively available.

我们要找到连续挨着的并且是可以坐的（available）的seat_id，并按照其升序排序。

感觉不是那么简单对吧，这里要用到cross join。

一图流！

所谓cross join也就是两个表格的笛卡尔积，是我们很常用的手段之一。那么如果这个table有5行，那么在经过跟自己本身cross join之后，就会变成一个5 x 5的table，我们要找的就是 table A 和 table B（当然，他们其实是同一个table，只不过名称不同）的满足下列两个条件的seat_id：

在 A 和 B 中都满足free = 1，因为我们要满足 available。
seat_id是连续的，也就是他们差的绝对值是等于1的

正确答案如下：

select distinct a.seat_id
from cinema a, cinema b
where abs(a.seat_id - b.seat_id) = 1
and a.free = 1
and b.free = 1
order by 1
# 直接使用 select from table A, table B 这种语法其实就是cross join，不要以为你没有用join，其实这就是和cross join等价的！

或者

select distinct a.seat_id
from cinema a
join cinema b
	on abs(a.seat_id - b.seat_id) = 1
	and a.free = 1 
	and b.free = 1
order by 1
# 这个就是self join的一般写法！

我估计你可能还是没太明白self join的含义，这个题是属于cross join的题，但也属于self join，因为table是和自己本身cross join的，但是cross join不局限于自己和自己本身去 join，下面这个链接会帮你更好的理解self join!

补充资料：

8. Customer Order Frequency

这个题目算是比较复杂的了，我会一步一步拆开讲解！

首先这里有三个 table，分别是 customers、product 和 orders，我们需要从中找出 customer_id 和 name 满足下面两个条件：

在2020年6月和7月都有消费
并且消费的金额都不小于$100

首先，我们从例子中可以看到，product_id 和 customer_id 可以作为连接表的媒介，并且是不存在其中一个媒介的元素一个表格里有，而另一个表格没有，这就说明我们可以放心使用 inner join 而不用担心 null 会存在的问题！

那么自然我们要把 product 和 orders 连接在一起，因为他们包含了产品的价格和数量，我们需要计算他们的乘积，得出某个人在某个时间在某个产品上花费的总钱数。

select 先空着
from product p
join orders o
on p.product_id = o.product_id
# using (product_id)

那么这样我们就把两个 table 连接起来了，我们要选取这个新表格的哪些列呢？

根据我们所需要满足的条件，肯定需要 customer_id，因为我们需要知道是谁买的；也需要 order_date，因为我们只要找在6月和7月有消费的人；最后还有 price * quantity，因为我们要找到消费总额超过$100的人。

所以我们有

select customer_id, order_date price*quantity
from product p
join orders o
on p.product_id = o.product_id

然后，我们发现 order_date 是以年月日的形式出现的，但我们需要把他转换成以月为单位，因为我们只要找6或7月的消费的人，不关心具体是哪一天，这里使用了 date_format ，可以对日期进行转换。并且，我们需要针对每个消费者，计算出他们在6月和7月的总消费，所以用 sum 函数和 group by 函数；在这个基础上，我们要限定 order_date 在6到7月内，使用 between and，最后，要保证每人月均总消费不小于$100。

select customer_id, 
	   date_format(order_date, '%Y-%m') as 'order_date', 
	   sum(price*quantity) as total
from product p
join orders o
on p.product_id = o.product_id
where order_date between '2020-06-01' and '2020-07-31'
group by 2,1 # 也就是group by order_date和customer_id
having total >= 100

结果如下：

{"headers": ["customer_id", "order_date", "total"], "values": [[2, "2020-06", 600], [1, "2020-06", 300], [3, "2020-06", 110], [1, "2020-07", 100]]}

那么到这里，这道题的雏形已经出来了！我们已经成功找到了6和7月的月均总消费不小于$100的所有人，但是我们要找的人一定要在6和7月都有消费，也就是对于每个消费者，他们在其中出现的次数要等于2！

所以上面的结果成为了我们即将要去搜索的新表格，接下来使用subquery，对其进行进一步搜索。

正确答案如下：

select customer_id, name
from customers 
where customer_id in # 从customer表里面找到满足下面括号内容的customer_id
(
	select customer_id 
	from
		(select customer_id, 
				date_format(order_date, '%Y-%m') as 'order_date', 
				sum(price*quantity) as total
		from product p
		join orders o
		on p.product_id = o.product_id
		where order_date between '2020-06-01' and '2020-07-31'
		group by 2,1
		having total >= 100) tmp # tmp就是我们之前建立的新表格
	group by customer_id
	having count(customer_id) = 2 
	# 从新表格里面选出在其中出现次数等于2的消费者
)

看着是不是很复杂，其实并没有，真正工作当中，应用当中比这个复杂的要多的多。所以也不要感觉SQL特别简单，一学就会，其实也不尽然（这道题被归在简单的题集里）。还有我要建议你要把代码写的稍微“好看”一点，方便别人读懂，如果像下面这么写，估计别人很难看懂你要干什么吧，哪怕这仅仅是20行不到的代码。

select customer_id from (select customer_id, date_format(order_date, '%Y-%m') as 'order_date', sum(price*quantity) as total
from product p join orders o on p.product_id = o.product_id
where order_date between '2020-06-01' and '2020-07-31' group by 2,1
having total >= 100) tmp group by customer_id having count(customer_id) = 2)

看了这个，别人可能会想说。。。

9. Customer Placing the Largest Number of Orders

Query the customer_number from the orders table for the customer who has placed the largest number of orders.

It is guaranteed that exactly one customer will have placed more orders than any other customer.

The orders table is defined as follows:

Column	Type
order_number (PK)	int
customer_number	int
order_date	date
required_date	date
shipped_date	date
status	char(15)
comment	char(200)

Sample Input

order_number	customer_number	order_date	required_date	shipped_date	status
1	1	2017-04-09	2017-04-13	2017-04-12	Closed
2	2	2017-04-15	2017-04-20	2017-04-18	Closed
3	3	2017-04-16	2017-04-25	2017-04-20	Closed
4	3	2017-04-18	2017-04-28	2017-04-25	Closed

Sample Output

customer_number
3

Explanation

The customer with number ‘3’ has two orders, which is greater than either customer ‘1’ or ‘2’ because each of them only has one order.
So the result is customer_number ‘3’.

要注意，我们保证了只有一个消费者会比其他消费者下更多的单，也就是说我们不必担心会有多个消费者下了同样多的单，并且是最多的。

所以我们可以使用limit函数和order by函数对每个客户下单次数进行降序排序，并且取第一个，也就是最大的那个。

正确答案如下：

select customer_number
from orders
group by 1  
order by count(order_number) desc
# 针对每个消费者查找其下单次数，并降序排列
limit 1 # 取第一个

Follow up: What if more than one customer have the largest number of orders, can you find all the customer_number in this case?

如果面试官抛给了你这样的问题，你还会吗？

首先，降序排列找第一个这种策略已经不适用了，因为有可能出现好几个消费者都下了一样多的单，并且都是最多的，我们也不知道取前几个，这时会用到我们很常用的window function来辅助我们进行排序。

先来说一下window function的种类，我想下面这个图可以让你理解这三种排序函数的区别。

row_number返回从1开始的序列
rank返回当前结果中每一个记录的排名
dense_rank和rank很像，但是没有gap

那么我们先对每个消费者的下单数进行降序排序

select customer_number, 
	   rank() over (order by count(*) desc) as rnk
	   # 这里我们使用rank()函数，其实在查找排名第一的情况时，
	   # dense_rank()和rank()其实没区别的
from orders

先重新定义一下我们的输入：

{"headers":{"orders":["order_number","customer_number"]},"rows":{"orders":[[1,1],[2,2],[3,3],[4,3],[5,4],[6,4]]}}

可以看出，编号为3和4的消费者都下单了两次，并且是最多的。

根据上面的代码，我们得到结果如下：

{"headers": ["customer_number", "rnk"], "values": [[3, 1], [4, 1], [1, 3], [2, 3]]}

意思是customer_id为3和4的消费者下单量排名第一，那么其他的我们就不考虑了，因为我们只考虑排名第一的。

那么在这个基础上，我们要再从这个新表格里面筛选customer_id满足rnk = 1即可，使用subquery结果如下：

select customer_number 
from
(
	select customer_number, 
		   rank() over (order by count(*) desc) as rnk
	from orders
	group by customer_number
) tmp
where rnk = 1 # 这里要记得用alias哦!

那么，我现在有一个很欠揍的问题，如果我不想找排名第一的了，我想找排名第二的，而且我也不保证会出现排名第二的消费者下单数都相同的情况哦！

那么，拯救世界的任务就拜托你了！

10. Customers Who Never Order

咳咳，那么来点简单的划划水吧！

这个题非常的straightforward，就是让我们找所有一毛不拔的顾客，什么都不买的人。

有人说：“见到两个table我就手痒，就想join怎么办？”

join归join，您先想清楚怎么join好伐，之前说过了 inner join 和 left join 的区别。那么既然我们要找没买东西的人，那就是要找两个table连到一起后，orders表格中的Id为null的人嘛！

正确答案如下：

select Name as 'Customers'
from customers as 'c'
left join orders as 'o'
on c.Id = o.CustomerId
where o.Id is null

但是毕竟我们用了join，在跑大型数据集时时间还是会有些长，那么我又变成了坏人，能不能不用join呢？

那么便是筛选Customers表格中的不在Orders表格中的CustomerId中的Id呗。

Emmmmm我想你应该明白我的意思…

select name as 'Customers'
from customers
where id not in # not in 作为筛选条件也真的很常见了！
(
    select customerid from orders
);

可以看到在不用join的情况下，速度还是快很多的！

11. Delete Duplicate Emails

我们要删除所有重复的邮件，并按照只保留Id的结果（如果重复的话）

首先，我们使用self join来满足下列两个条件：

Email相同
找到重复并且Id大的

select p1.*
FROM Person p1,
     Person p2
WHERE
    p1.Email = p2.Email 
    # Email相同
    AND 
    p1.Id > p2.Id
    # 这样保证我们找到了所有重复的email并且找出所有大于最小Id的结果
;

上面的结果就是我们要删除的

最终答案如下：

delete p1.*
FROM Person p1,
     Person p2
WHERE
    p1.Email = p2.Email 
    AND 
    p1.Id > p2.Id
;

12. Duplicate Emails

这个题是不是很简单，就是要删除重复的邮件，直接上答案了，不会做的要加油啦！

正确答案如下：

select Email 
from Person 
group by Email  
# 这个Email是Person表的Email这一列
having count(Email) > 1;

13. Customers Who Never Order

这个题比较简单，依旧考察大家对 join 和缺失值的运用，首先使用left join连接两表，在使用where设定题中所给的条件即可。

正确答案如下：

select name, bonus
from Employee 
left join Bonus
using (empId)
where bonus < 1000 or bonus is null 
# 要注意缺失值也要体现出来，如果没有写
# bonus is null的话，是不会显示null的行的

14. Employees Earning More Than Their Managers

The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+

Given the Employee table, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.

+----------+
| Employee |
+----------+
| Joe      |
+----------+

这个题算是比较经典的self join面试题了，目的是要找所有比自己经理工资更多的雇员。

我们看到了不是每个雇员都对应着经理，要注意这一点！

接下来我们用self join把表格自连接

# cross join相当于求两个table的笛卡尔积
select 
    a.Name as Employee
from 
    Employee a, Employee b

前四列来自表格a，后四列来自表格b。

那么接下来就要筛选满足下面两个条件的行：

a.ManagerId = b.Id
a.Salary > b.Salary

你可能有疑问了，为啥不是a.Id = b.ManagerId呢？那不差不多么，其实差的十万八千里了，self join 是有顺序的就跟left join和right join一样，本质是一回事，但是哪么表格在前哪个表格在后，决定了连接方式。

首先，我们要取的Name这一列是来自于表格 a 的，那么这种情况下，表格 a self join 表格 b 说白了就是先取出表格 a 的第一行，然后跟表格 b 的每一行做比对，对这个Employee表格来说就有了四行，再取出表格 a 的第二行，再跟表格 b 的每一行做比对，又有四行，依次类推，总共就是 4 x 4 = 16 行。

所以，以这种比对方式，我们自然要筛选的是a.ManagerId = b.Id，ManagerId的缺失值可以因此被排除掉，我们只筛选a.ManagerId = b.Id且a.ManagerId存在的情况，反之，则没排除a.ManagerId is null的情况。

其实，如果我们要取的Name这一列是来自于表格 b 的，那么一切就都反过来了，条件自然是a.Id = b.ManagerId。

正确答案如下：

select 
    a.Name as Employee
from 
    Employee a, Employee b
where a.ManagerId = b.Id
and a.Salary > b.Salary

或者

select
    a.Name as Employee
from
    Employee as a
JOIN 
	Employee as b
ON a.ManagerId = b.Id
where a.Salary > b.Salary

15. Find Customer Referee

Given a table customer holding customers information and the referee.

+------+------+-----------+
| id   | name | referee_id|
+------+------+-----------+
|    1 | Will |      NULL |
|    2 | Jane |      NULL |
|    3 | Alex |         2 |
|    4 | Bill |      NULL |
|    5 | Zack |         1 |
|    6 | Mark |         2 |
+------+------+-----------+

Write a query to return the list of customers NOT referred by the person with id ‘2’.

For the sample data above, the result is:

+------+
| name |
+------+
| Will |
| Jane |
| Bill |
| Zack |
+------+

我们要找到所有没有被id = 2 refer的消费者。

那么所筛选的消费者要满足下列两个条件其中之一：

referee_id != 2
referee_id is null

正确答案如下：

select name
	from customer
where 
	referee_id != 2 
	or 
	referee_id is NULL

16. Find Users With Valid E-Mails

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
| mail          | varchar |
+---------------+---------+

user_id is the primary key for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.

Write an SQL query to find the users who have valid emails.

A valid e-mail has a prefix name and a domain where:

The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.' and/or dash '-'. The prefix name must start with a letter.
The domain is '@leetcode.com'.

Return the result table in any order.

The query result format is in the following example.

# Users
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | winston@leetcode.com    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
| 5       | Marwan    | quarz#2020@leetcode.com |
| 6       | David     | david69@gmail.com       |
| 7       | Shapiro   | .shapo@leetcode.com     |
+---------+-----------+-------------------------+

# Result table
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | winston@leetcode.com    |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
+---------+-----------+-------------------------+

# The mail of user 2 doesn't have a domain.
# The mail of user 5 has # sign which is not allowed.
# The mail of user 6 doesn't have leetcode domain.
# The mail of user 7 starts with a period.

这里我们使用正则表达式来寻找特殊字符和结构来进行字符串匹配

正确答案如下：

SELECT *
	FROM Users
WHERE 
	mail REGEXP '^[A-Za-z][A-Za-z0-9_.-]*@leetcode.com$'

^[A-Za-z]代表以 A 到 Z 或者 a 到 z 的字母开头的字符串
*表示其前面必须有内容
$表示以某字符串结尾

所以最后 '^[A-Za-z][A-Za-z0-9_.-]*@leetcode.com$' 表示

@leetcode.com作为结尾
@leetcode.com前面必须有内容
@leetcode.com前面的内容可以以任意大小写字母开头，里面可以包含任意大小写字母、0-9任何数字以及 _、. 、-。

关于正则表达式在sql的使用，请看

17. Find the Team Size

Table: Employee

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| employee_id   | int     |
| team_id       | int     |
+---------------+---------+

employee_id is the primary key for this table.
Each row of this table contains the ID of each employee and their respective team.

Write an SQL query to find the team size of each of the employees.

Return result table in any order.

The query result format is in the following example:

# Employee Table:
+-------------+------------+
| employee_id | team_id    |
+-------------+------------+
|     1       |     8      |
|     2       |     8      |
|     3       |     8      |
|     4       |     7      |
|     5       |     9      |
|     6       |     9      |
+-------------+------------+
# Result table:
+-------------+------------+
| employee_id | team_size  |
+-------------+------------+
|     1       |     3      |
|     2       |     3      |
|     3       |     3      |
|     4       |     1      |
|     5       |     2      |
|     6       |     2      |
+-------------+------------+
# Employees with Id 1,2,3 are part of a team with team_id = 8.
# Employees with Id 4 is part of a team with team_id = 7.
# Employees with Id 5,6 are part of a team with team_id = 9.

我们要找到每个人对应的队伍里含有的总人数是多少，那么首先要根据每个team_id计算出各自队伍的人数是多少

select team_id, count(*) as c
from Employee
group by team_id

结果如下：

{"headers": ["team_id", "c"], "values": [[8, 3], [7, 1], [9, 2]]}

意味着我们得到了每个team_id对应的人数，那么我们只需要把这个结果和Employee表格再连接一下即可，就可以把employee_id和c（c作为count(*)的别名）联系起来。

正确答案如下：

select 
	employee_id, c as team_size
from 
	Employee
join
(
	select 
		team_id, count(*) as c
	from 
		Employee
	group by 
		team_id
) tmp
using (team_id)

18. Fix Product Name Format

Table: Sales

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| sale_id      | int     |
| product_name | varchar |
| sale_date    | date    |
+--------------+---------+

sale_id is the primary key for this table.

Each row of this table contains the product name and the date it was sold.

Since table Sales was filled manually in the year 2000, product_name may contain leading and/or trailing white spaces, also they are case-insensitive.

Write an SQL query to report

product_name in lowercase without leading or trailing white spaces.
sale_date in the format ('YYYY-MM')
total the number of times the product was sold in this month.

Return the result table ordered by product_name in ascending order, in case of a tie order it by sale_date in ascending order.

The query result format is in the following example.

# Sales
+------------+------------------+--------------+
| sale_id    | product_name     | sale_date    |
+------------+------------------+--------------+
| 1          |      LCPHONE     | 2000-01-16   |
| 2          |    LCPhone       | 2000-01-17   |
| 3          |     LcPhOnE      | 2000-02-18   |
| 4          |      LCKeyCHAiN  | 2000-02-19   |
| 5          |   LCKeyChain     | 2000-02-28   |
| 6          | Matryoshka       | 2000-03-31   | 
+------------+------------------+--------------+

# Result table:
+--------------+--------------+----------+
| product_name | sale_date    | total    |
+--------------+--------------+----------+
| lcphone      | 2000-01      | 2        |
| lckeychain   | 2000-02      | 2        | 
| lcphone      | 2000-02      | 1        | 
| matryoshka   | 2000-03      | 1        | 
+--------------+--------------+----------+

In January, 2 LcPhones were sold, please note that the product names are not case sensitive and may contain spaces.
In Februery, 2 LCKeychains and 1 LCPhone were sold.
In March, 1 matryoshka was sold.

这个题看似很复杂，其实我们只需要对基本的列进行转化再使用简单的group by就能得到答案。

我们筛选的结果要满足：

product_name是小写的，并且前面后面都没有空格
sale_date以('YYYY-MM')出现
total表示每个产品在这个月的销量

这里我们用了lower函数使product_name小写，并用trim函数去掉字符串前面后面的空格；并且使用date_format函数对日期进行转化，之前讲过了不再赘述。

正确答案如下：

select 
	lower(trim(product_name)) as 'product_name',
	date_format(sale_date, '%Y-%m') as 'sale_date',
	count(*) as 'total'
from
	sales
group by 1,2 
order by 1,2;
# 这里的1和2指的是select里的第一列第二列 
# 不能用alias代替

19. Friend Requests I: Overall Acceptance Rate

In social network like Facebook or Twitter, people send friend requests and accept others’ requests as well. Now given two tables as below:

Table: friend_request

sender_id	send_to_id	request_date
1	2	2016-06-01
1	3	2016-06-01
1	4	2016-06-01
2	3	2016-06-02
3	4	2016-06-09

Table: request_accepted

requester_id	accepter_id	accept_date
1	2	2016-06-03
1	3	2016-06-08
2	3	2016-06-08
3	4	2016-06-09
3	4	2016-06-10

Write a query to find the overall acceptance rate of requests rounded to 2 decimals, which is the number of acceptance divide the number of requests.

For the sample data above, your query should return the following result.

accept_rate
0.80

Note:

The accepted requests are not necessarily from the table friend_request. In this case, you just need to simply count the total accepted requests (no matter whether they are in the original requests), and divide it by the number of requests to get the acceptance rate.
It is possible that a sender sends multiple requests to the same receiver, and a request could be accepted more than once. In this case, the ‘duplicated’ requests or acceptances are only counted once.
If there is no requests at all, you should return 0.00 as the accept_rate.

Explanation: There are 4 unique accepted requests, and there are 5 requests in total. So the rate is 0.80.

Follow-up:

Can you write a query to return the accept rate but for every month?
How about the cumulative accept rate for every day?

这个题很有意思，我会细讲这一题，这题考察我们以下几个点：

对于一个人 A 发送给了另一个人 B 好多条好友申请，但是 B 只能接受一次，我们该咋办？
如果压根没有好友申请，我们要返回 0 而不是 null。
接受率保留两位小数。

题中这种情况，我们发现有4条唯一被接受的申请，而总共5条申请，所以接受率为0.80。

所以，我们要用count和distinct函数分别数两个表中唯一存在的(sender_id,send_to_id)和(requester_id,accepter_id)的出现次数，并用后者除以前者即可，并且使用ifnull函数处理缺失值，round函数保留小数点后两位。

正确答案如下：

select 
	ifnull(round(b.c / a.c,2),0) as accept_rate
from 
    (select 
	    count(distinct sender_id, send_to_id) as c 
	 from 
		 friend_request) a,
    (select 
	    count(distinct requester_id, accepter_id) as c 
	 from 
		 request_accepted) b;
# a和b作为两个table存在，分别数两个表中唯一存在的(sender_id,send_to_id)和(requester_id,accepter_id)的出现次数

这种写法能清晰的体现出表格 a 和表格 b 都分别代表什么，比较推荐！

也有以下的写法：

with 
a as 
    (select count(distinct requester_id, accepter_id) as accepts
    from request_accepted),	
b as 
	(select count(distinct sender_id, send_to_id) as total 
	from friend_request)

select round(ifnull(accepts/total,0),2) as accept_rate from a, b;
# a和b作为两个table存在，分别数两个表中唯一存在的(sender_id,send_to_id)和(requester_id,accepter_id)的出现次数

标准答案

select
round(
    ifnull(
    (select count(*) from (select distinct requester_id, accepter_id from request_accepted) as A)
    /
    (select count(*) from (select distinct sender_id, send_to_id from friend_request) as B),
    0)
, 2) as accept_rate;

那么面试官看你做到这里微微一笑，你此时意识到事情并没这么简单，第一个 follow up 来了！

Follow up Question 1

你能找出每个月对应的accept_rate吗？

你想了一想那不就group by一下就完事了吗？

在刚刚的代码基础上：

select 
	ifnull(round(b.c / a.c,2),0) as accept_rate, a.month
from 
    (select 
	    count(distinct sender_id, send_to_id) as c, 
	    month(accept_date) as 'month'  # month函数可以将日期转换成月份
	 from 
		 friend_request
	 group by 2) a,   # 这里group by是按每个月来数次数
    (select 
	    count(distinct requester_id, accepter_id) as c, month(accept_date)
	 from 
		 request_accepted
	 group by 2) b;
where a.month = b.month # 这里默认了send和request在同一个月发生，否则分母为0

但是可能出现假如只有一个人 A 给 B 在5月份发了邀请，但是 B 在 6 月份接受了邀请，那么这两个月的接受率都应该为 0；前者因为在5月没有人接受，后者因为6月没人发邀请。

那么我们改进后：

with date_table 
as (SELECT month(request_date) AS 'month' FROM friend_request
UNION
SELECT month(accept_date) FROM request_accepted
ORDER BY month) # union 函数去除了重复值，return两列合并成一列的所有不重复月份

select ifnull(accept_rate,0), month from
date_table 
left join # left join为了找到上述accept_rate可能为0的情况
(select 
	ifnull(round(b.c / a.c,2),0) as accept_rate, a.month
from 
    (select 
	    count(distinct sender_id, send_to_id) as c, 
	    month(accept_date) as 'month'  # month函数可以将日期转换成月份
	 from 
		 friend_request
	 group by 2) a,   # 这里group by是按每个月来数次数
    (select 
	    count(distinct requester_id, accepter_id) as c, month(accept_date)
	 from 
		 request_accepted
	 group by 2) b;
where a.month = b.month) tmp # 这里默认了send和request在同一个月发生，否则分母为0
using (month)

我们用以下数据实验一下：

{"headers":{"friend_request":["sender_id","send_to_id","request_date"],"request_accepted":["requester_id","accepter_id","accept_date"]},"rows":{"friend_request":[[1,2,"2016/05/01"],[1,3,"2016/06/01"],[1,4,"2016/06/01"],[2,3,"2016/07/02"],[3,4,"2016/06/09"]],"request_accepted":[[1,2,"2016/08/03"],[1,3,"2016/06/08"],[2,3,"2016/07/08"],[3,4,"2016/06/09"],[3,4,"2016/06/10"]]}}

Table: friend_request

sender_id	send_to_id	request_date
1	2	2016-05-01
1	3	2016-06-01
1	4	2016-06-01
2	3	2016-07-02
3	4	2016-06-09

Table: request_accepted

requester_id	accepter_id	accept_date
1	2	2016-08-03
1	3	2016-06-08
2	3	2016-07-08
3	4	2016-06-09
3	4	2016-06-10

红色部分是跟原测试数据不一样，做了改动的，那么我们期待的是：

五月：accept_rate = 0.00，因为5月只有人发送邀请但没人接受
六月：accept_rate = 0.67，因为有3个人发邀请，2个人接受
七月：accept_rate = 1.00，因为有1个人发邀请，1个人接受
八月：accept_rate = 0.00，因为8月虽然有1个人接受，但是没有人发邀请

{"headers": ["ifnull(accept_rate,0)", "month"], "values": [[0.00, 5], [0.67, 6], [1.00, 7], [0.00, 8]]}

奈斯，结果正是我们预想的！

你以为结束了，这时面试官又微微一笑，又问了第二个问题。

Follow up Question 2

那么你能算出每一天的累计accept_rate吗？

区别在于我们要计算的是累计值

with date_table as 
(SELECT request_date AS dates FROM friend_request
UNION
SELECT accept_date FROM request_accepted
ORDER BY dates),

tmp as 
(SELECT 
	coalesce(ROUND(COUNT(DISTINCT requester_id, accepter_id) / COUNT(DISTINCT sender_id, send_to_id), 2),0) as rate, 
	date_table.dates
FROM 
	request_accepted b, friend_request a, date_table
where b.accept_date <= date_table.dates
and a.request_date <= date_table.dates
GROUP BY 2) 

select ifnull(rate,0) as 'rates', dates
from date_table
left join tmp
using (dates)

这里要注意不能直接把两个表格直接group by数次数，因为我们要考虑当前时间下，我们的累计接受率，所以要加上条件

where b.accept_date <= date_table.dates
and a.request_date <= date_table.dates

在这样筛选所需的信息之后再根据dates来group by。

最后改过后的数据试验一下：

Table: friend_request

sender_id	send_to_id	request_date
1	2	2016-05-01
1	3	2016-06-01
1	4	2016-06-01
2	3	2016-07-02
3	4	2016-06-09

Table: request_accepted

requester_id	accepter_id	accept_date
1	2	2016-08-03
1	3	2016-06-08
2	3	2016-07-08
3	4	2016-06-09
3	4	2016-06-10

2016-05-01：accept_rate = 0.00，因为2016-05-01只有1人发送邀请但没人接受
2016-06-01：accept_rate = 0.00，虽然累计有3个人发邀请，但没人接受
2016-06-08：accept_rate = 0.33，因为有3个人发邀请，1个人接受
2016-06-09：accept_rate = 0.50，因为有4个人发邀请，2个人接受
2016-06-10：accept_rate = 0.50，因为有4个人发邀请，2个人接受（2016-06-09和2016-06-10重复接受了）
2016-07-02：accept_rate = 0.40，因为有5个人发邀请，2个人接受
2016-07-08：accept_rate = 0.60，因为有5个人发邀请，3个人接受
2016-08-03：accept_rate = 0.80，因为有5个人发邀请，4个人接受

{"headers": ["rates", "dates"], "values": [[0.00, "2016-05-01"], [0.00, "2016-06-01"], [0.33, "2016-06-08"], [0.50, "2016-06-09"], [0.50, "2016-06-10"], [0.40, "2016-07-02"], [0.60, "2016-07-08"], [0.80, "2016-08-03"]]}

完美！

20. Unique Orders and Customers Per Month

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| customer_id   | int     |
| invoice       | int     |
+---------------+---------+
# order_id is the primary key for this table.
# This table contains information about the orders made by customer_id.

Write an SQL query to find the number of unique orders and the number of unique customers with invoices > $20 for each different month.

Return the result table sorted in any order.

The query result format is in the following example:

# orders
+----------+------------+-------------+------------+
| order_id | order_date | customer_id | invoice    |
+----------+------------+-------------+------------+
| 1        | 2020-09-15 | 1           | 30         |
| 2        | 2020-09-17 | 2           | 90         |
| 3        | 2020-10-06 | 3           | 20         |
| 4        | 2020-10-20 | 3           | 21         |
| 5        | 2020-11-10 | 1           | 10         |
| 6        | 2020-11-21 | 2           | 15         |
| 7        | 2020-12-01 | 4           | 55         |
| 8        | 2020-12-03 | 4           | 77         |
| 9        | 2021-01-07 | 3           | 31         |
| 10       | 2021-01-15 | 2           | 20         |
+----------+------------+-------------+------------+
# result
+---------+-------------+----------------+
| month   | order_count | customer_count |
+---------+-------------+----------------+
| 2020-09 | 2           | 2              |
| 2020-10 | 1           | 1              |
| 2020-12 | 2           | 1              |
| 2021-01 | 1           | 1              |
+---------+-------------+----------------+

Explanation:

In September 2020 we have two orders from 2 different customers with invoices > $20.

In October 2020 we have two orders from 1 customer, and only one of the two orders has invoice > $20.

In November 2020 we have two orders from 2 different customers but invoices < $20, so we don’t include that month.

In December 2020 we have two orders from 1 customer both with invoices > $20.

In January 2021 we have two orders from 2 different customers, but only one of them with invoice > $20.

这个题是个新题，虽然难度不高，但是对基本的语法和函数都起到了很好的练习作用，在简单类型题里面算是比较全面啦！

正确答案如下：

select 
    date_format(order_date, '%Y-%m') as 'month',
    count(distinct order_id) as 'order_count',
    count(distinct customer_id) as 'customer_count'
from orders
where invoice > 20
group by 1

21. Friendly Movies Streamed Last Month

Table: TVProgram

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| program_date  | date    |
| content_id    | int     |
| channel       | varchar |
+---------------+---------+

(program_date, content_id) is the primary key for this table.

This table contains information of the programs on the TV.

content_id is the id of the program in some channel on the TV.

Table: Content

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| content_id       | varchar |
| title            | varchar |
| Kids_content     | enum    |
| content_type     | varchar |
+------------------+---------+

content_id is the primary key for this table.

Kids_content is an enum that takes one of the values (‘Y’, ‘N’) where:

‘Y’ means is content for kids
‘N’ is not content for kids.

content_type is the category of the content as movies, series, etc.

Write an SQL query to report the distinct titles of the kid-friendly movies streamed in June 2020.

Return the result table in any order.

The query result format is in the following example.

TVProgram
+--------------------+--------------+-------------+
| program_date       | content_id   | channel     |
+--------------------+--------------+-------------+
| 2020-06-10 08:00   | 1            | LC-Channel  |
| 2020-05-11 12:00   | 2            | LC-Channel  |
| 2020-05-12 12:00   | 3            | LC-Channel  |
| 2020-05-13 14:00   | 4            | Disney Ch   |
| 2020-06-18 14:00   | 4            | Disney Ch   |
| 2020-07-15 16:00   | 5            | Disney Ch   |
+--------------------+--------------+-------------+
Content
+------------+----------------+---------------+---------------+
| content_id | title          | Kids_content  | content_type  |
+------------+----------------+---------------+---------------+
| 1          | Leetcode Movie | N             | Movies        |
| 2          | Alg. for Kids  | Y             | Series        |
| 3          | Database Sols  | N             | Series        |
| 4          | Aladdin        | Y             | Movies        |
| 5          | Cinderella     | Y             | Movies        |
+------------+----------------+---------------+---------------+
Result
+--------------+
| title        |
+--------------+
| Aladdin      |
+--------------+

Explaination：

“Leetcode Movie” is not a content for kids.

“Alg. for Kids” is not a movie.

“Database Sols” is not a movie.

“Alladin” is a movie, content for kids and was streamed in June 2020.

“Cinderella” was not streamed in June 2020.

这个题比较常规，考察的是对表格连接以及筛选条件的使用，需要注意的是，我们可以在不select某一列的情况下，把其用在where里面作为筛选条件。

正确答案如下：

select distinct title
from TVProgram a
join Content b
on a.content_id = b.content_id
and content_type = 'Movies'
and Kids_content = 'Y'
and program_date like '2020-06%'; # like函数在字符串查找有很多的应用
# and program_date between '2020-06-01' and '2020-06-30'
# and date_format(program_date, '%Y-%m') = '2020-06'

当然，写法不唯一，我们还有：

select distinct title
from TVProgram 
join Content 
using (content_id)
where content_type = 'Movies'
and Kids_content = 'Y'
and program_date like '2020-06%';

22. Game Play Analysis I

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+

(player_id, event_date)is the primary key of this table.

This table shows the activity of players of some game.

Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write an SQL query that reports the first login date for each player.

The query result format is in the following example:

# Activity table
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

# Result table
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1         | 2016-03-01  |
| 2         | 2017-06-25  |
| 3         | 2016-03-02  |
+-----------+-------------+

我们要找每个人的第一次登录日期，那么自然而然我们想到用 window function，rank或者dense_rank函数都是可以的，我们使用rank函数，这里注意我们用了partition by这一语句（有点像group by），之前我们仅仅用了order by这一语句，稍微说一下他们的作用：

First, the PARTITION BY clause divides the rows of the result set partitions to which the function is applied.
Second, the ORDER BY clause specifies the logical sort order of the rows in each a partition to which the function is applied.

强调一点，window function不能作为筛选条件直接使用到where等环境中，例如下面这种：

select 
	player_id, 
	event_date as first_login 
from
	activity
where rank() over (partition by player_id order by event_date desc) = 1

这种写法会返回：

You cannot use the window function ‘rank’ in this context.

所以我们可以用 subquery 对把 rank 函数所形成的这一列所对应的别名（alias）进行筛选。

正确答案如下：

select 
	player_id, 
	event_date as first_login 
from
(select 
	player_id, 
	event_date, 
	rank() over (partition by player_id order by event_date) as rnk
from activity) tmp
where rnk = 1

23. Game Play Analysis II

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+

(player_id, event_date) is the primary key of this table.

This table shows the activity of players of some game.

Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write a SQL query that reports the device that is first logged in for each player.

The query result format is in the following example:

# Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

# Result table:

+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 1         |
+-----------+-----------+

这个题和上一题基本是一样的思路，都是使用了 window function，所以直接上答案了！

正确答案如下：

select 
	player_id, device_id 
from 
(select 
	player_id, 
	device_id, 
	rank() over (partition by player_id order by event_date asc) as rnk
from activity) t
where rnk = 1

24. Group Sold Products By The Date

Table Activities:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| sell_date   | date    |
| product     | varchar |
+-------------+---------+

There is no primary key for this table, it may contains duplicates.

Each row of this table contains the product name and the date it was sold in a market.

Write an SQL query to find for each date, the number of distinct products sold and their names.

The sold-products names for each date should be sorted lexicographically.

Return the result table ordered by sell_date.

The query result format is in the following example.

Activities table:
+------------+-------------+
| sell_date  | product     |
+------------+-------------+
| 2020-05-30 | Headphone   |
| 2020-06-01 | Pencil      |
| 2020-06-02 | Mask        |
| 2020-05-30 | Basketball  |
| 2020-06-01 | Bible       |
| 2020-06-02 | Mask        |
| 2020-05-30 | T-Shirt     |
+------------+-------------+

Result table:
+------------+----------+------------------------------+
| sell_date  | num_sold | products                     |
+------------+----------+------------------------------+
| 2020-05-30 | 3        | Basketball,Headphone,T-shirt |
| 2020-06-01 | 2        | Bible,Pencil                 |
| 2020-06-02 | 1        | Mask                         |
+------------+----------+------------------------------+

For 2020-05-30, Sold items were (Headphone, Basketball, T-shirt), we sort them lexicographically and separate them by comma.

For 2020-06-01, Sold items were (Pencil, Bible), we sort them lexicographically and separate them by comma.

For 2020-06-02, Sold item is (Mask), we just return it.

这里我要向大家介绍一个新的函数，就是 group_concat。这个函数是一个 aggregation 函数（聚合函数），是可以配合 group by 一起使用的，我们在这道题里不光要使用 count 函数，也要使用 group_concat 来将组中的字符串连接成单个字符串。

语法如下：

GROUP_CONCAT( 
	DISTINCT expression 
	ORDER BY expression 
	SEPARATOR sep );

Explanation:

The DISTINCT clause allows you to eliminate duplicate values in the group before concatenating them.

The ORDER BY clause allows you to sort the values in ascending or descending order before concatenating. By default, it sorts the values in ascending order. If you want to sort the values in the descending order, you need to specify explicitly the DESC option.

The SEPARATOR specifies a literal value inserted between values in the group. If you do not specify a separator, the GROUP_CONCAT function uses a comma (,) as the default separator.

The GROUP_CONCAT function ignores NULL values. It returns NULL if there was no matching row found or all arguments are NULL values.

The GROUP_CONCAT function returns a binary or non-binary string, which depends on the arguments. by default, the maximum length of the return string is 1024. In case you need more than this, you can extend the maximum length by setting the group_concat_max_len system variable at SESSION or GLOBAL level.

那么这道题我们只需要使用 count 和 group_concat 在加上 group by就可以解决了！

正确答案如下：

select 
	sell_date, 
	count(distinct product) as num_sold,
    group_concat(distinct product order by product) as products
    # group_concat取唯一的product，并且按product排序
from activities 
group by sell_date

25. Immediate Food Delivery I

Table: Delivery

+-----------------------------+---------+
| Column Name                 | Type    |
+-----------------------------+---------+
| delivery_id                 | int     |
| customer_id                 | int     |
| order_date                  | date    |
| customer_pref_delivery_date | date    |
+-----------------------------+---------+

delivery_id is the primary key of this table.

The table holds information about food delivery to customers that make orders at some date and specify a preferred delivery date (on the same order date or after it).

If the preferred delivery date of the customer is the same as the order date then the order is called immediate otherwise it’s called scheduled.

Write an SQL query to find the percentage of immediate orders in the table, rounded to 2 decimal places.

The query result format is in the following example:

# Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1           | 1           | 2019-08-01 | 2019-08-02                  |
| 2           | 5           | 2019-08-02 | 2019-08-02                  |
| 3           | 1           | 2019-08-11 | 2019-08-11                  |
| 4           | 3           | 2019-08-24 | 2019-08-26                  |
| 5           | 4           | 2019-08-21 | 2019-08-22                  |
| 6           | 2           | 2019-08-11 | 2019-08-13                  |
+-------------+-------------+------------+-----------------------------+


# Result table:
+----------------------+
| immediate_percentage |
+----------------------+
| 33.33                |
+----------------------+

The orders with delivery id 2 and 3are immediate while the others are scheduled.

这个题和第二题很相似，回顾一下第二题，我们通过遵循某个限定条件来计算某个值，这一类的题都很类似，在这里，我们要计算 immediate order 占所有 order 的比例，那么 immediate 的定义是什么呢？很简单，就是当 order_date 和 customer_pref_delivery_date 相等的时候对应的行。

那么思路就是把这些行数出来一共多少行，再除以总的行数，所以我们需要一种 if 函数来筛选并计数。

正确答案如下：

select 
    round(  # round函数取小数点后两位
    100* # 100*转化成百分数
    sum(case when order_date = customer_pref_delivery_date then 1 else 0 end)  
    # case when函数当条件成立时候，对目标进行操作（赋值、转换），注意end结尾！
    # 这里我们一旦满足条件，次数就加1，sum起来就查出了一共满足条件有多少个
	/ count(*)
	,2) as immediate_percentage
from delivery

也可写成：

select 
    round(  # round函数取小数点后两位
    100* # 100*转化成百分数
    sum(if(order_date = customer_pref_delivery_date, 1, 0))  
    # if 和 case when 差不多，看个人喜好
    # 我喜欢用if，因为不用担心最后没加end
	/ count(*)
	,2) as immediate_percentage
from delivery

26. List the Products Ordered in a Period

Table: Products

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| product_id       | int     |
| product_name     | varchar |
| product_category | varchar |
+------------------+---------+

product_id is the primary key for this table.

This table contains data about the company’s products.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| order_date    | date    |
| unit          | int     |
+---------------+---------+

There is no primary key for this table. It may have duplicate rows.

product_id is a foreign key to Products table.

unit is the number of products ordered in order_date.

Write an SQL query to get the names of products with greater than or equal to 100 units ordered in February 2020 and their amount.

Return result table in any order.

The query result format is in the following example:

# Products table:
+-------------+-----------------------+------------------+
| product_id  | product_name          | product_category |
+-------------+-----------------------+------------------+
| 1           | Leetcode Solutions    | Book             |
| 2           | Jewels of Stringology | Book             |
| 3           | HP                    | Laptop           |
| 4           | Lenovo                | Laptop           |
| 5           | Leetcode Kit          | T-shirt          |
+-------------+-----------------------+------------------+

# Orders table:
+--------------+--------------+----------+
| product_id   | order_date   | unit     |
+--------------+--------------+----------+
| 1            | 2020-02-05   | 60       |
| 1            | 2020-02-10   | 70       |
| 2            | 2020-01-18   | 30       |
| 2            | 2020-02-11   | 80       |
| 3            | 2020-02-17   | 2        |
| 3            | 2020-02-24   | 3        |
| 4            | 2020-03-01   | 20       |
| 4            | 2020-03-04   | 30       |
| 4            | 2020-03-04   | 60       |
| 5            | 2020-02-25   | 50       |
| 5            | 2020-02-27   | 50       |
| 5            | 2020-03-01   | 50       |
+--------------+--------------+----------+

# Result table:
+--------------------+---------+
| product_name       | unit    |
+--------------------+---------+
| Leetcode Solutions | 130     |
| Leetcode Kit       | 100     |
+--------------------+---------+

Explanation:

Products with product_id = 1 is ordered in February a total of (60 + 70) = 130.

Products with product_id = 2 is ordered in February a total of 80.

Products with product_id = 3 is ordered in February a total of (2 + 3) = 5.

Products with product_id = 4 was not ordered in February 2020.

Products with product_id = 5is ordered in February a total of (50 + 50) = 100.

这道题不难，但是需要慢慢一步步的拆解，稍微复杂一点，细心一点肯定可以做出来，刷 sql 其实并不是要刷多难的题，而是在你拿到题的那一刻，你能快速的反应，找到解题思路，并有逻辑的表述出来。

我们要找到所有同时满足下列条件的产品名称和其数量

在2020年2月下单的
下单数量大于等于100

正确答案如下：

select product_name, sum(unit) as 'unit'
from products p
join orders o
using (product_id)
where order_date like '2020-02%'
group by 1
having sum(unit) >= 100

其实顺着思路走，还是比较简单的，先连接表格，在筛选时间，最后 group by 加 having 筛选 sum(unit) >= 100 的信息，结束！

27. Not Boring Movies

X city opened a new cinema, many people would like to go to this cinema. The cinema also gives out a poster indicating the movies’ ratings and descriptions.

Please write a SQL query to output movies with an odd numbered ID and a description that is not ‘boring’. Order the result by rating.

For example, table cinema:

+---------+-----------+--------------+-----------+
|   id    | movie     |  description |  rating   |
+---------+-----------+--------------+-----------+
|   1     | War       |   great 3D   |   8.9     |
|   2     | Science   |   fiction    |   8.5     |
|   3     | irish     |   boring     |   6.2     |
|   4     | Ice song  |   Fantacy    |   8.6     |
|   5     | House card|   Interesting|   9.1     |
+---------+-----------+--------------+-----------+

For the example above, the output should be:

+---------+-----------+--------------+-----------+
|   id    | movie     |  description |  rating   |
+---------+-----------+--------------+-----------+
|   5     | House card|   Interesting|   9.1     |
|   1     | War       |   great 3D   |   8.9     |
+---------+-----------+--------------+-----------+

这个题比较简单，用的都是基本的语法，其中筛选奇数的id我们使用了mod函数。

正确答案如下：

select *
from cinema
where description != 'boring'
and mod(id,2) = 1
order by rating desc;

28. Project Employees II

Table: Project

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| project_id  | int     |
| employee_id | int     |
+-------------+---------+

(project_id, employee_id) is the primary key of this table.

employee_id is a foreign key to Employee table.

Table: Employee

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| employee_id      | int     |
| name             | varchar |
| experience_years | int     |
+------------------+---------+

employee_id is the primary key of this table.

Write an SQL query that reports all the projects that have the most employees.

The query result format is in the following example:

# Project table
+-------------+-------------+
| project_id  | employee_id |
+-------------+-------------+
| 1           | 1           |
| 1           | 2           |
| 1           | 3           |
| 2           | 1           |
| 2           | 4           |
+-------------+-------------+

# Employee table
+-------------+--------+------------------+
| employee_id | name   | experience_years |
+-------------+--------+------------------+
| 1           | Khaled | 3                |
| 2           | Ali    | 2                |
| 3           | John   | 1                |
| 4           | Doe    | 2                |
+-------------+--------+------------------+

# result table
+-------------+
| project_id  |
+-------------+
| 1           |
+-------------+
# The first project has 3 employees while the second one has 2.

这种类型的题已经做过很多次了，还是使用 window function 来解决，注意使用 subquery 加上 where 条件来解。

正确答案如下：

select 
	project_id 
from
	(select 
		project_id, 
		dense_rank() over (order by count(*) desc) as rnk 
		# 一定要记住 rank / dense_rank 的语法！
	from project
	group by project_id) as t
where rnk = 1

29. Customer Who Visited but Did Not Make Any Transactions

Table: Visits

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| visit_id    | int     |
| customer_id | int     |
+-------------+---------+

visit_id is the primary key for this table.

This table contains information about the customers who visited the mall.

Table: Transactions

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| transaction_id | int     |
| visit_id       | int     |
| amount         | int     |
+----------------+---------+

transaction_id is the primary key for this table.

This table contains information about the customers who visited the mall.

Write an SQL query to find the IDs of the users who visited without making any transactions and the number of times they made these types of visits.

Return the result table sorted in any order.

The query result format is in the following example:

# Visits
+----------+-------------+
| visit_id | customer_id |
+----------+-------------+
| 1        | 23          |
| 2        | 9           |
| 4        | 30          |
| 5        | 54          |
| 6        | 96          |
| 7        | 54          |
| 8        | 54          |
+----------+-------------+

# Transactions
+----------------+----------+--------+
| transaction_id | visit_id | amount |
+----------------+----------+--------+
| 2              | 5        | 310    |
| 3              | 5        | 300    |
| 9              | 5        | 200    |
| 12             | 1        | 910    |
| 13             | 2        | 970    |
+----------------+----------+--------+

# Result table
+-------------+----------------+
| customer_id | count_no_trans |
+-------------+----------------+
| 54          | 2              |
| 30          | 1              |
| 96          | 1              |
+-------------+----------------+

Explainantion:

Customer with id = 23 visited the mall once and made one transaction during the visit with id = 12.

Customer with id = 9 visited the mall once and made one transaction during the visit with id = 13.

Customer with id = 30 visited the mall once and did not make any transactions.

Customer with id = 54 visited the mall three times. During 2 visits they did not make any transactions, and during one visit they made 3 transactions.

Customer with id = 96 visited the mall once and did not make any transactions.

As we can see, users with IDs 30 and 96 visited the mall one time without making any transactions. Also user 54 visited the mall twice and did not make any transactions.

本题是新题，但是比较简单，不过多讲解了，不过要注意一下几点：

注意表格连接方法的使用
对缺失值的处理
group by 的使用

正确答案如下：

select 
	customer_id, 
	count(*) as count_no_trans
from visits v
left join transactions t
using (visit_id)
where t.visit_id is null 
# 这里要注意是筛选哪一个表中的visit_id为null
group by customer_id

30. Queries Quality and Percentage

Table: Queries

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| query_name  | varchar |
| result      | varchar |
| position    | int     |
| rating      | int     |
+-------------+---------+

There is no primary key for this table, it may have duplicate rows.
This table contains information collected from some queries on a database.

The position column has a value from 1 to 500.

The rating column has a value from 1 to 5. Query with rating less than 3 is a poor query.

We define query quality as:

The average of the ratio between query rating and its position.

We also define poor query percentage as:

The percentage of all queries with rating less than 3.

Write an SQL query to find each query_name, the quality and poor_query_percentage.

Both quality and poor_query_percentage should be rounded to 2 decimal places.

The query result format is in the following example:

# queries table
+------------+-------------------+----------+--------+
| query_name | result            | position | rating |
+------------+-------------------+----------+--------+
| Dog        | Golden Retriever  | 1        | 5      |
| Dog        | German Shepherd   | 2        | 5      |
| Dog        | Mule              | 200      | 1      |
| Cat        | Shirazi           | 5        | 2      |
| Cat        | Siamese           | 3        | 3      |
| Cat        | Sphynx            | 7        | 4      |
+------------+-------------------+----------+--------+

# Result table
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog        | 2.50    | 33.33                 |
| Cat        | 0.66    | 33.33                 |
+------------+---------+-----------------------+

Explaination:

Dog queries quality is ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50
Dog queries poor_ query_percentage is (1 / 3) * 100 = 33.33

Cat queries quality equals ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66
Cat queries poor_ query_percentage is (1 / 3) * 100 = 33.33

根据描述，我们知道要去求出每个query_name对应的rating/position的平均值，并命名为quality，再求出每个query_name对应的rating < 3的比率。

正确答案如下：

select 
    query_name,
    round(AVG(rating/position),2) as quality,
    round(AVG(if(rating < 3, 1, 0)*100),2) as poor_query_percentage
    # if 也可以用 case when
from 
    queries
group by query_name

31. Reformat Department Table

Table: Department

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| revenue       | int     |
| month         | varchar |
+---------------+---------+

(id, month) is the primary key of this table.

The table has information about the revenue of each department per month.

The month has values in ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].

Write an SQL query to reformat the table such that there is a department id column and a revenue column for each month.

The query result format is in the following example:

# Department table:
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+

# Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+

Note that the result table has 13 columns (1 for the department id + 12 for the months).

这个题比较新颖，并不是简单的筛选条件，而是对表格做转换，类似于 pivot table。

那么我们要做的就是针对每个 id，对应某个月份，找到对应的 revenue。

正确答案如下：

select id, sum(case when month = 'Jan' then revenue else NULL end) as Jan_Revenue ,
           sum(case when month = 'Feb' then revenue else NULL end) as Feb_Revenue ,
           sum(case when month = 'Mar' then revenue else NULL end) as Mar_Revenue ,
           sum(case when month = 'Apr' then revenue else NULL end) as Apr_Revenue ,
           sum(case when month = 'May' then revenue else NULL end) as May_Revenue ,
           sum(case when month = 'Jun' then revenue else NULL end) as Jun_Revenue ,
           sum(case when month = 'Jul' then revenue else NULL end) as Jul_Revenue ,
           sum(case when month = 'Aug' then revenue else NULL end) as Aug_Revenue ,
           sum(case when month = 'Sep' then revenue else NULL end) as Sep_Revenue ,
           sum(case when month = 'Oct' then revenue else NULL end) as Oct_Revenue ,
           sum(case when month = 'Nov' then revenue else NULL end) as Nov_Revenue ,
           sum(case when month = 'Dec' then revenue else NULL end) as Dec_Revenue 
from Department 
group by id

注意这里的 sum 其实可以换成其他的 aggregation 函数，因为此时我们只是为了使用 group by 函数。

此时，每个月每个 id 对应的 revenue 是唯一的，所以加上任何的 aggregation 函数都不会影响结果，当然这里的 case when 可以换成 if 函数。

32. Rising Temperature

Table: Weather

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| recordDate    | date    |
| temperature   | int     |
+---------------+---------+

id is the primary key for this table.

This table contains information about the temperature in a certain day.

Write an SQL query to find all dates’ id with higher temperature compared to its previous dates (yesterday).

Return the result table in any order.

The query result format is in the following example:

# Weather
+----+------------+-------------+
| id | recordDate | Temperature |
+----+------------+-------------+
| 1  | 2015-01-01 | 10          |
| 2  | 2015-01-02 | 25          |
| 3  | 2015-01-03 | 20          |
| 4  | 2015-01-04 | 30          |
+----+------------+-------------+
# Result table:
+----+
| id |
+----+
| 2  |
| 4  |
+----+

In 2015-01-02, temperature was higher than the previous day (10 -> 25).
In 2015-01-04, temperature was higher than the previous day (30 -> 20).

这个题目有点难度，在于以下两点：

如何将表格里的内容进行行内比对？
如何找到所有跟前一天相比，温度更高的 id？

其实我们已经接触并熟悉了 self join 的使用了，作用就是自己跟自己进行比对，再根据一定的条件进行筛选。

正确答案如下：

select 
	tb2.Id # 在使用self join时，我们一定要注意取的是哪个表里的数据
from 
	Weather tb1
join 
	Weather tb2 
on datediff(tb2.RecordDate, tb1.RecordDate) = 1 
# The DATEDIFF() function returns the difference between two dates.
# 请参考：https://www.w3schools.com/sql/func_sqlserver_datediff.asp
and tb2.Temperature > tb1.Temperature

33. Sales Analysis I

Table: Product

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
| unit_price   | int     |
+--------------+---------+

product_id is the primary key of this table.

Table: Sales

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| seller_id   | int     |
| product_id  | int     |
| buyer_id    | int     |
| sale_date   | date    |
| quantity    | int     |
| price       | int     |
+------ ------+---------+

This table has no primary key, it can have repeated rows.

product_id is a foreign key to Product table.

Write an SQL query that reports the best seller by total sales price, If there is a tie, report them all.

The query result format is in the following example:

# Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1          | S8           | 1000       |
| 2          | G4           | 800        |
| 3          | iPhone       | 1400       |
+------------+--------------+------------+

# Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date  | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1         | 1          | 1        | 2019-01-21 | 2        | 2000  |
| 1         | 2          | 2        | 2019-02-17 | 1        | 800   |
| 2         | 2          | 3        | 2019-06-02 | 1        | 800   |
| 3         | 3          | 4        | 2019-05-13 | 2        | 2800  |
+-----------+------------+----------+------------+----------+-------+

# Result table:
+-------------+
| seller_id   |
+-------------+
| 1           |
| 3           |
+-------------+

Both sellers with id 1 and 3 sold products with the most total price of 2800.

通过解释，我们知道要找每个 seller_id 对应的总销售额，并筛选出总销售额最大的 seller_id，如果有一样的，就都筛选出来。

所以我们需要使用：

group by
rank
subquery

正确答案如下：

select seller_id from
(
select 
	seller_id,
	rank() over (order by sum(price) desc) as rnk
	# 注意 rank 函数语法，以及在subquery中的使用
from
	product
join 
	sales
using (product_id)
group by 1 
) tmp
where rnk = 1;

34. Sales Analysis II

Table: Product

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
| unit_price   | int     |
+--------------+---------+

product_id is the primary key of this table.

Table: Sales

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| seller_id   | int     |
| product_id  | int     |
| buyer_id    | int     |
| sale_date   | date    |
| quantity    | int     |
| price       | int     |
+------ ------+---------+

This table has no primary key, it can have repeated rows.

product_id is a foreign key to Product table.

Write an SQL query that reports the buyers who have bought S8 but not iPhone. Note that S8 and iPhone are products present in the Product table.

The query result format is in the following example:

# Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1          | S8           | 1000       |
| 2          | G4           | 800        |
| 3          | iPhone       | 1400       |
+------------+--------------+------------+

# Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date  | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1         | 1          | 1        | 2019-01-21 | 2        | 2000  |
| 1         | 2          | 2        | 2019-02-17 | 1        | 800   |
| 2         | 1          | 3        | 2019-06-02 | 1        | 800   |
| 3         | 3          | 3        | 2019-05-13 | 2        | 2800  |
+-----------+------------+----------+------------+----------+-------+

# Result table:
+-------------+
| buyer_id    |
+-------------+
| 1           |
+-------------+

The buyer with id 1 bought an S8 but didn’t buy an iPhone. The buyer with id 3 bought both.

通过解释，我们要找到所有的 buyer_id 同时满足：

购买 S8
没买 iPhone

那么我们可以找出每个 buyer_id 对应的 S8 和 iPhone 的购买量分别是多少，并满足购买 S8 的数量大于 0，并且购买 iPhone 的数量等于 0。

正确答案如下：

select buyer_id from
(
select 
    buyer_id, 
    sum(if(product_name = 'S8', 1, 0)) as 's8',
    sum(if(product_name = 'iPhone',1,0)) as 'iPhone'
    # 以上两个sum if语句计算出每个buyer_id对应购买两种产品的数量
from product
join sales
using (product_id)
group by 1
) tmp
where s8 > 0 
and iPhone = 0

35. Sales Analysis III

Table: Product

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
| unit_price   | int     |
+--------------+---------+

product_id is the primary key of this table.

Table: Sales

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| seller_id   | int     |
| product_id  | int     |
| buyer_id    | int     |
| sale_date   | date    |
| quantity    | int     |
| price       | int     |
+------ ------+---------+

This table has no primary key, it can have repeated rows.

product_id is a foreign key to Product table.

Write an SQL query that reports the products that were only sold in spring 2019. That is, between 2019-01-01 and 2019-03-31 inclusive.

The query result format is in the following example:

# Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1          | S8           | 1000       |
| 2          | G4           | 800        |
| 3          | iPhone       | 1400       |
+------------+--------------+------------+

# Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date  | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1         | 1          | 1        | 2019-01-21 | 2        | 2000  |
| 1         | 2          | 2        | 2019-02-17 | 1        | 800   |
| 2         | 2          | 3        | 2019-06-02 | 1        | 800   |
| 3         | 3          | 4        | 2019-05-13 | 2        | 2800  |
+-----------+------------+----------+------------+----------+-------+

# Result table:
+-------------+--------------+
| product_id  | product_name |
+-------------+--------------+
| 1           | S8           |
+-------------+--------------+

The product with id 1 was only sold in spring 2019 while the other two were sold after.

通过解释，我们要找到只在某一时间段卖出的商品信息，很简单，直接上答案！

TOTALLY WRONG!!!

我们要找的是仅在某一段时间出售的产品，会存在一个产品确实在此时间段出售，但也在别的时间段出售，我们要排除掉！

所以，我们用排除法，使得 product_id 不在 2019 年的夏秋冬所对应的的 product_id，也就是不在 2019-04-01 到 2019-12-31。

正确答案如下：

select 
	product_id,
	product_name 
from
	product
where product_id not in
(select product_id
from sales
where sale_date not between '2019-01-01' and '2019-03-31') 
# 注意不能只单纯限制时间范围
# 通过subquery来限制product_id不在
# 2019-04-01 - 2019-12-31对应的product_id

36. Sales Person

Description

Given three tables: salesperson, company, orders.
Output all the names in the table salesperson, who didn’t have sales to company ‘RED’.

Example
Input

Table: salesperson

+----------+------+--------+-----------------+-----------+
| sales_id | name | salary | commission_rate | hire_date |
+----------+------+--------+-----------------+-----------+
|   1      | John | 100000 |     6           | 4/1/2006  |
|   2      | Amy  | 120000 |     5           | 5/1/2010  |
|   3      | Mark | 65000  |     12          | 12/25/2008|
|   4      | Pam  | 25000  |     25          | 1/1/2005  |
|   5      | Alex | 50000  |     10          | 2/3/2007  |
+----------+------+--------+-----------------+-----------+

The table salesperson holds the salesperson information. Every salesperson has a sales_id and a name.

Table: company

+---------+--------+------------+
| com_id  |  name  |    city    |
+---------+--------+------------+
|   1     |  RED   |   Boston   |
|   2     | ORANGE |   New York |
|   3     | YELLOW |   Boston   |
|   4     | GREEN  |   Austin   |
+---------+--------+------------+

The table company holds the company information. Every company has a com_id and a name.

Table: orders

+----------+------------+---------+----------+--------+
| order_id | order_date | com_id  | sales_id | amount |
+----------+------------+---------+----------+--------+
| 1        |   1/1/2014 |    3    |    4     | 100000 |
| 2        |   2/1/2014 |    4    |    5     | 5000   |
| 3        |   3/1/2014 |    1    |    1     | 50000  |
| 4        |   4/1/2014 |    1    |    4     | 25000  |
+----------+----------+---------+----------+--------+

The table orders holds the sales record information, salesperson and customer company are represented by sales_id and com_id.

output

+------+
| name | 
+------+
| Amy  | 
| Mark | 
| Alex |
+------+

Explanation

According to order ‘3’ and ‘4’ in table orders, it is easy to tell only salesperson ‘John’ and ‘Pam’ have sales to company ‘RED’,
so we need to output all the other names in the table salesperson.

这道题看着比较复杂，有三个表格需要去处理，我们可以利用 salesperson 表格里的 sales_id 作为筛选条件，找到所有的 sales_id 不在 company 和 orders 合并后的 company.name = 'RED' 所对应的 sales_id。

正确答案如下：

select name 
from salesperson
where sales_id not in 
(select sales_id 
from orders
join company 
using (com_id)
where company.name = 'RED')

37. Second Highest Salary

Write a SQL query to get the second highest salary from the Employee table.

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.

+---------------------+
| SecondHighestSalary |
+---------------------+
| 200                 |
+---------------------+

以下是官方答案：

Solution

Approach: Using sub-query and `LIMIT` clause [Accepted]

Algorithm

Sort the distinct salary in descend order and then utilize the LIMIT clause to get the second highest salary.

SELECT DISTINCT
    Salary AS SecondHighestSalary
FROM
    Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1

However, this solution will be judged as ‘Wrong Answer’ if there is no such second highest salary since there might be only one record in this table. To overcome this issue, we can take this as a temp table.

MySQL

SELECT
    (SELECT DISTINCT
            Salary
        FROM
            Employee
        ORDER BY Salary DESC
        LIMIT 1 OFFSET 1) AS SecondHighestSalary
;

Approach: Using `IFNULL` and `LIMIT` clause [Accepted]

Another way to solve the ‘NULL’ problem is to use IFNULL funtion as below.

MySQL

SELECT
    IFNULL(
      (SELECT DISTINCT Salary
       FROM Employee
       ORDER BY Salary DESC
        LIMIT 1 OFFSET 1),
    NULL) AS SecondHighestSalary

其他答案

select max(salary) as SecondHighestSalary
from employee
where salary < (select max(salary) from employee);
# 利用subquery来先筛出最高的salary，那么小于最高的所有中的
# 最高的就是第二高的salary了！
# 这样也保证了如果没有第二高存在时返回null的情况

错误答案

select salary as `SecondHighestSalary` from
(select 
	salary, 
	dense_rank() over (order by salary desc) as rnk
from employee) tmp
where rnk = 2
# 这种写法不能再没有第二高salary时返回null

38. Shortest Distance in a Line

Table point holds the x coordinate of some points on x-axis in a plane, which are all integers.

Write a query to find the shortest distance between two points in these points.

x
-1
0
2

The shortest distance is ‘1’ obviously, which is from point ‘-1’ to ‘0’. So the output is as below:

shortest
1

Note: Every point is unique, which means there is no duplicates in table point.

Follow-up: What if all these points have an id and are arranged from the left most to the right most of x axis?

本题依旧考察大家对 self join 的使用，我们只有一个表格，里面包含了点的坐标，那么只有把每个坐标和其他的做比对，来找出最小距离。

正确答案如下：

select min(abs(a.x - b.x)) as shortest
from point a, point b
where a.x != b.x

Follow-up Answer:

select min(b.x - a.x)) as shortest
from point a, point b
where b.x > a.x
limit 1
#since all the x is sorted from smallest to the largest, we only need to consider the adjcent one from the larger side.

39. Students and Examinations

Table: Students

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| student_name  | varchar |
+---------------+---------+

student_id is the primary key for this table.

Each row of this table contains the ID and the name of one student in the school.

Table: Subjects

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+

subject_name is the primary key for this table.

Each row of this table contains the name of one subject in the school.

Table: Examinations

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| student_id   | int     |
| subject_name | varchar |
+--------------+---------+

There is no primary key for this table. It may contain duplicates.

Each student from the Students table takes every course from Subjects table.

Each row of this table indicates that a student with ID student_id attended the exam of subject_name.

Write an SQL query to find the number of times each student attended each exam.

Order the result table by student_id and subject_name.

The query result format is in the following example:

# Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1          | Alice        |
| 2          | Bob          |
| 13         | John         |
| 6          | Alex         |
+------------+--------------+

# Subjects table:
+--------------+
| subject_name |
+--------------+
| Math         |
| Physics      |
| Programming  |
+--------------+

# Examination table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1          | Math         |
| 1          | Physics      |
| 1          | Programming  |
| 2          | Programming  |
| 1          | Physics      |
| 1          | Math         |
| 13         | Math         |
| 13         | Programming  |
| 13         | Physics      |
| 2          | Math         |
| 1          | Math         |
+------------+--------------+

# Result table:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1          | Alice        | Math         | 3              |
| 1          | Alice        | Physics      | 2              |
| 1          | Alice        | Programming  | 1              |
| 2          | Bob          | Math         | 1              |
| 2          | Bob          | Physics      | 0              |
| 2          | Bob          | Programming  | 1              |
| 6          | Alex         | Math         | 0              |
| 6          | Alex         | Physics      | 0              |
| 6          | Alex         | Programming  | 0              |
| 13         | John         | Math         | 1              |
| 13         | John         | Physics      | 1              |
| 13         | John         | Programming  | 1              |
+------------+--------------+--------------+----------------+

Explaination:

The result table should contain all students and all subjects.

Alice attended Math exam 3 times, Physics exam 2 times and Programming exam 1 time.

Bob attended Math exam 1 time, Programming exam 1 time and didn’t attend the Physics exam.

Alex didn’t attend any exam.

John attended Math exam 1 time, Physics exam 1 time and Programming exam 1 time.

这个题稍微复杂一些，有三个表格需要处理，但是在我们分析最后想得到的结果时，便可找到切入点。

表格展示出的特点是每个学生对应三个 subject_name，每个 subject_name 又对应参加的考试场次；这就需要我们用 group by student_id, subject_name 加上 count(subject_name) 来实现
再看整个表格的结构，很明显是 cross join 所得的结构，形成学生和科目之间的笛卡尔积
最后，我们注意到 student_id = 6 这个同学很特殊，他没有参加哪怕一场考试。自然我们再连接表格时候要注意了！

正确答案如下：

select 
	a.student_id,
	a.student_name,
	a.subject_name, 
	count(b.subject_name) attended_exams   
	# 这里注意不能换成 count(*) 因为这样是不能数出缺失值的！
from 
	(select * from Students,Subjects) a 
	# 这里我们使用 cross join 来把两个表格合并，构造表格中的结构（笛卡尔积）
left join 
	examinations b 
	# left join 是因为有学生没考试，但是不能忽略
on a.student_id = b.student_id 
and a.subject_name  = b.subject_name 
# 这里 join 的条件是两个
group by 1,3 
# 同理，group by 的条件也是两个，先 student 后 subject_name
order by 1,3

40. Swap Salary

Given a table salary, such as the one below, that has m=male and f=female values. Swap all f and m values (i.e., change all f values to m and vice versa) with a single update statement and no intermediate temp table.

Note that you must write a single update statement, DO NOT write any select statement for this problem.

Example:

id	name	sex	salary
1	A	m	2500
2	B	f	1500
3	C	m	5500
4	D	f	500

After running your update statement, the above salary table should have the following rows:

id	name	sex	salary
1	A	f	2500
2	B	m	1500
3	C	f	5500
4	D	m	500

这里要求我们不能用 select 语句，只可以使用 update 来完成，那么这里我们来讲一下 update 的用法！

SQL UPDATE Statement

The UPDATE statement is used to modify the existing records in a table.

update 是用来修改已经存在的信息

基本语法如下;

UPDATE  
	table_name  
SET  
	column1 = value1, 
	column2 = value2, ...  
WHERE  condition;
# 注意一般情况下，一定要使用where
# 否则很可能会把所有的信息全部更新！

那么回到本题来，我们需要交换 sex 中的 f 和 m。那可以用 if 函数来实现！

正确答案如下：

update salary 
set sex = if(sex = 'm','f','m');
# Runtime: 252 ms, faster than  59.34%  of  MySQL  online submissions for  Swap Salary.
# Memory Usage: 0B, less than  100.00%  of  MySQL  online submissions for  Swap Salary.

但是下面这个要更快一点！

update salary 
set sex = case when sex = 'm' then 'f' else 'm' end
# Runtime: 213 ms, faster than  97.22%  of  MySQL  online submissions for  Swap Salary.
# Memory Usage: 0B, less than  100.00%  of  MySQL  online submissions for  Swap Salary.

41. Top Travellers

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+

id is the primary key for this table.
name is the name of the user.

Table: Rides

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| user_id       | int     |
| distance      | int     |
+---------------+---------+

id is the primary key for this table.
user_id is the id of the user who travelled the distance “distance”.

Write an SQL query to report the distance travelled by each user.

Return the result table ordered by travelled_distance in descending order, if two or more users travelled the same distance, order them by their name in ascending order.

The query result format is in the following example.

# Users table:
+------+-----------+
| id   | name      |
+------+-----------+
| 1    | Alice     |
| 2    | Bob       |
| 3    | Alex      |
| 4    | Donald    |
| 7    | Lee       |
| 13   | Jonathan  |
| 19   | Elvis     |
+------+-----------+

# Rides table:
+------+----------+----------+
| id   | user_id  | distance |
+------+----------+----------+
| 1    | 1        | 120      |
| 2    | 2        | 317      |
| 3    | 3        | 222      |
| 4    | 7        | 100      |
| 5    | 13       | 312      |
| 6    | 19       | 50       |
| 7    | 7        | 120      |
| 8    | 19       | 400      |
| 9    | 7        | 230      |
+------+----------+----------+

# Result table:
+----------+--------------------+
| name     | travelled_distance |
+----------+--------------------+
| Elvis    | 450                |
| Lee      | 450                |
| Bob      | 317                |
| Jonathan | 312                |
| Alex     | 222                |
| Alice    | 120                |
| Donald   | 0                  |
+----------+--------------------+

Explaination:

Elvis and Lee travelled 450 miles, Elvis is the top traveller as his name is alphabetically smaller than Lee.

Bob, Jonathan, Alex and Alice have only one ride and we just order them by the total distances of the ride.

Donald didn’t have any rides, the distance travelled by him is 0.

本题比较简单，只需要按部就班去做就可以了！

请注意：

对缺失值的处理
对表格的连接
排序

正确答案如下：

select 
    name, 
    ifnull(sum(distance),0) as 'travelled_distance'
from
	users a
left join 
	rides b
on a.id = b.user_id
group by 1
order by 2 desc, 1 asc;

42. Triangle Judgement

A pupil Tim gets homework to identify whether three line segments could possibly form a triangle.

However, this assignment is very heavy because there are hundreds of records to calculate.

Could you help Tim by writing a query to judge whether these three sides can form a triangle, assuming table triangle holds the length of the three sides x, y and z.

x	y	z
13	15	30
10	20	15

For the sample data above, your query should return the follow result:

x	y	z	triangle
13	15	30	No
10	20	15	Yes

这个题考查我们对三角形的理解，大家都知道三角形成立的条件等价于任意两边之和大于第三边。

所以，我们应该利用条件语句，使 x,y,z 满足条件即可。

正确答案如下：

select 
	x,y,z,
    (case when 
		 x+y>z and x+z>y and y+z>x 
	 then 
		 'Yes' else 'No' end) as 'triangle'
from triangle

43. Unique Orders and Customers Per Month

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| customer_id   | int     |
| invoice       | int     |
+---------------+---------+

order_id is the primary key for this table.

This table contains information about the orders made by customer_id.

Write an SQL query to find the number of unique orders and the number of unique customers with invoices > $20 for each different month.

Return the result table sorted in any order.

The query result format is in the following example:

# orders
+----------+------------+-------------+------------+
| order_id | order_date | customer_id | invoice    |
+----------+------------+-------------+------------+
| 1        | 2020-09-15 | 1           | 30         |
| 2        | 2020-09-17 | 2           | 90         |
| 3        | 2020-10-06 | 3           | 20         |
| 4        | 2020-10-20 | 3           | 21         |
| 5        | 2020-11-10 | 1           | 10         |
| 6        | 2020-11-21 | 2           | 15         |
| 7        | 2020-12-01 | 4           | 55         |
| 8        | 2020-12-03 | 4           | 77         |
| 9        | 2021-01-07 | 3           | 31         |
| 10       | 2021-01-15 | 2           | 20         |
+----------+------------+-------------+------------+

# Result table:
+---------+-------------+----------------+
| month   | order_count | customer_count |
+---------+-------------+----------------+
| 2020-09 | 2           | 2              |
| 2020-10 | 1           | 1              |
| 2020-12 | 2           | 1              |
| 2021-01 | 1           | 1              |
+---------+-------------+----------------+

Explaination:

In September 2020 we have two orders from 2 different customers with invoices > $20.

In October 2020 we have two orders from 1 customer, and only one of the two orders has invoice > $20.

In November 2020 we have two orders from 2 different customers but invoices < $20, so we don’t include that month.

In December 2020 we have two orders from 1 customer both with invoices > $20.

In January 2021 we have two orders from 2 different customers, but only one of them with invoice > $20.

这个题看似比较复杂，但是考查的点都比较基础，请注意：

如何转换日期
注意 orders 和 customers 需要筛选 unique

正确答案如下：

select 
    date_format(order_date, '%Y-%m') as 'month',
    count(distinct order_id) as 'order_count',
    count(distinct customer_id) as 'customer_count'
from orders
where invoice > 20
group by 1

44. User Activity for the Past 30 Days I

Table: Activity

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |
+---------------+---------+

There is no primary key for this table, it may have duplicate rows.

The activity_type column is an ENUM of type ('open_session', 'end_session', 'scroll_down', 'send_message').

The table shows the user activities for a social media website.

Note that each session belongs to exactly one user.

Write an SQL query to find the daily active user count for a period of 30 days ending 2019-07-27 inclusively. A user was active on some day if he/she made at least one activity on that day.

The query result format is in the following example:

# Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+

# Result table:
+------------+--------------+ 
| day        | active_users |
+------------+--------------+ 
| 2019-07-20 | 2            |
| 2019-07-21 | 2            |
+------------+--------------+

Note that we do not care about days with zero active users.

正确答案如下：

select 
	activity_date as day,count(distinct user_id) active_users 
from activity
where activity_date between ('2019-07-27' - INTERVAL 29 day) 
and  '2019-07-27'
group by 1
order by 1

也可以用 CTE (common table expression) 来写

with t as 
(select 
	activity_date, user_id, count(distinct user_id) as counts 
from activity 
where activity_date between '2019-06-28' and '2019-07-27'
group by activity_date)

select 	
	activity_date as 'day',
    counts as 'active_users'
from t

45. User Activity for the Past 30 Days II

Table: Activity

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |
+---------------+---------+

There is no primary key for this table, it may have duplicate rows.

The activity_type column is an ENUM of type ('open_session', 'end_session', 'scroll_down', 'send_message').

The table shows the user activities for a social media website.

Note that each session belongs to exactly one user.

Write an SQL query to find the average number of sessions per user for a period of 30 days ending 2019-07-27 inclusively, rounded to 2 decimal places. The sessions we want to count for a user are those with at least one activity in that time period.

The query result format is in the following example:

Activity table:

# Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 3       | 5          | 2019-07-21    | open_session  |
| 3       | 5          | 2019-07-21    | scroll_down   |
| 3       | 5          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+

# Result table:
+---------------------------+ 
| average_sessions_per_user |
+---------------------------+ 
| 1.33                      |
+---------------------------+

User 1 and 2 each had 1 session in the past 30 days while user 3 had 2 sessions so the average is (1 + 1 + 2) / 3 = 1.33.

这个题属实没啥营养，都是很熟悉的操作了，需要注意：

对缺失值的处理
average_sessions_per_user 的计算

正确答案如下：

select 
	ifnull(round(count(distinct session_id)/count(distinct(user_id)),2),0) 
	as average_sessions_per_user 
from activity
where activity_date between '2019-06-28' and '2019-07-27';

46. Warehouse Manager

Table: Warehouse

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| name         | varchar |
| product_id   | int     |
| units        | int     |
+--------------+---------+

(name, product_id) is the primary key for this table.

Each row of this table contains the information of the products in each warehouse.

Table: Products

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| product_name  | varchar |
| Width         | int     |
| Length        | int     |
| Height        | int     |
+---------------+---------+

product_id is the primary key for this table.

Each row of this table contains the information about the product dimensions (Width, Lenght and Height) in feets of each product.

Write an SQL query to report, How much cubic feet of volume does the inventory occupy in each warehouse.

warehouse_name
volume

Return the result table in any order.

The query result format is in the following example.

# Warehouse table:
+------------+--------------+-------------+
| name       | product_id   | units       |
+------------+--------------+-------------+
| LCHouse1   | 1            | 1           |
| LCHouse1   | 2            | 10          |
| LCHouse1   | 3            | 5           |
| LCHouse2   | 1            | 2           |
| LCHouse2   | 2            | 2           |
| LCHouse3   | 4            | 1           |
+------------+--------------+-------------+

# Products table:
+------------+--------------+------------+----------+-----------+
| product_id | product_name | Width      | Length   | Height    |
+------------+--------------+------------+----------+-----------+
| 1          | LC-TV        | 5          | 50       | 40        |
| 2          | LC-KeyChain  | 5          | 5        | 5         |
| 3          | LC-Phone     | 2          | 10       | 10        |
| 4          | LC-T-Shirt   | 4          | 10       | 20        |
+------------+--------------+------------+----------+-----------+

# Result table:
+----------------+------------+
| `warehouse_name` | `volume   `  | 
+----------------+------------+
| LCHouse1       | 12250      | 
| LCHouse2       | 20250      |
| LCHouse3       | 800        |
+----------------+------------+

Explaination:

题看着花里胡哨，其实一步一步来就行了，要注意volume 的计算方法。

正确答案如下：

select 
    name as 'warehouse_name',
    sum(units*(width*length*height)) as volume
from warehouse
join products
using (product_id)
group by 1

47. Weather Type in Each Country

Table: Countries

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| country_id    | int     |
| country_name  | varchar |
+---------------+---------+

country_id is the primary key for this table.

Each row of this table contains the ID and the name of one country.

Table: Weather

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| country_id    | int     |
| weather_state | varchar |
| day           | date    |
+---------------+---------+

(country_id, day) is the primary key for this table.

Each row of this table indicates the weather state in a country for one day.

Write an SQL query to find the type of weather in each country for November 2019.

The type of weather is Cold if the average weather_state is less than or equal 15, Hot if the average weather_state is greater than or equal 25 and Warm otherwise.

Return result table in any order.

The query result format is in the following example:

# Countries table:
+------------+--------------+
| country_id | country_name |
+------------+--------------+
| 2          | USA          |
| 3          | Australia    |
| 7          | Peru         |
| 5          | China        |
| 8          | Morocco      |
| 9          | Spain        |
+------------+--------------+

# Weather table:
+------------+---------------+------------+
| country_id | weather_state | day        |
+------------+---------------+------------+
| 2          | 15            | 2019-11-01 |
| 2          | 12            | 2019-10-28 |
| 2          | 12            | 2019-10-27 |
| 3          | -2            | 2019-11-10 |
| 3          | 0             | 2019-11-11 |
| 3          | 3             | 2019-11-12 |
| 5          | 16            | 2019-11-07 |
| 5          | 18            | 2019-11-09 |
| 5          | 21            | 2019-11-23 |
| 7          | 25            | 2019-11-28 |
| 7          | 22            | 2019-12-01 |
| 7          | 20            | 2019-12-02 |
| 8          | 25            | 2019-11-05 |
| 8          | 27            | 2019-11-15 |
| 8          | 31            | 2019-11-25 |
| 9          | 7             | 2019-10-23 |
| 9          | 3             | 2019-12-23 |
+------------+---------------+------------+

# Result table:
+--------------+--------------+
| country_name | weather_type |
+--------------+--------------+
| USA          | Cold         |
| Austraila    | Cold         |
| Peru         | Hot          |
| China        | Warm         |
| Morocco      | Hot          |
+--------------+--------------+

Explaination:

Average weather_state in USA in November is (15) / 1 = 15 so weather type is Cold.

Average weather_state in Austraila in November is (-2 + 0 + 3) / 3 = 0.333 so weather type is Cold.

Average weather_state in Peru in November is (25) / 1 = 25 so weather type is Hot.

Average weather_state in China in November is (16 + 18 + 21) / 3 = 18.333 so weather type is Warm.

Average weather_state in Morocco in November is (25 + 27 + 31) / 3 = 27.667 so weather type is Hot.

We know nothing about average weather_state in Spain in November so we don’t include it in the result table.

这道题虽长篇大论，但是核心就是以下几点：

筛选 11 月的数据
计算 weather_state 在 11 月的均值
判断天气是 cold、warm 还是 hot

正确答案如下：

select 
	country_name,
    (case 
        when avg(weather_state) <= 15 then 'Cold' 
        when avg(weather_state) >= 25 then 'Hot'
	    else 'Warm' end) as weather_type
from countries 
join weather
using (country_id)
where day like '2019-11%'
group by country_id

附件

https://gist.github.com/zg104/e5833f0c1002f35d0de602cf5ba8eae3

小结

那么 SQL 简单部分的刷题就告一段落啦，先恭喜大家完成 SQL 简单类型题目的训练，相信大家对 SQL 的函数、语句以及解题关键点都有了基础的了解。SQL 刷题写对了一次不代表你会了，需要的是解题的熟练度与拿到题目快速想出解题思路的能力，任重而道远！

最后我要对你说：

SQL刷题（简单）

题目

Easy

前言

1. Actors and Directors Who Cooperated At Least Three Times

2. Ads Performance

3. Article Views I

4. Average Selling Price

5. Biggest Single Number

6. Combine Two Tables

7. Consecutive Available Seats

8. Customer Order Frequency

9. Customer Placing the Largest Number of Orders

10. Customers Who Never Order

11. Delete Duplicate Emails

12. Duplicate Emails

13. Customers Who Never Order

14. Employees Earning More Than Their Managers

15. Find Customer Referee

16. Find Users With Valid E-Mails

17. Find the Team Size

18. Fix Product Name Format

19. Friend Requests I: Overall Acceptance Rate

20. Unique Orders and Customers Per Month

21. Friendly Movies Streamed Last Month

22. Game Play Analysis I

23. Game Play Analysis II

24. Group Sold Products By The Date

25. Immediate Food Delivery I

26. List the Products Ordered in a Period

27. Not Boring Movies

28. Project Employees II

29. Customer Who Visited but Did Not Make Any Transactions

30. Queries Quality and Percentage

31. Reformat Department Table

32. Rising Temperature

33. Sales Analysis I

34. Sales Analysis II

35. Sales Analysis III

36. Sales Person

37. Second Highest Salary

Solution

Approach: Using sub-query and LIMIT clause [Accepted]

Approach: Using IFNULL and LIMIT clause [Accepted]

38. Shortest Distance in a Line

39. Students and Examinations

40. Swap Salary

41. Top Travellers

42. Triangle Judgement

43. Unique Orders and Customers Per Month

44. User Activity for the Past 30 Days I

45. User Activity for the Past 30 Days II

46. Warehouse Manager

47. Weather Type in Each Country

附件

小结

Approach: Using sub-query and `LIMIT` clause [Accepted]

Approach: Using `IFNULL` and `LIMIT` clause [Accepted]