SQL刷题(中等)

题目

Medium

加粗的题目里面含有follow up question!
斜体的题目是系列套题,可以一起做!
代码整理全部在附件里(文章最后)

前言

这篇文章主要帮助大家开启对 SQL 的深入学习,总结了 LeetCode 中的 Database 中 SQL(中等)里面代表性较高且值得去反复做的 48 道题,希望能对大家深入SQL 有所帮助!

Now, Let’s get started!


1. Active Businesses

Table: Events

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| business_id   | int     |
| event_type    | varchar |
| occurences    | int     | 
+---------------+---------+

(business_id, event_type) is the primary key of this table.

Each row in the table logs the info that an event of some type occured at some business for a number of times.

Write an SQL query to find all active businesses.

An active business is a business that has more than one event type with occurences greater than the average occurences of that event type among all businesses.

The query result format is in the following example:

# Event table:
+-------------+------------+------------+
| business_id | event_type | occurences |
+-------------+------------+------------+
| 1           | reviews    | 7          |
| 3           | reviews    | 3          |
| 1           | ads        | 11         |
| 2           | ads        | 7          |
| 3           | ads        | 6          |
| 1           | page views | 3          |
| 2           | page views | 12         |
+-------------+------------+------------+

# Result table:
+-------------+
| business_id |
+-------------+
| 1           |
+-------------+

Explaination:

Average for 'reviews', 'ads' and 'page views' are (7+3)/2=5, (11+7+6)/3=8, (3+12)/2=7.5 respectively.

Business with id 1 has 7 ‘reviews’ events (more than 5) and 11 ‘ads’ events (more than 8) so it is an active business.

详解:

本题作为中单难度的第一题,在复杂程度上比简单难度的题有明显提高!

本题目的去找所有 active business,那么对于其定义是

所有 occurance 大于其 event_type 对应的 occurance 平均值并且对应的 event_type 个数大于 1 的 business_id

需要注意以下几点:

首先我们要找到 event_type 对应的 occurance 的均值

select 
	event_type, AVG(occurences) as avg_occ
from events
group by 1

很简单,这是很基础的,那么接下来我们需要把 events 表格中的 occurance 与均值来比较,那么我们采取表格连接的办法,使其合并到一起

select 
    *
from 
(
    select event_type, AVG(occurences) as avg_occ
    from events
    group by 1
) tmp
join events
using (event_type)

结果如下:

{"headers": 
["event_type", "avg_occ", "business_id", "occurences"], 
"values": 
[["reviews", 5.0000, 1, 7],
["reviews", 5.0000, 3, 3], 
["ads", 8.0000, 1, 11], 
["ads", 8.0000, 2, 7], 
["ads", 8.0000, 3, 6], 
["page views", 7.5000, 1, 3], 
["page views", 7.5000, 2, 12]]

那么我们要做到同时满足:对于每个 business_id,有超过一个 event_type 对应的 occurance 大于 avg_occ

正确答案如下:

select 
    business_id
from 
(
    select event_type, AVG(occurences) as avg_occ
    from events
    group by 1
) tmp
join events
using (event_type)
group by 1
having sum(if(occurences > avg_occ,1,0)) > 1
# sum(if(occurences > avg_occ,1,0)) 是为了数满足条件的有几个

2. Active Users

Table Accounts:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+

the id is the primary key for this table.

This table contains the account id and the user name of each account.

Table Logins:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| login_date    | date    |
+---------------+---------+

There is no primary key for this table, it may contain duplicates.

This table contains the account id of the user who logged in and the login date. A user may log in multiple times in the day.

Write an SQL query to find the id and the name of active users.

Active users are those who logged in to their accounts for 5 or more consecutive days.

Return the result table ordered by the id.

The query result format is in the following example:

# Accounts table:
+----+----------+
| id | name     |
+----+----------+
| 1  | Winston  |
| 7  | Jonathan |
+----+----------+

# Logins table:
+----+------------+
| id | login_date |
+----+------------+
| 7  | 2020-05-30 |
| 1  | 2020-05-30 |
| 7  | 2020-05-31 |
| 7  | 2020-06-01 |
| 7  | 2020-06-02 |
| 7  | 2020-06-02 |
| 7  | 2020-06-03 |
| 1  | 2020-06-07 |
| 7  | 2020-06-10 |
+----+------------+

# Result table:
+----+----------+
| id | name     |
+----+----------+
| 7  | Jonathan |
+----+----------+

User Winston with id = 1 logged in 2 times only in 2 different days, so, Winston is not an active user.

User Jonathan with id = 7 logged in 7 times in 6 different days, five of them were consecutive days, so, Jonathan is an active user.

详解:

我们要找登录账户连续五天或以上的用户,也就是活跃用户

需要注意:

日期上的连续五天,其实需要满足两个条件:

我们先看第一个:排序

利用 window function,我们有:

# 这个是CTE
WITH t AS (
    SELECT 
	    DISTINCT id, 
		login_date, 
		DENSE_RANK() OVER (PARTITION BY id ORDER BY login_date) rank_date
FROM Logins
)

# select * from t

我们看一下结果

{"headers": ["id", "login_date", "rank_date"], 
"values": 
[[1, "2020-05-30", 1], 
[1, "2020-06-07", 2], 
[7, "2020-05-30", 1], 
[7, "2020-05-31", 2], 
[7, "2020-06-01", 3], 
[7, "2020-06-02", 4], 
[7, "2020-06-03", 5], 
[7, "2020-06-10", 6]]}

那么再来看第二个条件:

select id, name 
from Accounts 
where id IN (
    select t1.id 
    from t t1
    join t t2
    on t1.login_date = DATE_ADD(t2.login_date, INTERVAL 4 DAY) 
    # 存在 日期A - 日期B = 4(天)
    and t1.rank_date - t2.rank_date = 4
    # 最大的日期对应的序 - 最小的日期对应的序 = 4
    and t1.id = t2.id)
order by 1
# 那么同时满足这两个条件的 id 和 name 一定保证连续5天登陆
# 既然连续5天满足了,那么连续超过五天也必然满足,是属于包含关系的!
# 因为假如有七天是连续的,那么有三种连续五天的情况,也都是满足上述两个条件

正确答案如下:

with t as (
    select 
	    distinct id, 
	    login_date, 
	    DENSE_RANK() OVER (PARTITION BY id ORDER BY login_date) as rank_date
	from Logins
)

select id, name 
from Accounts 
where id IN (
    select t1.id 
    from t t1
    join t t2
    on t1.login_date = DATE_ADD(t2.login_date, INTERVAL 4 DAY) 
    and t1.rank_date - t2.rank_date = 4
    and t1.id = t2.id)
order by 1

Follow-up:
Can you write a general solution if the active users are those who logged in to their accounts for n or more consecutive days?

这就很简单了,我们只需要在原来的基础上做些改动即可。

CREATE FUNCTION N_consecutive_days(N INT) RETURNS INT
BEGIN
  RETURN (
      with t as (
    select 
	    distinct id, 
	    login_date, 
	    DENSE_RANK() OVER (PARTITION BY id ORDER BY login_date) as rank_date
	from Logins
	)

	select id, name 
	from Accounts 
	where id IN (
	    select t1.id 
	    from t t1
	    join t t2
	    on t1.login_date = DATE_ADD(t2.login_date, INTERVAL N-1 DAY) 
	    and t1.rank_date - t2.rank_date = N-1
	    and t1.id = t2.id)
	order by 1
  );
END

3. Activity Participants

Table: Friends

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
| activity      | varchar |
+---------------+---------+

id is the id of the friend and primary key for this table.

name is the name of the friend.

activity is the name of the activity which the friend takes part in.

Table: Activities

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+

id is the primary key for this table.

name is the name of the activity.

Write an SQL query to find the names of all the activities with neither maximum, nor minimum number of participants.

Return the result table in any order. Each activity in table Activities is performed by any person in the table Friends.

The query result format is in the following example:

# Friends table:
+------+--------------+---------------+
| id   | name         | activity      |
+------+--------------+---------------+
| 1    | Jonathan D.  | Eating        |
| 2    | Jade W.      | Singing       |
| 3    | Victor J.    | Singing       |
| 4    | Elvis Q.     | Eating        |
| 5    | Daniel A.    | Eating        |
| 6    | Bob B.       | Horse Riding  |
+------+--------------+---------------+

# Activities table:
+------+--------------+
| id   | name         |
+------+--------------+
| 1    | Eating       |
| 2    | Singing      |
| 3    | Horse Riding |
+------+--------------+

# Result table:
+--------------+
| activity     |
+--------------+
| Singing      |
+--------------+

Eating activity is performed by 3 friends, maximum number of participants, (Jonathan D. , Elvis Q. and Daniel A.)

Horse Riding activity is performed by 1 friend, minimum number of participants, (Bob B.)

Singing is performed by 2 friends (Victor J. and Jade W.)

详解:

本题比较直接明了,我们要找既不是最多人选也不是最少人选的 activity。

首先,肯定要计算每个 activity 出现的次数

# 使用CTE方便我们接下来的操作
with t as 
(select activity, count(activity) as c
from friends
group by 1)

# select * from t

结果如下:

{"headers": ["activity", "c"], "values": 
[["Eating", 3], 
["Singing", 2], 
["Horse Riding", 1]]}

自然我们要选这里的 singing,所以我们要保证 c 要小于这一列的最大值,并且大于这里的最小值。

select activity
from t
where c < (select max(c) from t)
and c > (select min(c) from t);

注意:下面这种写法是不行的!

select activity
from t
where c < max(c)
and c > min(c);

正确答案如下:

with t as 
(select activity, count(activity) as c
from friends
group by 1)

select activity
from t
where c < (select max(c) from t)
and c > (select min(c) from t);

4. All People Report to the Given Manager

Table: Employees

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| employee_id   | int     |
| employee_name | varchar |
| manager_id    | int     |
+---------------+---------+

employee_id is the primary key for this table.

Each row of this table indicates that the employee with ID employee_id and name employee_name reports his work to his/her direct manager with manager_id

The head of the company is the employee with employee_id = 1.

Write an SQL query to find employee_id of all employees that directly or indirectly report their work to the head of the company.

The indirect relation between managers will not exceed 3 managers as the company is small.

Return result table in any order without duplicates.

The query result format is in the following example:

# Employees table:
+-------------+---------------+------------+
| employee_id | employee_name | manager_id |
+-------------+---------------+------------+
| 1           | Boss          | 1          |
| 3           | Alice         | 3          |
| 2           | Bob           | 1          |
| 4           | Daniel        | 2          |
| 7           | Luis          | 4          |
| 8           | Jhon          | 3          |
| 9           | Angela        | 8          |
| 77          | Robert        | 1          |
+-------------+---------------+------------+

# Result table:
+-------------+
| employee_id |
+-------------+
| 2           |
| 77          |
| 4           |
| 7           |
+-------------+

The head of the company is the employee with employee_id 1.

The employees with employee_id 2 and 77 report their work directly to the head of the company.

The employee with employee_id 4 report his work indirectly to the head of the company 4 --> 2 --> 1.

The employee with employee_id 7 report his work indirectly to the head of the company 7 --> 4 --> 2 --> 1.

The employees with employee_id 3, 8 and 9 don’t report their work to head of company directly or indirectly.

详解:

这个题目有点绕来绕去的,简单来说就是员工向上级汇报,有两种情况:

那么这里就需要不断的连接表格,然后根据条件来筛选满足条件的 employee_id

那么根据本题,我们有 7 --> 4 --> 2 --> 1 的情况出现,那么自然在我们得知 7 汇报给 4 后,4 汇报给谁呢?这是我们要 self join 一次,得知 4 汇报给 2,那么 2 汇报给了谁?需要再 self join 一次,得知 2 汇报给了 1。

那么我们先把表格连接起来

select *
from Employees e1
join Employees e2
on e1.manager_id=e2.employee_id
join Employees e3
on e2.manager_id=e3.employee_id

结果如下:

{"headers": ["employee_id", "employee_name", "manager_id", "employee_id", "employee_name", "manager_id", "employee_id", "employee_name", "manager_id"], "values": 
[[4, "Daniel", 2, 2, "Bob", 1, 1, "Boss", 1], 
[1, "Boss", 1, 1, "Boss", 1, 1, "Boss", 1], 
[2, "Bob", 1, 1, "Boss", 1, 1, "Boss", 1], 
[77, "Robert", 1, 1, "Boss", 1, 1, "Boss", 1], 
[9, "Angela", 8, 8, "John", 3, 3, "Alice", 3], 
[3, "Alice", 3, 3, "Alice", 3, 3, "Alice", 3], 
[8, "John", 3, 3, "Alice", 3, 3, "Alice", 3], 
[7, "Luis", 4, 4, "Daniel", 2, 2, "Bob", 1]]}

从上面的数据来看,我们要满足下面的条件:

正确答案如下:

select distinct(e1.employee_id)
from Employees e1
join Employees e2
on e1.manager_id=e2.employee_id
join Employees e3
on e2.manager_id=e3.employee_id
where
(e1.manager_id=1
or e2.manager_id=1
or e3.manager_id=1) and e1.employee_id!=1
order by 1;

那么要是存在一个非常长的递归序列,例如:100 --> 99 --> … --> 1,那么难道我们要 self join 98 次么?显然这种写法不理想,那么我们可以采用递归的写法。

递归写法如下:

with recursive cte as (
select employee_id from Employees where employee_id <> 1 and manager_id = 1
union
select e.employee_id from Employees e join cte c on c.employee_id = e.manager_id)

select employee_id from cte;

解释一下:

5. Apples & Oranges

Table: Sales

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| sale_date     | date    |
| fruit         | enum    | 
| sold_num      | int     | 
+---------------+---------+

(sale_date,fruit) is the primary key for this table.

This table contains the sales of “apples” and “oranges” sold each day.

Write an SQL query to report the difference between number of apples and oranges sold each day.

Return the result table ordered by sale_date in format (‘YYYY-MM-DD’).

The query result format is in the following example:

# Sales table:
+------------+------------+-------------+
| sale_date  | fruit      | sold_num    |
+------------+------------+-------------+
| 2020-05-01 | apples     | 10          |
| 2020-05-01 | oranges    | 8           |
| 2020-05-02 | apples     | 15          |
| 2020-05-02 | oranges    | 15          |
| 2020-05-03 | apples     | 20          |
| 2020-05-03 | oranges    | 0           |
| 2020-05-04 | apples     | 15          |
| 2020-05-04 | oranges    | 16          |
+------------+------------+-------------+

# Result table:
+------------+--------------+
| sale_date  | diff         |
+------------+--------------+
| 2020-05-01 | 2            |
| 2020-05-02 | 0            |
| 2020-05-03 | 20           |
| 2020-05-04 | -1           |
+------------+--------------+

Day 2020-05-01, 10 apples and 8 oranges were sold (Difference 10 - 8 = 2).

Day 2020-05-02, 15 apples and 15 oranges were sold (Difference 15 - 15 = 0).

Day 2020-05-03, 20 apples and 0 oranges were sold (Difference 20 - 0 = 20).

Day 2020-05-04, 15 apples and 16 oranges were sold (Difference 15 - 16 = -1).

详解:

我们要计算每一天里的苹果和橘子销量的差值,那么我们要做的就是拆开表格中的结构,去寻找苹果和橘子各自在每一天的销量,变成两列,在把列相减即可。

select 
    sale_date,
    sum(if(fruit = 'apples',sold_num,0)) as apple,
    sum(if(fruit = 'oranges', sold_num,0)) as orange
from sales
group by sale_date
order by 1

结果如下:

{"headers": ["sale_date", "apple", "orange"], 
"values": 
[["2020-05-01", 10, 8], 
["2020-05-02", 15, 15], 
["2020-05-03", 20, 0], 
["2020-05-04", 15, 16]]}

正确答案如下:

select 
    sale_date,
    sum(if(fruit = 'apples',sold_num,0)) - sum(if(fruit = 'oranges', sold_num,0)) 
    as 'diff'
from sales
group by sale_date
order by 1

6. Article Views II

如果想回顾一下 Article Views I,请点击这里

Table: Views

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+

There is no primary key for this table, it may have duplicate rows.

Each row of this table indicates that some viewer viewed an article (written by some author) on some date.

Note that equal author_id and viewer_id indicate the same person.

Write an SQL query to find all the people who viewed more than one article on the same date, sorted in ascending order by their id.

The query result format is in the following example:

# Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 3          | 4         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

# Result table:
+------+
| id   |
+------+
| 5    |
| 6    |
+------+

详解:

我们要找到所有的再同一天查看超过一篇不同文章的人,那么我们应该:

  1. 找到在每一天,每一个人看不同文章的次数
  2. 这个次数大于 1
  3. 按照 id 升序排序

那么,对于第一个条件,我们有:

select 
	view_date, 
	viewer_id, 
	count(distinct article_id) as counts
from views
group by view_date, viewer_id

结果如下:

{"headers": ["view_date", "viewer_id", "counts"], 
"values": 
[["2019-07-21", 4, 1], 
["2019-07-22", 1, 1], 
["2019-08-01", 5, 2], 
["2019-08-01", 7, 1], 
["2019-08-02", 6, 2]]}

自然,我们要选 viewer_id 是 5 和 6的,因为 counts 大于 1。

那么我们更改原代码,把 count(distinct article_id) 修改成 having 的条件筛选形式,并且筛选所有的 distinct id。

正确答案如下:

select distinct viewer_id as 'id'
from views
group by view_date, viewer_id
having count(distinct article_id) > 1 
order by 1

7. Bank Account Summary

Table: Users

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| user_id      | int     |
| user_name    | varchar |
| credit       | int     |
+--------------+---------+

user_id is the primary key for this table.

Each row of this table contains the current credit information for each user.

Table: Transactions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| trans_id      | int     |
| paid_by       | int     |
| paid_to       | int     |
| amount        | int     |
| transacted_on | date    |
+---------------+---------+

trans_id is the primary key for this table.

Each row of this table contains the information about the transaction in the bank.

User with id (paid_by) transfer money to user with id (paid_to).

Leetcode Bank (LCB) helps its coders in making virtual payments. Our bank records all transactions in the table Transaction, we want to find out the current balance of all users and check wheter they have breached their credit limit (If their current credit is less than 0).

Write an SQL query to report.

Return the result table in any order.

The query result format is in the following example.

# Users table:
+------------+--------------+-------------+
| user_id    | user_name    | credit      |
+------------+--------------+-------------+
| 1          | Moustafa     | 100         |
| 2          | Jonathan     | 200         |
| 3          | Winston      | 10000       |
| 4          | Luis         | 800         | 
+------------+--------------+-------------+

# Transactions table:
+------------+------------+------------+----------+---------------+
| trans_id   | paid_by    | paid_to    | amount   | transacted_on |
+------------+------------+------------+----------+---------------+
| 1          | 1          | 3          | 400      | 2020-08-01    |
| 2          | 3          | 2          | 500      | 2020-08-02    |
| 3          | 2          | 1          | 200      | 2020-08-03    |
+------------+------------+------------+----------+---------------+

# Result table:
+------------+------------+------------+-----------------------+
| `user_id `   | `user_name`  | `credit `    | `credit_limit_breached` |
+------------+------------+------------+-----------------------+
| 1          | Moustafa   | -100       | Yes                   | 
| 2          | Jonathan   | 500        | No                    |
| 3          | Winston    | 9900       | No                    |
| 4          | Luis       | 800        | No                    |
+------------+------------+------------+-----------------------+

Moustafa paid $400 on “2020-08-01” and received $200 on “2020-08-03”, credit (100 -400 +200) = -$100

Jonathan received $500 on “2020-08-02” and paid $200 on “2020-08-08”, credit (200 +500 -200) = $500

Winston received $400 on “2020-08-01” and paid $500 on “2020-08-03”, credit (10000 +400 -500) = $9990

Luis didn’t received any transfer, credit = $800

详解:

本题要求我们找出当前的用户的 balance,并且判断他们是否违约(balance < 0 即违约)

注意到他们之间是互相进行转账的,那么在经过一系列的转账后,再加上之前的 credit,就得到每个人最后的 balance,最终再根据 balance 的正负判断违约情况。

那么解决问题需要以下几个步骤:

那么第一步,我们有:

# paid by 
# result table 中的 user_id 也就是 transactions 中的 paid_by

select 
	paid_by, sum(amount) as 'out'
from 
	transactions
group by paid_by

结果如下:

{"headers": ["paid_by", "out"], "values": 
[[1, 400], 
[3, 500], 
[2, 200]]}
# 说明id = 1,2,3 分别一共转出400,200,500
# paid to
select 
	paid_to, sum(amount) as 'in'
from 
	transactions
group by paid_to

结果如下:

{"headers": ["paid_to", "in"], "values": 
[[3, 400], 
[2, 500], 
[1, 200]]}
# 说明id = 1,2,3 分别一共收到200,500,400

那么用 CTE 把他们合起来,方便接下来的操作

with paid_to as
(
    select paid_to,
    sum(amount) AS 'in'
    from Transactions
    group by paid_to
),
paid_by as
(
    select paid_by,
    sum(amount) as 'out'
    from Transactions
    group by paid_by
)

接下来的三步,可以合在一起来看,就是根据之前的那个公式来计算最后每个人的 balance,再用 if 来判断违约情况,最后 join 取出需要的列!

select
    u.user_id,
    u.user_name,
    u.credit + ifnull(pt.in, 0) - ifnull(pb.out, 0) AS credit,
    # credit + 转入 - 转出 = balance
    if(u.credit + ifnull(pt.in, 0) - ifnull(pb.out, 0) < 0,'Yes','No') 
    as credit_limit_breached
    # 判断是否违约,注意这里不能用 alias 也就是 credit 代替
from Users u
left join paid_to pt
# left join 是因为 id = 4 没有转账,所以是0,前面的 ifnull 也是为了处理缺失值
on u.user_id = pt.paid_to
left join paid_by pb
# 同上
on u.user_id = pb.paid_by;

正确答案如下:

with paid_to as
(
    select paid_to,
    sum(amount) AS 'in'
    from Transactions
    group by paid_to
),
paid_by as
(
    select paid_by,
    sum(amount) as 'out'
    from Transactions
    group by paid_by
)

select
    u.user_id,
    u.user_name,
    u.credit + ifnull(pt.in, 0) - ifnull(pb.out, 0) AS credit,
    if(u.credit + ifnull(pt.in, 0) - ifnull(pb.out, 0) < 0,'Yes','No') 
    as credit_limit_breached
from Users u
left join paid_to pt
on u.user_id = pt.paid_to
left join paid_by pb
on u.user_id = pb.paid_by;

8. Calculate Salaries

Table Salaries:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| company_id    | int     |
| employee_id   | int     |
| employee_name | varchar |
| salary        | int     |
+---------------+---------+

(company_id, employee_id) is the primary key for this table.

This table contains the company id, the id, the name and the salary for an employee.

Write an SQL query to find the salaries of the employees after applying taxes.

The tax rate is calculated for each company based on the following criteria:

Return the result table in any order. Round the salary to the nearest integer.

The query result format is in the following example:

# Salaries table:
+------------+-------------+---------------+--------+
| company_id | employee_id | employee_name | salary |
+------------+-------------+---------------+--------+
| 1          | 1           | Tony          | 2000   |
| 1          | 2           | Pronub        | 21300  |
| 1          | 3           | Tyrrox        | 10800  |
| 2          | 1           | Pam           | 300    |
| 2          | 7           | Bassem        | 450    |
| 2          | 9           | Hermione      | 700    |
| 3          | 7           | Bocaben       | 100    |
| 3          | 2           | Ognjen        | 2200   |
| 3          | 13          | Nyancat       | 3300   |
| 3          | 15          | Morninngcat   | 1866   |
+------------+-------------+---------------+--------+

# Result table:
+------------+-------------+---------------+--------+
| company_id | employee_id | employee_name | salary |
+------------+-------------+---------------+--------+
| 1          | 1           | Tony          | 1020   |
| 1          | 2           | Pronub        | 10863  |
| 1          | 3           | Tyrrox        | 5508   |
| 2          | 1           | Pam           | 300    |
| 2          | 7           | Bassem        | 450    |
| 2          | 9           | Hermione      | 700    |
| 3          | 7           | Bocaben       | 76     |
| 3          | 2           | Ognjen        | 1672   |
| 3          | 13          | Nyancat       | 2508   |
| 3          | 15          | Morninngcat   | 5911   |
+------------+-------------+---------------+--------+

For company 1, Max salary is 21300. Employees in company 1 have taxes = 49%

For company 2, Max salary is 700. Employees in company 2 have taxes = 0%

For company 3, Max salary is 7777. Employees in company 3 have taxes = 24%

The salary after taxes = salary - (taxes percentage / 100) * salary

For example, Salary for Morninngcat (3, 15) after taxes = 7777 - 7777 * (24 / 100) = 7777 - 1866.48 = 5910.52, which is rounded to 5911.

详解:

根据文中信息,我们需要计算出三种情况下的税,并算出每个公司,每个人的扣税后的工资:

那么我们要按步骤来解决问题:

第一步,我们有:

select 
	company_id, 
	employee_id, 
	employee_name, 
	salary, 
	max(salary) over(partition by company_id) as maxsal
	# max也可以用partition by来对每个company_id的最大值来进行拆分计算
	# https://www.sqltutorial.org/sql-window-functions/sql-partition-by/
from salaries

结果如下:

{"headers": ["company_id", "employee_id", "employee_name", "salary", "maxsal"], 
"values": 
[[1, 1, "Tony", 2000, 21300], 
[1, 2, "Pronub", 21300, 21300], 
[1, 3, "Tyrrox", 10800, 21300], 
[2, 1, "Pam", 300, 700], 
[2, 7, "Bassem", 450, 700], 
[2, 9, "Hermione", 700, 700], 
[3, 7, "Bocaben", 100, 7777], 
[3, 2, "Ognjen", 2200, 7777], 
[3, 13, "Nyancat", 3300, 7777], 
[3, 15, "Morninngcat", 7777, 7777]]}

那么通过 CTE,并继续用 case when 来判断是否满足交税条件。

正确答案如下:

with tmp as 
(select 
	company_id, 
	employee_id, 
	employee_name, 
	salary, 
	max(salary) over(partition by company_id) as maxsal
from salaries)

select 
	company_id, 
	employee_id, 
	employee_name, 
	round(case 
			when maxsal < 1000 then salary
			when maxsal > 10000 then salary*0.51
			else salary*0.76 
		  end) as salary
from tmp

另一个答案

select 
	a.company_id, 
	a.employee_id, 
	a.employee_name, 
	round(a.salary*b.multiplier) as 'salary'
from salaries a
join 
(select 
    company_id, employee_id, employee_name,
    case 
        when max(salary) < 1000 then 1
        when max(salary) between 1000 and 10000 then (1-0.24)
        else (1-0.49) 
    end as 'multiplier'
    # case when 筛选出了税收的乘子
from salaries
group by 1) b
using (company_id)
# 根据 company_id 把两个表join到一起
# 使得乘子和工资相乘成为税后工资

经过试验,这两个写法的速度平均都超过的 88% 的用户,所以都是可取的。

9. Capital Gain/Loss

Table: Stocks

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| stock_name    | varchar |
| operation     | enum    |
| operation_day | int     |
| price         | int     |
+---------------+---------+

(stock_name, day) is the primary key for this table.

The operation column is an ENUM of type ('Sell', 'Buy')

Each row of this table indicates that the stock which has stock_name had an operation on the day operation_day with the price.

It is guaranteed that each ‘Sell’ operation for a stock has a corresponding ‘Buy’ operation in a previous day.

Write an SQL query to report the Capital gain/loss for each stock.

The capital gain/loss of a stock is total gain or loss after buying and selling the stock one or many times.

Return the result table in any order.

The query result format is in the following example:

# Stock table:
+---------------+-----------+---------------+--------+
| stock_name    | operation | operation_day | price  |
+---------------+-----------+---------------+--------+
| Leetcode      | Buy       | 1             | 1000   |
| Corona Masks  | Buy       | 2             | 10     |
| Leetcode      | Sell      | 5             | 9000   |
| Handbags      | Buy       | 17            | 30000  |
| Corona Masks  | Sell      | 3             | 1010   |
| Corona Masks  | Buy       | 4             | 1000   |
| Corona Masks  | Sell      | 5             | 500    |
| Corona Masks  | Buy       | 6             | 1000   |
| Handbags      | Sell      | 29            | 7000   |
| Corona Masks  | Sell      | 10            | 10000  |
+---------------+-----------+---------------+--------+

# Result table:
+---------------+-------------------+
| stock_name    | capital_gain_loss |
+---------------+-------------------+
| Corona Masks  | 9500              |
| Leetcode      | 8000              |
| Handbags      | -23000            |
+---------------+-------------------+

Leetcode stock was bought at day 1 for 1000$ and was sold at day 5 for 9000. Capital gain = 9000 - 1000 = 8000.

Handbags stock was bought at day 17 for 30000 and was sold at day 29 for 7000. Capital loss = 7000 - 30000 = -23000.

Corona Masks stock was bought at day 1 for 10$ and was sold at day 3 for 1010. It was bought again at day 4 for 1000 and was sold at day 5 for 500$.

At last, it was bought at day 6 for 1000 and was sold at day 10 for 10000. Capital gain/loss is the sum of capital gains/losses for each (‘Buy’ --> ‘Sell’) operation = (1010 - 10) + (500 - 1000) + (10000 - 1000) = 1000 - 500 + 9000 = 9500.

详解:

还记第七题中我们要分别计算 paid_to 和 paid_by 的总额,最后再把他们进行计算,本题也是这样把 sell 的总和减去 buy 的总和即可

正确答案如下:

select 
	stock_name,
    sum(case when operation = 'Buy' then -price else price end) 
    as capital_gain_loss 
    # buy 则要减去 price
    # sell 则加上
from stocks
group by stock_name

10. Consecutive Numbers

Write a SQL query to find all numbers that appear at least three times consecutively.

+----+-----+
| Id | Num |
+----+-----+
| 1  |  1  |
| 2  |  1  |
| 3  |  1  |
| 4  |  2  |
| 5  |  1  |
| 6  |  2  |
| 7  |  2  |
+----+-----+

For example, given the above Logs table, 1 is the only number that appears consecutively for at least three times.

+-----------------+
| ConsecutiveNums |
+-----------------+
| 1               |
+-----------------+

详解:

同一数字连续出现意味着:

那么我们可以用三个别名命名 Logs,然后查看是否存在三个连续数字是相等的,对于超过三个的情况,是等同于三个连续的情况,因为超过三个数字连续必然其中的任意三个连续 id 的数字是相等的。

正确答案如下:

select distinct a.num as ConsecutiveNums 
from 
	logs a,
	logs b,
	logs c
where a.id + 1 = b.id
and b.id + 1 = c.id
# id 紧挨着
and a.num = b.num 
and b.num = c.num
# num 相等

另一种写法:

介绍一下两个函数:

select distinct(a.Num) as ConsecutiveNums
from (
	select
	Num,
	lag(Num) over () as above,
	# lag可以查看前一行的值
	lead(Num) over () as below
	# lead可查看后一行的值
	from Logs
	) a
where Num=above
and Num=below

11. Count Student Number in Departments

A university uses 2 data tables, student and department, to store data about its students and the departments associated with each major.

Write a query to print the respective department name and number of students majoring in each department for all departments in the department table (even ones with no current students).

Sort your results by descending number of students; if two or more departments have the same number of students, then sort those departments alphabetically by department name.

The student is described as follow:

Column Name Type
student_id Integer
student_name String
gender Character
dept_id Integer

where student_id is the student’s ID number, student_name is the student’s name, gender is their gender, and dept_id is the department ID associated with their declared major.

And the department table is described as below:

Column Name Type
dept_id Integer
dept_name String

where dept_id is the department’s ID number and dept_name is the department name.

Here is an example input:
student table:

student_id student_name gender dept_id
1 Jack M 1
2 Jane F 1
3 Mark M 2

department table:

dept_id dept_name
1 Engineering
2 Science
3 Law

The Output should be:

dept_name student_number
Engineering 2
Science 1
Law 0

详解:

这个题目比较简单,我们要去计算每个 dept_name 对应的学生数量,这个题的难度甚至可以归类为简单。

我们只需要把表格连接到一起(注意缺失值)然后 count 每个 dept_name 对应的学生数量即可。

正确答案如下:

select d.dept_name, count(student_id) as student_number
from department d
left join student s
using (dept_id)
group by d.dept_id
order by 2 desc, 1;

12. Countries You Can Safely Invest In

Table Person:

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| id             | int     |
| name           | varchar |
| phone_number   | varchar |
+----------------+---------+

id is the primary key for this table.
Each row of this table contains the name of a person and their phone number.

Phone number will be in the form ‘xxx-yyyyyyy’ where xxx is the country code (3 characters) and yyyyyyy is the phone number (7 characters) where x and y are digits. Both can contain leading zeros.

Table Country:

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| name           | varchar |
| country_code   | varchar |
+----------------+---------+

country_code is the primary key for this table.

Each row of this table contains the country name and its code.

country_code will be in the form ‘xxx’ where x is digits.

Table Calls:

+-------------+------+
| Column Name | Type |
+-------------+------+
| caller_id   | int  |
| callee_id   | int  |
| duration    | int  |
+-------------+------+

There is no primary key for this table, it may contain duplicates.
Each row of this table contains the caller id, callee id and the duration of the call in minutes. caller_id != callee_id

A telecommunications company wants to invest in new countries. The company intends to invest in the countries where the average call duration of the calls in this country is strictly greater than the global average call duration.

Write an SQL query to find the countries where this company can invest.

Return the result table in any order.

The query result format is in the following example.

# person table:
+----+----------+--------------+
| id | name     | phone_number |
+----+----------+--------------+
| 3  | Jonathan | 051-1234567  |
| 12 | Elvis    | 051-7654321  |
| 1  | Moncef   | 212-1234567  |
| 2  | Maroua   | 212-6523651  |
| 7  | Meir     | 972-1234567  |
| 9  | Rachel   | 972-0011100  |
+----+----------+--------------+

# country table:
+----------+--------------+
| name     | country_code |
+----------+--------------+
| Peru     | 051          |
| Israel   | 972          |
| Morocco  | 212          |
| Germany  | 049          |
| Ethiopia | 251          |
+----------+--------------+

# calls table:
+-----------+-----------+----------+
| caller_id | callee_id | duration |
+-----------+-----------+----------+
| 1         | 9         | 33       |
| 2         | 9         | 4        |
| 1         | 2         | 59       |
| 3         | 12        | 102      |
| 3         | 12        | 330      |
| 12        | 3         | 5        |
| 7         | 9         | 13       |
| 7         | 1         | 3        |
| 9         | 7         | 1        |
| 1         | 7         | 7        |
+-----------+-----------+----------+

# Result table:
+----------+
| country  |
+----------+
| Peru     |
+----------+
Explaination:

The average call duration for Peru is (102 + 102 + 330 + 330 + 5 + 5) / 6 = 145.666667
The average call duration for Israel is (33 + 4 + 13 + 13 + 3 + 1 + 1 + 7) / 8 = 9.37500
The average call duration for Morocco is (33 + 4 + 59 + 59 + 3 + 7) / 6 = 27.5000 
Global call duration average = (2 * (33 + 3 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7)) / 20 = 55.70000
Since Peru is the only country where average call duration is greater than the global average, 
it's the only recommended country.

详解:

本题目的是找出所有国家平均 duration 大于全球平均 duration 的国家。

那么我们需要解决下面几个问题:

那么第一步其实是最复杂的,因为从原文解释中,我们知道不管是打电话还是接电话,我们都要计算其 duration,那么我们只需要用 or 来把 person 和 country 两个表格连接到一起;也就是在用 on 来连接时,既把 caller_id 也把 callee_id 和 person 中的 id 连接。

那么,同时我们也要把 country 连接进来,才能去找每个地区对应的平均 duration。这里我们用 left 函数来去字符串前 n 个字符,这里 n = 3。

with tmp as
(select c.name as 'country', duration
from person a
join calls b
on a.id = b.caller_id
or a.id = b.callee_id
join country c
on c.country_code = left(a.phone_number,3))

结果如下:

{"headers": ["country", "duration"], 
"values": 
[["Israel", 33], 
["Morocco", 33], 
["Israel", 4], 
["Morocco", 4], 
["Morocco", 59], 
["Morocco", 59], 
["Peru", 102], 
["Peru", 102], 
["Peru", 330], 
["Peru", 330], 
["Peru", 5], 
["Peru", 5], 
["Israel", 13], 
["Israel", 13], 
["Israel", 3], 
["Morocco", 3], 
["Israel", 1], 
["Israel", 1], 
["Israel", 7], 
["Morocco", 7]]}

那么接下来,我们只需要求每个地方的 duration 并判断是否大于全球平均的 duration 即可。

正确答案如下:

with tmp as
(select c.name as 'country', duration
from person a
join calls b
on a.id = b.caller_id
or a.id = b.callee_id
join country c
on c.country_code = left(a.phone_number,3))

select country
from tmp
group by 1
having avg(duration) > (select avg(duration) from calls)

13. Customers Who Bought All Products

Table: Customer

±------------±--------+
| Column Name | Type |
±------------±--------+
| customer_id | int |
| product_key | int |
±------------±--------+
product_key is a foreign key to Product table.

Table: Product

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_key | int     |
+-------------+---------+

product_key is the primary key column for this table.

Write an SQL query for a report that provides the customer ids from the Customer table that bought all the products in the Product table.

For example:

Customer table:

+-------------+-------------+
| customer_id | product_key |
+-------------+-------------+
| 1           | 5           |
| 2           | 6           |
| 3           | 5           |
| 3           | 6           |
| 1           | 6           |
+-------------+-------------+

Product table:

+-------------+
| product_key |
+-------------+
| 5           |
| 6           |
+-------------+

Result table:

+-------------+
| customer_id |
+-------------+
| 1           |
| 3           |
+-------------+

The customers who bought all the products (5 and 6) are customers with id 1 and 3.

详解:

本题比较简单,要我们找到购买所有产品的顾客,那么我们只需计算每个顾客购买的产品总数,看看是否等于 product 表里的产品总数。

正确答案如下:

select customer_id
from customer
group by customer_id
having count(distinct product_key) = (select count(*) from product)

14. Customers Who Bought Products A and B but Not C

Table: Customers

+---------------------+---------+
| Column Name         | Type    |
+---------------------+---------+
| customer_id         | int     |
| customer_name       | varchar |
+---------------------+---------+

customer_id is the primary key for this table.
customer_name is the name of the customer.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| customer_id   | int     |
| product_name  | varchar |
+---------------+---------+

order_id is the primary key for this table.

customer_id is the id of the customer who bought the product “product_name”.

Write an SQL query to report the customer_id and customer_name of customers who bought products “A”, “B” but did not buy the product “C” since we want to recommend them buy this product.

Return the result table ordered by customer_id.

The query result format is in the following example.

Customers table:
+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
| 1           | Daniel        |
| 2           | Diana         |
| 3           | Elizabeth     |
| 4           | Jhon          |
+-------------+---------------+

Orders table:

+------------+--------------+---------------+
| order_id   | customer_id  | product_name  |
+------------+--------------+---------------+
| 10         |     1        |     A         |
| 20         |     1        |     B         |
| 30         |     1        |     D         |
| 40         |     1        |     C         |
| 50         |     2        |     A         |
| 60         |     3        |     A         |
| 70         |     3        |     B         |
| 80         |     3        |     D         |
| 90         |     4        |     C         |
+------------+--------------+---------------+

Result table:

+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
| 3           | Elizabeth     |
+-------------+---------------+

Only the customer_id with id 3 bought the product A and B but not the product C.

详解:

我们要找只买 A 和 B 并没有买 C 的顾客,那么我们首先该计算每个顾客分别买 A、B、C 的个数

select
    customer_id,
    sum(if(product_name = 'A', 1,0)) as 'A',
    sum(if(product_name = 'B', 1,0)) as 'B',
    sum(if(product_name = 'C', 1,0)) as 'C'
from orders
group by 1

结果如下:

{"headers": ["customer_id", "A", "B", "C"], "values": 
[[1, 1, 1, 1], 
[2, 1, 0, 0], 
[3, 1, 1, 0], 
[4, 0, 0, 1]]}

那么从上面的结果来看,自然是 customer_id = 3时,满足条件,那么我们进行一下修改。

正确答案如下:

select * 
from customers 
where customer_id in
(select customer_id
from orders
group by customer_id
having sum(if(product_name = 'A', 1,0)) >= 1
and sum(if(product_name = 'B', 1,0)) >= 1
and sum(if(product_name = 'C', 1,0)) = 0)
# having筛选 A >= 1 and B >= 1 and c = 0

15. Department Highest Salary

The Employee table holds all employees. Every employee has an Id, a salary, and there is also a column for the department Id.

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

The Department table holds all departments of the company.

+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+

Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, your SQL query should return the following rows (order of rows does not matter).

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+

Explanation:

Max and Jim both have the highest salary in the IT department and Henry has the highest salary in the Sales department.

详解:

我们要找到每个 department 里面有最高 salary 的人。我们只需要用窗口函数来对每个 department 的人的工资进行降序排序并且找到序列等于 1 的人。

正确答案如下:

select Department, Employee, Salary from
(
	select 
		b.name as 'Department', 
		a.Name as 'Employee',
	    salary, 
	    dense_rank() over(partition by b.id order by salary desc) as rnk
	from employee a
	join department b
	on a.departmentid = b.id
) tmp
where rnk = 1

16. Evaluate Boolean Expression

Table Variables:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| name          | varchar |
| value         | int     |
+---------------+---------+

name is the primary key for this table.
This table contains the stored variables and their values.

Table Expressions:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| left_operand  | varchar |
| operator      | enum    |
| right_operand | varchar |
+---------------+---------+

(left_operand, operator, right_operand) is the primary key for this table.

This table contains a boolean expression that should be evaluated.
operator is an enum that takes one of the values (’<’, ‘>’, ‘=’)

The values of left_operand and right_operand are guaranteed to be in the Variables table.

Write an SQL query to evaluate the boolean expressions in Expressions table.

Return the result table in any order.

The query result format is in the following example.

Variables table:

+------+-------+
| name | value |
+------+-------+
| x    | 66    |
| y    | 77    |
+------+-------+

Expressions table:

+--------------+----------+---------------+
| left_operand | operator | right_operand |
+--------------+----------+---------------+
| x            | >        | y             |
| x            | <        | y             |
| x            | =        | y             |
| y            | >        | x             |
| y            | <        | x             |
| x            | =        | x             |
+--------------+----------+---------------+

Result table:

+--------------+----------+---------------+-------+
| left_operand | operator | right_operand | value |
+--------------+----------+---------------+-------+
| x            | >        | y             | false |
| x            | <        | y             | true  |
| x            | =        | y             | false |
| y            | >        | x             | true  |
| y            | <        | x             | false |
| x            | =        | x             | true  |
+--------------+----------+---------------+-------+

As shown, you need find the value of each boolean exprssion in the table using the variables table.

详解:

我们要判断表格中的表示方法是否正确,也就是

正确答案如下:

select 
	e.*,
	case 
	    when operator = '=' and v1.value = v2.value then 'true'
	    when operator = '>' and v1.value > v2.value then 'true'
	    when operator = '<' and v1.value < v2.value then 'true'
		else 'false' End 'value'
from Expressions e 
join Variables v1
on e.left_operand = v1.name 
join Variables v2
on e.right_operand = v2.name

17. Exchange Seats

Mary is a teacher in a middle school and she has a table seat storing students’ names and their corresponding seat ids.

The column id is continuous increment.

Mary wants to change seats for the adjacent students.

Can you write a SQL query to output the result for Mary?

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

For the sample input, the output is:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

Note:
If the number of students is odd, there is no need to change the last one’s seat.

详解:

我们要把学生名称按照 id 的奇偶性来对调,那么有以下几种情况:

正确答案如下:

select
(case 
    when mod(id,2)!=0 and counts != id then id+1
    when mod(id,2)!=0 and counts = id then id
    else id-1
end) as id, student
from
    seat,
    (select count(*) as counts from seat) as seat_counts
order by id;

18. Find the Start and End Number of Continuous Ranges

Table: Logs

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| log_id        | int     |
+---------------+---------+

id is the primary key for this table.
Each row of this table contains the ID in a log Table.

Since some IDs have been removed from Logs. Write an SQL query to find the start and end number of continuous ranges in table Logs.

Order the result table by start_id.

The query result format is in the following example:

# Logs table:
+------------+
| log_id     |
+------------+
| 1          |
| 2          |
| 3          |
| 7          |
| 8          |
| 10         |
+------------+

# Result table:
+------------+--------------+
| start_id   | end_id       |
+------------+--------------+
| 1          | 3            |
| 7          | 8            |
| 10         | 10           |
+------------+--------------+

The result table should contain all ranges in table Logs.

From 1 to 3 is contained in the table.

From 4 to 6 is missing in the table

From 7 to 8 is contained in the table.

Number 9 is missing in the table.

Number 10 is contained in the table.

详解:

我们要找到每段连续数字的开头和结尾,那么可以先尝试把所有数字用窗口函数从小到大排列。

SELECT 
	log_id,
    dense_rank() OVER(order by log_id) AS rnk
FROM Logs 
{"headers": ["log_id", "rnk"], 
"values": 
[[1, 1], 
[2, 2], 
[3, 3], 
[7, 4], 
[8, 5], 
[10, 6]]}

那么我们发现这两列的差值好像有些地方相似,不放试一试。

SELECT 
	l.log_id,
    (l.log_id - l.rnk) AS diff
FROM
(
	SELECT 
		log_id,
        dense_rank() OVER(order by log_id) AS rnk
    FROM Logs   
) l
{"headers": ["log_id", "diff"], 
"values": 
[[1, 0], 
[2, 0],
[3, 0], 
[7, 3], 
[8, 3], 
[10, 4]]}

我们发现连续的数字对应的 diff 都是一样,那么我们可以对 diff 进行 group by,然后取最小值当做 start,最大值当做 end。

正确答案如下:

with log_diff as
(
    select l.log_id,
    (l.log_id - l.rnk) as diff
    from
    (
        select log_id,
        dense_rank() over(order by log_id) as rnk
        from Logs   
    ) l
)

select 
    min(log_id) as 'start_id',
    max(log_id) as 'end_id'
from log_diff
group by diff

19. Friend Requests II: Who Has the Most Friends

先来回顾一下 Friend Requests I: Overall Acceptance Rate

In social network like Facebook or Twitter, people send friend requests and accept others’ requests as well.

Table request_accepted

+--------------+-------------+------------+
| requester_id | accepter_id | accept_date|
|--------------|-------------|------------|
| 1            | 2           | 2016_06-03 |
| 1            | 3           | 2016-06-08 |
| 2            | 3           | 2016-06-08 |
| 3            | 4           | 2016-06-09 |
+--------------+-------------+------------+

This table holds the data of friend acceptance, while requester_id and accepter_id both are the id of a person.

Write a query to find the the people who has most friends and the most friends number under the following rules:

For the sample data above, the result is:

Result table:

+------+------+
| id   | num  |
|------|------|
| 3    | 3    |
+------+------+

The person with id ‘3’ is a friend of people ‘1’, ‘2’ and ‘4’, so he has 3 friends in total, which is the most number than any others.

Follow-up:
In the real world, multiple people could have the same most number of friends, can you find all these people in this case?

详解:

我们要找到有最多朋友的人,定义则是:

那么根据这个题的情况,id = 3 有最多的朋友,因为申请和接受是双向的,那么我们首先要搞清楚一共有多少个不同的人(不管是申请者还是接受者)

select requester_id as 'all' 
from request_accepted
union 
select accepter_id 
from request_accepted
{"headers": ["all"], 
"values": [[1], [2], [3], [4]]}

那么接下来,我们需要把上面的记录和 request_accepted 合并,保证上面的人要么是申请者要么是接受者。

with t1 as
(select requester_id as 'all' from request_accepted
union 
select accepter_id from request_accepted),

t2 as
(select c.all as 'id', count(*) as 'num'
from t1 c
left join request_accepted r
on c.all = r.requester_id
or c.all = r.accepter_id
group by 1
order by 1)

select * from t2

结果如下:

{"headers": ["id", "num"], "values": 
[[1, 2], [2, 2], [3, 3], [4, 1]]}

那么很容易可以看出,id = 3 是我们想要的,因为num 最大

正确答案如下:

with t1 as
(select requester_id as 'all' from request_accepted
union 
select accepter_id from request_accepted),

t2 as
(select c.all as 'id', count(*) as 'num'
from t1 c
left join request_accepted r
on c.all = r.requester_id
or c.all = r.accepter_id
group by 1
order by 1)

select * from t2
where num = (select max(num) from t2)

Follow-up:

在现实世界中,多个人可能拥有相同数量的朋友,在这种情况下,您可以找到所有这些人吗?

这种情况,我们需要用 window function 来把 num 进行排序,rank = 1 的就是我们想要的,所以稍微改一下即可。

with t1 as
(select requester_id as 'all' from request_accepted
union 
select accepter_id from request_accepted),

t2 as
(select c.all as 'id', count(*) as 'num', dense_rank() over(order by count(*) desc) as rnk
from t1 c
left join request_accepted r
on c.all = r.requester_id
or c.all = r.accepter_id
group by 1
order by 1)

select id,num
from t2
where rnk = 1

试验一下:

{"headers":{"request_accepted":["requester_id","accepter_id","accept_date"]},
"rows":
{"request_accepted":
[[1,2,"2016/06/03"],
[1,3,"2016/06/08"],
[2,3,"2016/06/08"],
[3,4,"2016/06/09"],
[4,1,"2016/06/09"]]}}

结果:

{"headers": ["id", "num"], 
"values": [[1, 3], [3, 3]]}

当然这个也适用于原题的情况,所以我们就把这个当做正确答案了。

20. Game Play Analysis III

先来回顾一下:

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+

(player_id, event_date) is the primary key of this table.

This table shows the activity of players of some game.

Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write an SQL query that reports for each player and date, how many games played so far by the player. That is, the total number of games played by the player until that date. Check the example for clarity.

The query result format is in the following example:

Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |
+-----------+------------+---------------------+

For the player with id 1, 5 + 6 = 11 games played by 2016-05-02, and 5 + 6 + 1 = 12 games played by 2017-06-25.

For the player with id 3, 0 + 5 = 5 games played by 2018-07-03.

Note that for each player we only care about the days when the player logged in.

详解:

我们要计算出截止到某一日期,每个玩家一共玩了多少游戏,也就是基于 event_date 计算所玩游戏的累积和。

那么我们可以使用 sum partition by 的语法来解决,sum 就是求和,partition by 意味着我们可以根据不同指标来分开求和,order by 就是按照某一指标来按照顺序求和。(这也是常用的 window function)

正确答案如下:

select
    player_id, 
    event_date, 
    sum(games_played) over (partition by player_id order by event_date) 
    as games_played_so_far
from activity 

21. Game Play Analysis IV

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+

(player_id, event_date) is the primary key of this table.

This table shows the activity of players of some game.

Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write an SQL query that reports the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.

The query result format is in the following example:

Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+

Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33

详解:

我们要求出在第一天登录后又在第二天登录的人占总人数的比例,那么我们要:

select 
	player_id, 
	event_date, 
	dense_rank() over(partition by player_id order by event_date) as date_rank
from Activity

结果如下:

{"headers": ["player_id", "event_date", "date_rank"], 
"values": 
[[1, "2016-03-01", 1], 
[1, "2016-03-02", 2], 
[2, "2017-06-25", 1], 
[3, "2016-03-02", 1], 
[3, "2018-07-03", 2]]}

接下来把上面的结果和 activity 合并,注意要用 left join,因为此时会出现 null 的情况。

正确答案如下:

with date_ranked as (
    select 
	    player_id, 
	    event_date, 
	    dense_rank() over(partition by player_id order by event_date) 
	    as date_rank
    from Activity
)

select
	round(count(checked.player_id)/count(distinct total.player_id), 2) 
	as fraction
from Activity as total
left join date_ranked as checked
on total.player_id = checked.player_id 
and checked.date_rank = 2 
and datediff(checked.event_date, total.event_date) = 1

22. Get Highest Answer Rate Question

Get the highest answer rate question from a table survey_log with these columns: id, action, question_id, answer_id, q_num, timestamp.

id means user id; action has these kind of values: “show”, “answer”, “skip”; answer_id is not null when action column is “answer”, while is null for “show” and “skip”; q_num is the numeral order of the question in current session.

Write a sql query to identify the question which has the highest answer rate.

Example:

Input:

+------+-----------+--------------+------------+-----------+------------+
| id   | action    | question_id  | answer_id  | q_num     | timestamp  |
+------+-----------+--------------+------------+-----------+------------+
| 5    | show      | 285          | null       | 1         | 123        |
| 5    | answer    | 285          | 124124     | 1         | 124        |
| 5    | show      | 369          | null       | 2         | 125        |
| 5    | skip      | 369          | null       | 2         | 126        |
+------+-----------+--------------+------------+-----------+------------+

Output:

+-------------+
| survey_log  |
+-------------+
|    285      |
+-------------+

Explanation:
question 285 has answer rate 1/1, while question 369 has 0/1 answer rate, so output 285.

Note: The highest answer rate meaning is: answer number’s ratio in show number in the same question.

详解:

我们需要找到解答率最高的题目,那么就要计算出每个题目的解答率,并降序排序,然后取第一个,比较简单。

正确答案如下:

SELECT question_id AS survey_log
FROM (
SELECT question_id, COUNT(answer_id)/ COUNT(question_id) as num
FROM survey_log
GROUP BY 1
ORDER BY num DESC
LIMIT 1) tmp

23. Highest Grade For Each Student

Table: Enrollments

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| course_id     | int     |
| grade         | int     |
+---------------+---------+

(student_id, course_id) is the primary key of this table.

Write a SQL query to find the highest grade with its corresponding course for each student. In case of a tie, you should find the course with the smallest course_id. The output must be sorted by increasing student_id.

The query result format is in the following example:

Enrollments table:

+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 2          | 2         | 95    |
| 2          | 3         | 95    |
| 1          | 1         | 90    |
| 1          | 2         | 99    |
| 3          | 1         | 80    |
| 3          | 2         | 75    |
| 3          | 3         | 82    |
+------------+-----------+-------+

Result table:

+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 1          | 2         | 99    |
| 2          | 2         | 95    |
| 3          | 3         | 82    |
+------------+-----------+-------+

详解:

我们要找每个课程里面的每个学生的最高成绩,如果持平,按照 course_id 排序,结果按照 student_id 排序。

这个题也比较简单,只需要针对每个学生的成绩进行排序即可,注意是按照成绩降序排列,并按照 course_id 升序排列,然后取出所有 rank = 1 的结果即可。

正确答案如下:

select 
    student_id, 
    course_id,
    grade
from
(select 
    student_id, 
    course_id,
    grade,
    dense_rank() over(partition by student_id order by grade desc, course_id asc) rnk
from enrollments) t
where rnk = 1

24. Get Highest Answer Rate Question

先来回顾一下 Immediate Food Delivery I

Table: Delivery

+-----------------------------+---------+
| Column Name                 | Type    |
+-----------------------------+---------+
| delivery_id                 | int     |
| customer_id                 | int     |
| order_date                  | date    |
| customer_pref_delivery_date | date    |
+-----------------------------+---------+

delivery_id is the primary key of this table.

The table holds information about food delivery to customers that make orders at some date and specify a preferred delivery date (on the same order date or after it).

If the preferred delivery date of the customer is the same as the order date then the order is called immediate otherwise it’s called scheduled.

The first order of a customer is the order with the earliest order date that customer made. It is guaranteed that a customer has exactly one first order.

Write an SQL query to find the percentage of immediate orders in the first orders of all customers, rounded to 2 decimal places.

The query result format is in the following example:

Delivery table:

+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1           | 1           | 2019-08-01 | 2019-08-02                  |
| 2           | 2           | 2019-08-02 | 2019-08-02                  |
| 3           | 1           | 2019-08-11 | 2019-08-12                  |
| 4           | 3           | 2019-08-24 | 2019-08-24                  |
| 5           | 3           | 2019-08-21 | 2019-08-22                  |
| 6           | 2           | 2019-08-11 | 2019-08-13                  |
| 7           | 4           | 2019-08-09 | 2019-08-09                  |
+-------------+-------------+------------+-----------------------------+

Result table:

+----------------------+
| immediate_percentage |
+----------------------+
| 50.00                |
+----------------------+

The customer id 1 has a first order with delivery id 1 and it is scheduled.
The customer id 2 has a first order with delivery id 2 and it is immediate.
The customer id 3 has a first order with delivery id 5 and it is scheduled.
The customer id 4 has a first order with delivery id 7 and it is immediate.
Hence, half the customers have immediate first orders.

详解:

我们要找到所有 customers 的 first orders 中的 immediate orders 所占比例,

那么 first orders 就是该人最早的单;immediate orders 就是 order_date = customer_pref_delivery_date 的情况。

首先我们要根据每个人,来确定他们的 first_order 所在日期,所以可以用 min partition by

select 
	distinct customer_id,
    min(order_date) over(partition by customer_id) as 'date'
from delivery

结果如下:

{"headers": ["customer_id", "date"], 
"values": 
[[1, "2019-08-01"], 
[2, "2019-08-02"], 
[3, "2019-08-21"], 
[4, "2019-08-09"]]}

接下来,要把上面的结果与 delivery 合并,保证 date = customer_pref_delivery_date

select *
from delivery b
left join
(select 
    distinct customer_id,
    min(order_date) over(partition by customer_id) as 'date'
from delivery) a
on a.customer_id = b.customer_id
and a.date = b.customer_pref_delivery_date

结果如下:

{"headers": ["delivery_id", "customer_id", "order_date", "customer_pref_delivery_date", "customer_id", "date"], 
"values": 
[[1, 1, "2019-08-01", "2019-08-02", null, null], 
[2, 2, "2019-08-02", "2019-08-02", 2, "2019-08-02"], 
[3, 1, "2019-08-11", "2019-08-12", null, null], 
[4, 3, "2019-08-24", "2019-08-24", null, null], 
[5, 3, "2019-08-21", "2019-08-22", null, null], 
[6, 2, "2019-08-11", "2019-08-13", null, null], 
[7, 4, "2019-08-09", "2019-08-09", 4, "2019-08-09"]]}

可以看出一共有两个满足条件的结果,那么稍加改动可计算出最后的百分比。

正确答案如下:

select 
	round(100*count(a.customer_id) / count(distinct b.customer_id),2) 
	as immediate_percentage
from 
(select 
    distinct customer_id,
    min(order_date) over(partition by customer_id) as 'date'
from delivery) a
right join delivery b
on a.customer_id = b.customer_id
and a.date = b.customer_pref_delivery_date

25. Investments in 2016

Write a query to print the sum of all total investment values in 2016 (TIV_2016), to a scale of 2 decimal places, for all policy holders who meet the following criteria:

  1. Have the same TIV_2015 value as one or more other policyholders.
  2. Are not located in the same city as any other policyholder (i.e.: the (latitude, longitude) attribute pairs must be unique).

Input Format:
The insurance table is described as follows:

Column Name Type
PID INTEGER(11)
TIV_2015 NUMERIC(15,2)
TIV_2016 NUMERIC(15,2)
LAT NUMERIC(5,2)
LON NUMERIC(5,2)

where PID is the policyholder’s policy ID, TIV_2015 is the total investment value in 2015, TIV_2016 is the total investment value in 2016, LAT is the latitude of the policy holder’s city, and LON is the longitude of the policy holder’s city.

Sample Input

PID TIV_2015 TIV_2016 LAT LON
1 10 5 10 10
2 20 20 20 20
3 10 30 20 20
4 10 40 40 40

Sample Output

TIV_2016
45.00

Explanation

The first record in the table, like the last record, meets both of the two criteria.
The TIV_2015 value ‘10’ is as the same as the third and forth record, and its location unique.

The second record does not meet any of the two criteria. Its TIV_2015 is not like any other policyholders.

And its location is the same with the third record, which makes the third record fail, too.

So, the result is the sum of TIV_2016 of the first and last record, which is 45.

详解:

我们要找到所有2016年的投资值的和(sum(TIV_2016))满足:

那么第一步要移除重复的位置,也就是保证每一个 (lat, lon) 对应唯一结果。

select * from(
	select *, count(lat) over(partition by lat, lon order by lat,lon) rn 
    from insurance
) a where rn=1

结果如下:

{"headers": ["PID", "TIV_2015", "TIV_2016", "LAT", "LON", "rn"], 
"values": 
[[1, 10.00, 5.00, 10.00, 10.00, 1], 
[4, 10.00, 40.00, 40.00, 40.00, 1]]}

第二步就是移除唯一的 TIV_2015 这一列

select * from(
	select *, count(tiv_2015) over(partition by tiv_2015) tv 
	from insurance
)b where tv>1

结果如下:

{"headers": ["PID", "TIV_2015", "TIV_2016", "LAT", "LON", "tv"], 
"values": 
[[1, 10.00, 5.00, 10.00, 10.00, 3], 
[3, 10.00, 30.00, 20.00, 20.00, 3], 
[4, 10.00, 40.00, 40.00, 40.00, 3]]}

接下来就要把这两种情况得到的结果合并到一起,要同时满足。

正确答案如下:

# Remove duplicate location entries
with temp1 as(
    select * from(
        select *, count(lat) over(partition by lat, lon order by lat,lon) rn 
        from insurance
    ) a where rn=1
    ),
# Remove TIV_2015 entries which are unique
temp2 as(
    select * from(
        select *, count(tiv_2015) over(partition by tiv_2015) tv 
        from insurance
    )b where tv>1
    )

select sum(temp1.tiv_2016) as TIV_2016 from temp1, temp2 where temp1.pid=temp2.pid

26. Last Person to Fit in the Elevator

Table: Queue

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| person_id   | int     |
| person_name | varchar |
| weight      | int     |
| turn        | int     |
+-------------+---------+

person_id is the primary key column for this table.
This table has the information about all people waiting for an elevator.
The person_id and turn columns will contain all numbers from 1 to n, where n is the number of rows in the table.

The maximum weight the elevator can hold is 1000.

Write an SQL query to find the person_name of the last person who will fit in the elevator without exceeding the weight limit. It is guaranteed that the person who is first in the queue can fit in the elevator.

The query result format is in the following example:

Queue table

+-----------+-------------------+--------+------+
| person_id | person_name       | weight | turn |
+-----------+-------------------+--------+------+
| 5         | George Washington | 250    | 1    |
| 3         | John Adams        | 350    | 2    |
| 6         | Thomas Jefferson  | 400    | 3    |
| 2         | Will Johnliams    | 200    | 4    |
| 4         | Thomas Jefferson  | 175    | 5    |
| 1         | James Elephant    | 500    | 6    |
+-----------+-------------------+--------+------+

Result table

+-------------------+
| person_name       |
+-------------------+
| Thomas Jefferson  |
+-------------------+

Queue table is ordered by turn in the example for simplicity.
In the example George Washington(id 5), John Adams(id 3) and Thomas Jefferson(id 6) will enter the elevator as their weight sum is 250 + 350 + 400 = 1000.
Thomas Jefferson(id 6) is the last person to fit in the elevator because he has the last turn in these three people.

详解:

这个题跟之前的一道题很像,都是要求累积和,这里根据 turn 来求 weight 的累积和,然后找到累积和小于或等于 1000 的情况,按 turn 的降序排列,取第一个即可,由于 turn 是唯一的,我们不需要考虑持平的可能,所以不需要用 dense_rank()。

先来计算累积和:

select
    turn,
    person_name,
    sum(weight) over(order by turn) as 'cumulative_sum'
from queue

结果如下:

{"headers": ["turn", "person_name", "cumulative_sum"], 
"values": 
[[1, "George Washington", 250], 
[2, "John Adams", 600], 
[3, "Thomas Jefferson", 1000], 
[4, "Will Johnliams", 1200], 
[5, "Thomas Jefferson", 1375], 
[6, "James Elephant", 1875]]}

并限制 cumulative_sum 小于等于 1000,按 turn 的降序排列,取第一个即可。

正确答案如下:

select person_name
from
(select
    turn,
    person_name,
    sum(weight) over(order by turn) as 'cumulative_sum'
from queue) t1
where cumulative_sum <= 1000
order by turn desc
limit 1

27. Managers with at Least 5 Direct Reports

The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.

+------+----------+-----------+----------+
|Id    |Name 	  |Department |ManagerId |
+------+----------+-----------+----------+
|101   |John 	  |A 	      |null      |
|102   |Dan 	  |A 	      |101       |
|103   |James 	  |A 	      |101       |
|104   |Amy 	  |A 	      |101       |
|105   |Anne 	  |A 	      |101       |
|106   |Ron 	  |B 	      |101       |
+------+----------+-----------+----------+

Given the Employee table, write a SQL query that finds out managers with at least 5 direct report. For the above table, your SQL query should return:

+-------+
| Name  |
+-------+
| John  |
+-------+

Note:
No one would report to himself.

详解:

我们要找到拥有至少 5 个直接报告的 manager,那么我们需要 self join 这个表格,并且查看 id = managerid 的情况

select *
from employee a, employee b
where b.id = a.managerid
{"headers": ["Id", "Name", "Department", "ManagerId", "Id", "Name", "Department", "ManagerId"], 
"values": 
[[106, "Ron", "B", 101, 101, "John", "A", null], 
[105, "Anne", "A", 101, 101, "John", "A", null], 
[104, "Amy", "A", 101, 101, "John", "A", null], 
[103, "James", "A", 101, 101, "John", "A", null], 
[102, "Dan", "A", 101, 101, "John", "A", null]]}

那么 id = 101 有 5 个直接报告,所以稍微改一下

正确答案如下:

select b.name
from employee a, employee b
where b.id = a.managerid
group by 1
having count(b.name) >= 5

28. Market Analysis I

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |
+----------------+---------+

user_id is the primary key of this table.
This table has the info of the users of an online shopping website where users can sell and buy items.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |
+---------------+---------+

order_id is the primary key of this table.
item_id is a foreign key to the Items table.
buyer_id and seller_id are foreign keys to the Users table.

Table: Items

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| item_id       | int     |
| item_brand    | varchar |
+---------------+---------+

item_id is the primary key of this table.

Write an SQL query to find for each user, the join date and the number of orders they made as a buyer in 2019.

The query result format is in the following example:

Users table:

+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2018-01-01 | Lenovo         |
| 2       | 2018-02-09 | Samsung        |
| 3       | 2018-01-19 | LG             |
| 4       | 2018-05-21 | HP             |
+---------+------------+----------------+

Orders table:

+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2018-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2018-08-04 | 1       | 4        | 2         |
| 5        | 2018-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+

Items table:

+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+

Result table:

+-----------+------------+----------------+
| buyer_id  | join_date  | orders_in_2019 |
+-----------+------------+----------------+
| 1         | 2018-01-01 | 1              |
| 2         | 2018-02-09 | 2              |
| 3         | 2018-01-19 | 0              |
| 4         | 2018-05-21 | 0              |
+-----------+------------+----------------+

详解:

我们要找到每个用户的 join date、在2019年下的单数,题目很简单,在三个表格之中进行合并,并限制日期在 2019 年即可。

正确答案如下:

select a.user_id buyer_id, join_date,count(order_id) as orders_in_2019
from users a
left join 
(select * from orders where order_date between '2019-01-01' and '2019-12-31') tmp
on a.user_id = tmp.buyer_id
group by a.user_id

29. Monthly Transactions I

Table: Transactions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| country       | varchar |
| state         | enum    |
| amount        | int     |
| trans_date    | date    |
+---------------+---------+

id is the primary key of this table.
The table has information about incoming transactions.
The state column is an enum of type [“approved”, “declined”].

Write an SQL query to find for each month and country, the number of transactions and their total amount, the number of approved transactions and their total amount.

The query result format is in the following example:

# transactions table:
+------+---------+----------+--------+------------+
| id   | country | state    | amount | trans_date |
+------+---------+----------+--------+------------+
| 121  | US      | approved | 1000   | 2018-12-18 |
| 122  | US      | declined | 2000   | 2018-12-19 |
| 123  | US      | approved | 2000   | 2019-01-01 |
| 124  | DE      | approved | 2000   | 2019-01-07 |
+------+---------+----------+--------+------------+

# result table:
+----------+---------+-------------+----------------+--------------------+-----------------------+
| month    | country | trans_count | approved_count | trans_total_amount | approved_total_amount |
+----------+---------+-------------+----------------+--------------------+-----------------------+
| 2018-12  | US      | 2           | 1              | 3000               | 1000                  |
| 2019-01  | US      | 1           | 1              | 2000               | 2000                  |
| 2019-01  | DE      | 1           | 1              | 2000               | 2000                  |
+----------+---------+-------------+----------------+--------------------+-----------------------+

详解:

我们要找到对于每个月每个过年的交易数量和交易总量,通过的交易数和总量,那么自然先计算总共的,再根据条件筛选 approved 的情况。

正确答案如下:

select
    left(trans_date,7) as 'month',
    # left 取字符串前 n 个字符
    country,
    count(id) as trans_count,
    sum(if(state = 'approved',1,0)) as approved_count,
    sum(amount) as trans_total_amount,
    sum(if(state = 'approved',amount,0)) as approved_total_amount 
    # if 限定 approve 条件
from transactions
group by 1,2

30. Monthly Transactions II

Table: Transactions

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| id             | int     |
| country        | varchar |
| state          | enum    |
| amount         | int     |
| trans_date     | date    |
+----------------+---------+

id is the primary key of this table.
The table has information about incoming transactions.
The state column is an enum of type [“approved”, “declined”].

Table: Chargebacks

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| trans_id       | int     |
| charge_date    | date    |
+----------------+---------+

Chargebacks contains basic information regarding incoming chargebacks from some transactions placed in Transactions table.
trans_id is a foreign key to the id column of Transactions table.
Each chargeback corresponds to a transaction made previously even if they were not approved.

Write an SQL query to find for each month and country, the number of approved transactions and their total amount, the number of chargebacks and their total amount.

Note: In your query, given the month and country, ignore rows with all zeros.

The query result format is in the following example:

# transactions table:
+------+---------+----------+--------+------------+
| id   | country | state    | amount | trans_date |
+------+---------+----------+--------+------------+
| 101  | US      | approved | 1000   | 2019-05-18 |
| 102  | US      | declined | 2000   | 2019-05-19 |
| 103  | US      | approved | 3000   | 2019-06-10 |
| 104  | US      | approved | 4000   | 2019-06-13 |
| 105  | US      | approved | 5000   | 2019-06-15 |
+------+---------+----------+--------+------------+

# chargebacks table:
+------------+------------+
| trans_id   | trans_date |
+------------+------------+
| 102        | 2019-05-29 |
| 101        | 2019-06-30 |
| 105        | 2019-09-18 |
+------------+------------+

# result table:
+----------+---------+----------------+-----------------+-------------------+--------------------+
| month    | country | approved_count | approved_amount | chargeback_count  | chargeback_amount  |
+----------+---------+----------------+-----------------+-------------------+--------------------+
| 2019-05  | US      | 1              | 1000            | 1                 | 2000               |
| 2019-06  | US      | 3              | 12000           | 1                 | 1000               |
| 2019-09  | US      | 0              | 0               | 1                 | 5000               |
+----------+---------+----------------+-----------------+-------------------+--------------------+

详解:

跟上一题类似,但是我们要分别找到对应状态为 approved 和 chargeback 的所有信息,那么当满足 chargeback table 条件的情况,我们设置 state 为chargeback,再 union 状态为 approved 的情况

WITH M AS (
SELECT 
	C.trans_id AS 'id', 
	T.country, 
	T.amount, 
	C.trans_date, 
	'chargeback' AS 'state'
FROM Chargebacks C JOIN Transactions T
ON T.ID = C.trans_id
# 满足 chargeback 的 trans_id 都对应着新的 state = chargeback

UNION

SELECT id, country, amount, trans_date, state
FROM Transactions
WHERE state = 'approved')

结果如下:

{"headers": ["id", "country", "amount", "trans_date", "state"], 
"values": 
[[101, "US", 1000, "2019-06-30", "chargeback"], 
[102, "US", 2000, "2019-05-29", "chargeback"], 
[105, "US", 5000, "2019-09-18", "chargeback"], 
[101, "US", 1000, "2019-05-18", "approved"],
[103, "US", 3000, "2019-06-10", "approved"], 
[105, "US", 5000, "2019-06-15", "approved"]]}

那么根据不同的 state,我们把对应的情况一次查找出来,使用 case when 函数或者 if 函数都可以

SELECT 
	left(trans_id,7) AS 'month', country, 
	SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS 'approved_count',
	SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS 'approved_amount',
	SUM(CASE WHEN state = 'chargeback' THEN 1 ELSE 0 END) AS 'chargeback_count',
	SUM(CASE WHEN state = 'chargeback' THEN amount ELSE 0 END) AS 'chargeback_amount'
	# 当然也可用 if
FROM M
GROUP BY 1,2;

正确答案如下:

WITH M AS (
SELECT C.trans_id AS 'id', T.country, T.amount, C.trans_date, 'chargeback' AS 'state'
FROM Chargebacks C JOIN Transactions T
ON T.ID = C.trans_id

UNION

SELECT id, country, amount, trans_date, state
FROM Transactions
WHERE state = 'approved')

SELECT 
	left(trans_date,7) AS 'month', country, 
	SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS 'approved_count',
	SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS 'approved_amount',
	SUM(CASE WHEN state = 'chargeback' THEN 1 ELSE 0 END) AS 'chargeback_count',
	SUM(CASE WHEN state = 'chargeback' THEN amount ELSE 0 END) AS 'chargeback_amount'
	# 当然也可用 if
FROM M
GROUP BY 1,2;

31. Movie Rating

Table: Movies

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| title         | varchar |
+---------------+---------+

movie_id is the primary key for this table.
title is the name of the movie.

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
+---------------+---------+

user_id is the primary key for this table.

Table: Movie_Rating

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| user_id       | int     |
| rating        | int     |
| created_at    | date    |
+---------------+---------+

(movie_id, user_id) is the primary key for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user’s review date.

Write the following SQL query:

The query is returned in 2 rows, the query result format is in the following example:

Movies table:

+-------------+--------------+
| movie_id    |  title       |
+-------------+--------------+
| 1           | Avengers     |
| 2           | Frozen 2     |
| 3           | Joker        |
+-------------+--------------+

Users table:

+-------------+--------------+
| user_id     |  name        |
+-------------+--------------+
| 1           | Daniel       |
| 2           | Monica       |
| 3           | Maria        |
| 4           | James        |
+-------------+--------------+

Movie_Rating table:

+-------------+--------------+--------------+-------------+
| movie_id    | user_id      | rating       | created_at  |
+-------------+--------------+--------------+-------------+
| 1           | 1            | 3            | 2020-01-12  |
| 1           | 2            | 4            | 2020-02-11  |
| 1           | 3            | 2            | 2020-02-12  |
| 1           | 4            | 1            | 2020-01-01  |
| 2           | 1            | 5            | 2020-02-17  | 
| 2           | 2            | 2            | 2020-02-01  | 
| 2           | 3            | 2            | 2020-03-01  |
| 3           | 1            | 3            | 2020-02-22  | 
| 3           | 2            | 4            | 2020-02-25  | 
+-------------+--------------+--------------+-------------+

Result table:

+--------------+
| results      |
+--------------+
| Daniel       |
| Frozen 2     |
+--------------+

Daniel and Monica have rated 3 movies (“Avengers”, “Frozen 2” and “Joker”) but Daniel is smaller lexicographically.
Frozen 2 and Joker have a rating average of 3.5 in February but Frozen 2 is smaller lexicographically.

详解:

我们把 created_at 转换成年月,并把三个表格合并

with cte as 
(select 
	title, 
	name, 
	rating, 
	left(created_at,7) as 'created_at'
from movie_rating
join users
using (user_id)
join movies 
using (movie_id))

结果如下:

{"headers": ["title", "name", "rating", "created_at"], 
"values": 
[["Joker", "Daniel", 3, "2020-02"], 
["Frozen 2", "Daniel", 5, "2020-02"], 
["Avengers", "Daniel", 3, "2020-01"], 
["Joker", "Monica", 4, "2020-02"], 
["Frozen 2", "Monica", 2, "2020-02"], 
["Avengers", "Monica", 4, "2020-02"], 
["Frozen 2", "Maria", 2, "2020-03"], 
["Avengers", "Maria", 2, "2020-02"], 
["Avengers", "James", 1, "2020-01"]]}

那么接下来找给电影打分数量最多的用户,和在 2020 年 2 月有最高平均分的电影名,我们只需分别找出再 union 到一起

正确答案如下:

with cte as 
(select title, name, rating, left(created_at,7) as 'created_at'
from movie_rating
join users
using (user_id)
join movies 
using (movie_id))

(select name as results from
(select
    name, row_number() over(order by count(title) desc, name) as rnk
from cte
group by name) t1
where rnk = 1)
union all
(select title from
(select 
    title, row_number() over(order by avg(rating) desc, title) as rnk
from cte
where created_at = '2020-02'
group by title) t2
where rnk = 1)

32. New Users Daily Count

Table: Traffic

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+

There is no primary key for this table, it may have duplicate rows.
The activity column is an ENUM type of (‘login’, ‘logout’, ‘jobs’, ‘groups’, ‘homepage’).

Write an SQL query that reports for every date within at most 90 days from today, the number of users that logged in for the first time on that date. Assume today is 2019-06-30.

The query result format is in the following example:

Traffic table:

+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+

Result table:

+------------+-------------+
| login_date | user_count  |
+------------+-------------+
| 2019-05-01 | 1           |
| 2019-06-21 | 2           |
+------------+-------------+

Note that we only care about dates with non zero user count.
The user with id 5 first logged in on 2019-03-01 so he’s not counted on 2019-06-21.

详解:

假设今天是 2019-06-30,我们要找到从今天开始往前推最多 90 天内,那一天第一次等于的用户人数。

首先我们先找到每个用户登录的最早日期

with t as 
(
  select 
    user_id,min(activity_date) as first_date
  from Traffic 
  where activity='login'
  group by 1
)

结果如下:

{"headers": ["user_id", "first_date"], 
"values": 
[[1, "2019-05-01"], 
[2, "2019-06-21"], 
[3, "2019-01-01"], 
[4, "2019-06-21"], 
[5, "2019-03-01"]]}

那么在这个基础上,我们筛选 login_date 在从今天开始算的之前 90 以内,第一次登录的人数。

正确答案如下:

with t as 
(
  select 
    user_id,min(activity_date) as first_date
  from Traffic 
  where activity='login'
  group by 1
)
select 
  first_date as login_date
  ,count(distinct user_id) as user_count
from t
where first_date >= date_sub('2019-06-30',interval 90 day)
# date_sub 把当前日期按照一定的 interval 相减
group by 1
order by 1;

33. Nth Highest Salary

Write a SQL query to get the _n_th highest salary from the Employee table.

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

For example, given the above Employee table, the _n_th highest salary where n = 2 is 200. If there is no _n_th highest salary, then the query should return null.

+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200                    |
+------------------------+

详解:

我们只需要用 window fuction 来找到第 n 高的薪资就可以了。

正确答案如下:

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  RETURN (
      # Write your MySQL query statement below.
      select
      max(salary) as getNthHighestSalary
      from
      (select salary, dense_rank() over (order by salary desc) as rnk from employee) t
      where rnk = N
  );
END

34. Number of Trusted Contacts of a Customer

Table: Customers

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| customer_name | varchar |
| email         | varchar |
+---------------+---------+

customer_id is the primary key for this table.
Each row of this table contains the name and the email of a customer of an online shop.

Table: Contacts

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | id      |
| contact_name  | varchar |
| contact_email | varchar |
+---------------+---------+

(user_id, contact_email) is the primary key for this table.
Each row of this table contains the name and email of one contact of customer with user_id.
This table contains information about people each customer trust. The contact may or may not exist in the Customers table.

Table: Invoices

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| invoice_id   | int     |
| price        | int     |
| user_id      | int     |
+--------------+---------+

invoice_id is the primary key for this table.
Each row of this table indicates that user_id has an invoice with invoice_id and a price.

Write an SQL query to find the following for each invoice_id:

Order the result table by invoice_id.

The query result format is in the following example:

# customers table:
+-------------+---------------+--------------------+
| customer_id | customer_name | email              |
+-------------+---------------+--------------------+
| 1           | Alice         | alice@leetcode.com |
| 2           | Bob           | bob@leetcode.com   |
| 13          | John          | john@leetcode.com  |
| 6           | Alex          | alex@leetcode.com  |
+-------------+---------------+--------------------+

# contacts table:
+-------------+--------------+--------------------+
| user_id     | contact_name | contact_email      |
+-------------+--------------+--------------------+
| 1           | Bob          | bob@leetcode.com   |
| 1           | John         | john@leetcode.com  |
| 1           | Jal          | jal@leetcode.com   |
| 2           | Omar         | omar@leetcode.com  |
| 2           | Meir         | meir@leetcode.com  |
| 6           | Alice        | alice@leetcode.com |
+-------------+--------------+--------------------+

# invoices table:
+------------+-------+---------+
| invoice_id | price | user_id |
+------------+-------+---------+
| 77         | 100   | 1       |
| 88         | 200   | 1       |
| 99         | 300   | 2       |
| 66         | 400   | 2       |
| 55         | 500   | 13      |
| 44         | 60    | 6       |
+------------+-------+---------+

# result table:
+------------+---------------+-------+--------------+----------------------+
| invoice_id | customer_name | price | contacts_cnt | trusted_contacts_cnt |
+------------+---------------+-------+--------------+----------------------+
| 44         | Alex          | 60    | 1            | 1                    |
| 55         | John          | 500   | 0            | 0                    |
| 66         | Bob           | 400   | 2            | 0                    |
| 77         | Alice         | 100   | 3            | 2                    |
| 88         | Alice         | 200   | 3            | 2                    |
| 99         | Bob           | 300   | 2            | 0                    |
+------------+---------------+-------+--------------+----------------------+

Alice has three contacts, two of them are trusted contacts (Bob and John).
Bob has two contacts, none of them is a trusted contact.
Alex has one contact and it is a trusted contact (Alice).
John doesn’t have any contacts.

详解:

本题踩比赞多,因为不光题看着复杂,实际上难度也基本没有,归在简单类型里也是无可厚非。

我们只需把这几个表格合并起来,找到对应的列取出来,然后 group by + count即可,具体解释写在代码注释里。

正确答案如下:

select 
    i.invoice_id,
    c1.customer_name,
    max(i.price) as price,
    count(co.user_id) as contacts_cnt,
    count(c2.customer_id) as trusted_contacts_cnt 
from invoices i
join customers c1
# 里面无缺失值可用 join
on i.user_id = c1.customer_id
left join contacts co
# contacts 里的 id = 13 不存在
on co.user_id = c1.customer_id
left join customers c2
# 再次 join customers 保证 contact_name = customer_name
on c2.customer_name = co.contact_name
group by 1
order by 1

35. Page Recommendations

Table: Friendship

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user1_id      | int     |
| user2_id      | int     |
+---------------+---------+

(user1_id, user2_id) is the primary key for this table.
Each row of this table indicates that there is a friendship relation between user1_id and user2_id.

Table: Likes

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| page_id     | int     |
+-------------+---------+

(user_id, page_id) is the primary key for this table.
Each row of this table indicates that user_id likes page_id.

Write an SQL query to recommend pages to the user with user_id = 1 using the pages that your friends liked. It should not recommend pages you already liked.

Return result table in any order without duplicates.

The query result format is in the following example:

Friendship table:

+----------+----------+
| user1_id | user2_id |
+----------+----------+
| 1        | 2        |
| 1        | 3        |
| 1        | 4        |
| 2        | 3        |
| 2        | 4        |
| 2        | 5        |
| 6        | 1        |
+----------+----------+

Likes table:

+---------+---------+
| user_id | page_id |
+---------+---------+
| 1       | 88      |
| 2       | 23      |
| 3       | 24      |
| 4       | 56      |
| 5       | 11      |
| 6       | 33      |
| 2       | 77      |
| 3       | 77      |
| 6       | 88      |
+---------+---------+

Result table:

+------------------+
| recommended_page |
+------------------+
| 23               |
| 24               |
| 56               |
| 33               |
| 77               |
+------------------+

User one is friend with users 2, 3, 4 and 6.
Suggested pages are 23 from user 2, 24 from user 3, 56 from user 3 and 33 from user 6.
Page 77 is suggested from both user 2 and user 3.
Page 88 is not suggested because user 1 already likes it.

详解:

我们需要找到 user_id = 1 的朋友喜欢的 pages,并排除他自己本身喜欢的 pages。

那么,我们要先找出跟 id = 1 的朋友关系信息,并查找朋友对应的 pages。

SELECT *
FROM Friendship f
JOIN Likes l
ON (f.user1_id=1 and f.user2_id=l.user_id) 
OR (f.user2_id=1 and f.user1_id=l.user_id)

结果如下:

{"headers": ["user1_id", "user2_id", "user_id", "page_id"], 
"values": 
[[1, 2, 2, 23], 
[1, 3, 3, 24], 
[1, 4, 4, 56], 
[6, 1, 6, 33], 
[1, 2, 2, 77], 
[1, 3, 3, 77], 
[6, 1, 6, 88]]}

接下来要排除 id = 1 喜欢的 pages,那么继续 left join likes table

SELECT *
FROM Friendship f
JOIN Likes l
ON (f.user1_id=1 and f.user2_id=l.user_id) 
OR (f.user2_id=1 and f.user1_id=l.user_id)
LEFT JOIN Likes s
ON s.user_id=1 and l.page_id=s.page_id
{"headers": ["user1_id", "user2_id", "user_id", "page_id", "user_id", "page_id"], 
"values": 
[[1, 2, 2, 23, null, null], 
[1, 3, 3, 24, null, null], 
[1, 4, 4, 56, null, null], 
[6, 1, 6, 33, null, null], 
[1, 2, 2, 77, null, null], 
[1, 3, 3, 77, null, null], 
[6, 1, 6, 88, 1, 88]]}
# 可以看到最后一行代表的是存在朋友推荐和id = 1的pages一样
# 所以接下来要把这一行去掉就可以

正确答案如下:

SELECT distinct l.page_id recommended_page
FROM Friendship f
JOIN Likes l
ON (f.user1_id=1 and f.user2_id=l.user_id) 
OR (f.user2_id=1 and f.user1_id=l.user_id)
LEFT JOIN Likes s
ON s.user_id=1 and l.page_id=s.page_id
WHERE s.page_id is null

36. Product Price at a Given Date

Table: Products

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| new_price     | int     |
| change_date   | date    |
+---------------+---------+

(product_id, change_date) is the primary key of this table.
Each row of this table indicates that the price of some product was changed to a new price at some date.

Write an SQL query to find the prices of all products on 2019-08-16. Assume the price of all products before any change is 10.

The query result format is in the following example:

# products table:
+------------+-----------+-------------+
| product_id | new_price | change_date |
+------------+-----------+-------------+
| 1          | 20        | 2019-08-14  |
| 2          | 50        | 2019-08-14  |
| 1          | 30        | 2019-08-15  |
| 1          | 35        | 2019-08-16  |
| 2          | 65        | 2019-08-17  |
| 3          | 20        | 2019-08-18  |
+------------+-----------+-------------+

# result table:
+------------+-------+
| product_id | price |
+------------+-------+
| 2          | 50    |
| 1          | 35    |
| 3          | 10    |
+------------+-------+

详解:

select 
	t.product_id,
	coalesce(p.new_price,10) as 		price,
	p.change_date,
	row_number() over(partition by t.product_id order by p.change_date desc) as rnk
from
(select distinct product_id
from Products)t
left join
products p 
on t.product_id = p.product_id
and p.change_date <= '2019-08-16'
{"headers": ["product_id", "price", "change_date", "rnk"], 
"values": 
[[1, 35, "2019-08-16", 1], 
[1, 30, "2019-08-15", 2], 
[1, 20, "2019-08-14", 3], 
[2, 50, "2019-08-14", 1], 
[3, 10, null, 1]]}

那么我们只需取出 rnk = 1 的信息即可。

正确答案如下:

select product_id, price from
(select t.product_id,coalesce(p.new_price,10) as price,p.change_date,
row_number() over(partition by t.product_id order by p.change_date desc) as rnk
from
(select distinct product_id
from Products)t
left join
products p 
on t.product_id = p.product_id
and p.change_date<='2019-08-16') tmp
where rnk = 1

37. Project Employees III

如需要查阅 Project Employees II ,请点击这里

Table: Project

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| project_id  | int     |
| employee_id | int     |
+-------------+---------+

(project_id, employee_id) is the primary key of this table.
employee_id is a foreign key to Employee table.

Table: Employee

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| employee_id      | int     |
| name             | varchar |
| experience_years | int     |
+------------------+---------+

employee_id is the primary key of this table.

Write an SQL query that reports the most experienced employees in each project. In case of a tie, report all employees with the maximum number of experience years.

The query result format is in the following example:

Project table:

+-------------+-------------+
| project_id  | employee_id |
+-------------+-------------+
| 1           | 1           |
| 1           | 2           |
| 1           | 3           |
| 2           | 1           |
| 2           | 4           |
+-------------+-------------+

Employee table:

+-------------+--------+------------------+
| employee_id | name   | experience_years |
+-------------+--------+------------------+
| 1           | Khaled | 3                |
| 2           | Ali    | 2                |
| 3           | John   | 3                |
| 4           | Doe    | 2                |
+-------------+--------+------------------+

Result table:

+-------------+---------------+
| project_id  | employee_id   |
+-------------+---------------+
| 1           | 1             |
| 1           | 3             |
| 2           | 1             |
+-------------+---------------+

Both employees with id 1 and 3 have the most experience among the employees of the first project. For the second project, the employee with id 1 has the most experience.

详解:

我们要找到每个 project 里 experience_years 最多的 employees,那么只需用 window function partition by 即可。

正确答案如下:

with cte as 
(select 
    project_id, 
    p.employee_id, 
    experience_years,
    dense_rank() over(partition by project_id order by experience_years desc) rnk
from project p
join employee e
using (employee_id))

select project_id, employee_id
from cte 
where rnk = 1

38. Rank Scores

Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no “holes” between ranks.

+----+-------+
| Id | Score |
+----+-------+
| 1  | 3.50  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.85  |
| 5  | 4.00  |
| 6  | 3.65  |
+----+-------+

For example, given the above Scores table, your query should generate the following report (order by highest score):

+-------+---------+
| score | Rank    |
+-------+---------+
| 4.00  | 1       |
| 4.00  | 1       |
| 3.85  | 2       |
| 3.65  | 3       |
| 3.65  | 3       |
| 3.50  | 4       |
+-------+---------+

Important Note: For MySQL solutions, to escape reserved words used as column names, you can use an apostrophe before and after the keyword. For example Rank.

详解:

本题是非常 popular 的一道题,虽然简单,但是也告诉我们 window function 的重要性!

正确答案如下:

select 
	score, 
	dense_rank() over (order by score desc) 
	as `Rank`
from scores

39. Rectangles Area

Table: Points

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| x_value       | int     |
| y_value       | int     |
+---------------+---------+

id is the primary key for this table.
Each point is represented as a 2D Dimensional (x_value, y_value).

Write an SQL query to report of all possible rectangles which can be formed by any two points of the table.

Each row in the result contains three columns (p1, p2, area) where:

Report the query in descending order by area in case of tie in ascending order by p1 and p2.

Points table:

+----------+-------------+-------------+
| id       | x_value     | y_value     |
+----------+-------------+-------------+
| 1        | 2           | 8           |
| 2        | 4           | 7           |
| 3        | 2           | 10          |
+----------+-------------+-------------+

Result table:

+----------+-------------+-------------+
| p1       | p2          | area        |
+----------+-------------+-------------+
| 2        | 3           | 6           |
| 1        | 2           | 2           |
+----------+-------------+-------------+

p1 should be less than p2 and area greater than 0.
p1 = 1 and p2 = 2, has an area equal to |2-4| * |8-7| = 2.
p1 = 2 and p2 = 3, has an area equal to |4-2| * |7-10| = 6.
p1 = 1 and p2 = 3 It’s not possible because the rectangle has an area equal to 0.

详解:

给定了角的坐标,我们来求长方形面积,我们知道矩形面积等于

|横坐标之差| * |纵坐标之差|

所以我们可以使用 self join 来判断是否可以形成矩形:

正确答案如下:

select 
    a.id as p1,
    b.id as p2,
    abs(a.x_value-b.x_value) * abs(a.y_value-b.y_value) as area
from points a, points b
where a.id <> b.id
and a.id < b.id  # 防止重复
and abs(a.x_value-b.x_value) * abs(a.y_value-b.y_value) > 0
order by area desc, p1,p2

40. Restaurant Growth

Table: Customer

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| name          | varchar |
| visited_on    | date    |
| amount        | int     |
+---------------+---------+

(customer_id, visited_on) is the primary key for this table.
This table contains data about customer transactions in a restaurant.
visited_on is the date on which the customer with ID (customer_id) have visited the restaurant.
amount is the total paid by a customer.

You are the restaurant owner and you want to analyze a possible expansion (there will be at least one customer every day).

Write an SQL query to compute moving average of how much customer paid in a 7 days window (current day + 6 days before) .

The query result format is in the following example:

Return result table ordered by visited_on.

average_amount should be rounded to 2 decimal places, all dates are in the format (‘YYYY-MM-DD’).

Customer table:

+-------------+--------------+--------------+-------------+
| customer_id | name         | visited_on   | amount      |
+-------------+--------------+--------------+-------------+
| 1           | Jhon         | 2019-01-01   | 100         |
| 2           | Daniel       | 2019-01-02   | 110         |
| 3           | Jade         | 2019-01-03   | 120         |
| 4           | Khaled       | 2019-01-04   | 130         |
| 5           | Winston      | 2019-01-05   | 110         | 
| 6           | Elvis        | 2019-01-06   | 140         | 
| 7           | Anna         | 2019-01-07   | 150         |
| 8           | Maria        | 2019-01-08   | 80          |
| 9           | Jaze         | 2019-01-09   | 110         | 
| 1           | Jhon         | 2019-01-10   | 130         | 
| 3           | Jade         | 2019-01-10   | 150         | 
+-------------+--------------+--------------+-------------+

Result table:

+--------------+--------------+----------------+
| visited_on   | amount       | average_amount |
+--------------+--------------+----------------+
| 2019-01-07   | 860          | 122.86         |
| 2019-01-08   | 840          | 120            |
| 2019-01-09   | 840          | 120            |
| 2019-01-10   | 1000         | 142.86         |
+--------------+--------------+----------------+

1st moving average from 2019-01-01 to 2019-01-07 has an average_amount of (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
2nd moving average from 2019-01-02 to 2019-01-08 has an average_amount of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
3rd moving average from 2019-01-03 to 2019-01-09 has an average_amount of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
4th moving average from 2019-01-04 to 2019-01-10 has an average_amount of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86

详解:

我们要计算七天内的移动平均值(当前日期 + 6天前),那么可以用 window function + rows preceding 来解决。

具体语法和解释请看 Rows and Range, Preceding and Following Download PDF

WITH CTE AS 
(  SELECT
    visited_on,
    SUM(amount) OVER(ORDER BY visited_on ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount,
    # moving sum
    AVG(amount) OVER(ORDER BY visited_on ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS average_amount,
    # moving average
    ROW_NUMBER() OVER(ORDER BY visited_on ASC) AS RN
  FROM (SELECT visited_on, SUM(amount) AS amount FROM Customer GROUP BY visited_on) T
  GROUP BY visited_on
)

结果如下;

{"headers": ["visited_on", "amount", "average_amount", "RN"], 
"values": 
[["2019-01-01", 100, 100.0000, 1], 
["2019-01-02", 210, 105.0000, 2], 
["2019-01-03", 330, 110.0000, 3], 
["2019-01-04", 460, 115.0000, 4], 
["2019-01-05", 570, 114.0000, 5], 
["2019-01-06", 710, 118.3333, 6], 
["2019-01-07", 860, 122.8571, 7], 
["2019-01-08", 840, 120.0000, 8], 
["2019-01-09", 840, 120.0000, 9], 
["2019-01-10", 1000, 142.8571, 10]]}

这里的原理大概是这样:

我们这里的 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW 就是当前的行和其之前的 6 行,总共 7 行,这里要注意!注意在不满足总共 7 行的情况下,会在之前的所有行的基础上运行函数。

正确答案如下:

WITH CTE AS 
( SELECT
    visited_on,
    SUM(amount) OVER(ORDER BY visited_on ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) 
    AS amount,
    AVG(amount) OVER(ORDER BY visited_on ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) 
    AS average_amount,
    ROW_NUMBER() OVER(ORDER BY visited_on ASC) 
    AS RN
  FROM (SELECT visited_on, SUM(amount) AS amount FROM Customer GROUP BY visited_on) T
  GROUP BY visited_on
)

SELECT visited_on, amount, ROUND(average_amount,2) AS average_amount
FROM CTE
WHERE RN >= 7
ORDER BY 1;

41. Running Total for Different Genders

Table: Scores

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| player_name   | varchar |
| gender        | varchar |
| day           | date    |
| score_points  | int     |
+---------------+---------+

(gender, day) is the primary key for this table.
A competition is held between females team and males team.
Each row of this table indicates that a player_name and with gender has scored score_point in someday.
Gender is ‘F’ if the player is in females team and ‘M’ if the player is in males team.

Write an SQL query to find the total score for each gender at each day.

Order the result table by gender and day

The query result format is in the following example:

Scores table:

+-------------+--------+------------+--------------+
| player_name | gender | day        | score_points |
+-------------+--------+------------+--------------+
| Aron        | F      | 2020-01-01 | 17           |
| Alice       | F      | 2020-01-07 | 23           |
| Bajrang     | M      | 2020-01-07 | 7            |
| Khali       | M      | 2019-12-25 | 11           |
| Slaman      | M      | 2019-12-30 | 13           |
| Joe         | M      | 2019-12-31 | 3            |
| Jose        | M      | 2019-12-18 | 2            |
| Priya       | F      | 2019-12-31 | 23           |
| Priyanka    | F      | 2019-12-30 | 17           |
+-------------+--------+------------+--------------+

Result table:

+--------+------------+-------+
| gender | day        | total |
+--------+------------+-------+
| F      | 2019-12-30 | 17    |
| F      | 2019-12-31 | 40    |
| F      | 2020-01-01 | 57    |
| F      | 2020-01-07 | 80    |
| M      | 2019-12-18 | 2     |
| M      | 2019-12-25 | 13    |
| M      | 2019-12-30 | 26    |
| M      | 2019-12-31 | 29    |
| M      | 2020-01-07 | 36    |
+--------+------------+-------+

For females team:
First day is 2019-12-30, Priyanka scored 17 points and the total score for the team is 17.
Second day is 2019-12-31, Priya scored 23 points and the total score for the team is 40.
Third day is 2020-01-01, Aron scored 17 points and the total score for the team is 57.
Fourth day is 2020-01-07, Alice scored 23 points and the total score for the team is 80.
For males team:
First day is 2019-12-18, Jose scored 2 points and the total score for the team is 2.
Second day is 2019-12-25, Khali scored 11 points and the total score for the team is 13.
Third day is 2019-12-30, Slaman scored 13 points and the total score for the team is 26.
Fourth day is 2019-12-31, Joe scored 3 points and the total score for the team is 29.
Fifth day is 2020-01-07, Bajrang scored 7 points and the total score for the team is 36.

正确答案如下:

select
    gender,
    day,
    sum(score_points) over(partition by gender order by day) as total
    # 求累计和! 很常用!
from scores

42. Shortest Distance in a Plane

Table point_2d holds the coordinates (x,y) of some unique points (more than two) in a plane.

Write a query to find the shortest distance between these points rounded to 2 decimals.

x y
-1 -1
0 0
-1 -2

The shortest distance is 1.00 from point (-1,-1) to (-1,2). So the output should be:

shortest
1.00

Note: The longest distance among all the points are less than 10000.

详解:

给定几个点,找到他们中的最短距离,那么我们要计算出每对不同点之间的距离,求最小值即可。

首先用 self join求出每对点的距离

select sqrt(power((a.x - b.x),2) + power((a.y-b.y),2)) as distance
from point_2d a, point_2d b

结果如下;

{"headers": ["distance"], 
"values": 
[[1.0], 
[1.4142135623730951], 
[0.0], 
[2.23606797749979], 
[0.0], 
[1.4142135623730951], 
[0.0], 
[2.23606797749979], 
[1.0]]}

当然,距离必须大于 0,并且要找到最小的。

正确答案如下:

select round(min(distance),2) as shortest
from
(
select sqrt(power((a.x - b.x),2) + power((a.y-b.y),2)) as distance
from point_2d a, point_2d b
where sqrt(power((a.x - b.x),2) + power((a.y-b.y),2)) <> 0
) t

43. Team Scores in Football Tournament

Table: Teams

+---------------+----------+
| Column Name   | Type     |
+---------------+----------+
| team_id       | int      |
| team_name     | varchar  |
+---------------+----------+

team_id is the primary key of this table.
Each row of this table represents a single football team.

Table: Matches

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| match_id      | int     |
| host_team     | int     |
| guest_team    | int     | 
| host_goals    | int     |
| guest_goals   | int     |
+---------------+---------+

match_id is the primary key of this table.
Each row is a record of a finished match between two different teams.

Teams host_team and guest_team are represented by their IDs in the teams table (team_id) and they scored host_goals and guest_goals goals respectively.

You would like to compute the scores of all teams after all matches. Points are awarded as follows:

Write an SQL query that selects the team_id, team_name and num_points of each team in the tournament after all described matches. Result table should be ordered by num_points (decreasing order). In case of a tie, order the records by team_id (increasing order).

The query result format is in the following example:

# teams table:
+-----------+--------------+
| team_id   | team_name    |
+-----------+--------------+
| 10        | Leetcode FC  |
| 20        | NewYork FC   |
| 30        | Atlanta FC   |
| 40        | Chicago FC   |
| 50        | Toronto FC   |
+-----------+--------------+

# matches table:
+------------+--------------+---------------+-------------+--------------+
| match_id   | host_team    | guest_team    | host_goals  | guest_goals  |
+------------+--------------+---------------+-------------+--------------+
| 1          | 10           | 20            | 3           | 0            |
| 2          | 30           | 10            | 2           | 2            |
| 3          | 10           | 50            | 5           | 1            |
| 4          | 20           | 30            | 1           | 0            |
| 5          | 50           | 30            | 1           | 0            |
+------------+--------------+---------------+-------------+--------------+

# result table:
+------------+--------------+---------------+
| team_id    | team_name    | num_points    |
+------------+--------------+---------------+
| 10         | Leetcode FC  | 7             |
| 20         | NewYork FC   | 3             |
| 50         | Toronto FC   | 3             |
| 30         | Atlanta FC   | 1             |
| 40         | Chicago FC   | 0             |
+------------+--------------+---------------+

详解:

我们要计算每个队伍的总得分,那么我们应该先计算主队和客队在规则下分别对应的得分。

SELECT m.host_team, m.guest_team,
(CASE WHEN m.host_goals > m.guest_goals THEN 3
      WHEN m.host_goals = m.guest_goals THEN 1 
      ELSE 0 END) AS host_point,
(CASE WHEN m.host_goals > m.guest_goals THEN 0
      WHEN m.host_goals = m.guest_goals THEN 1 
      ELSE 3 END) AS guest_point
FROM Matches AS m

结果如下:

{"headers": ["host_team", "guest_team", "host_point", "guest_point"], 
"values": 
[[10, 20, 3, 0], 
[30, 10, 1, 1], 
[10, 50, 3, 0], 
[20, 30, 3, 0], 
[50, 30, 3, 0]]}

那么接下来,我们对应主客队使用 case when 进行筛选并记录分数,最后 sum + group by 求解。

正确答案如下:

SELECT T2.team_id, T2.team_name, SUM(point) AS num_points FROM
(SELECT t.team_id, t.team_name, T1.host_team, T1.guest_team, T1.host_point, T1.guest_point,
(CASE WHEN t.team_id = T1.host_team THEN host_point
      WHEN t.team_id = T1.guest_team THEN guest_point
      ELSE 0 END) AS point 
FROM Teams t
JOIN
(SELECT m.host_team, m.guest_team,
(CASE WHEN m.host_goals > m.guest_goals THEN 3
      WHEN m.host_goals = m.guest_goals THEN 1 
      ELSE 0 END) AS host_point,
(CASE WHEN m.host_goals > m.guest_goals THEN 0
      WHEN m.host_goals = m.guest_goals THEN 1 
      ELSE 3 END) AS guest_point
FROM Matches AS m) AS T1) AS T2
GROUP BY T2.team_id ORDER BY SUM(point) DESC, team_id ASC;

44. The Most Frequently Ordered Products for Each Customer

Table: Customers

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| name          | varchar |
+---------------+---------+

customer_id is the primary key for this table.
This table contains information about the customers.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| customer_id   | int     |
| product_id    | int     |
+---------------+---------+

order_id is the primary key for this table.
This table contains information about the orders made by customer_id.
No customer will order the same product more than once in a single day.

Table: Products

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| product_name  | varchar |
| price         | int     |
+---------------+---------+

product_id is the primary key for this table.
This table contains information about the products.

Write an SQL query to find the most frequently ordered product(s) for each customer.

The result table should have the product_id and product_name for each customer_id who ordered at least one order. Return the result table in any order.

The query result format is in the following example:

# customers
+-------------+-------+
| customer_id | name  |
+-------------+-------+
| 1           | Alice |
| 2           | Bob   |
| 3           | Tom   |
| 4           | Jerry |
| 5           | John  |
+-------------+-------+

# orders
+----------+------------+-------------+------------+
| order_id | order_date | customer_id | product_id |
+----------+------------+-------------+------------+
| 1        | 2020-07-31 | 1           | 1          |
| 2        | 2020-07-30 | 2           | 2          |
| 3        | 2020-08-29 | 3           | 3          |
| 4        | 2020-07-29 | 4           | 1          |
| 5        | 2020-06-10 | 1           | 2          |
| 6        | 2020-08-01 | 2           | 1          |
| 7        | 2020-08-01 | 3           | 3          |
| 8        | 2020-08-03 | 1           | 2          |
| 9        | 2020-08-07 | 2           | 3          |
| 10       | 2020-07-15 | 1           | 2          |
+----------+------------+-------------+------------+

# products
+------------+--------------+-------+
| product_id | product_name | price |
+------------+--------------+-------+
| 1          | keyboard     | 120   |
| 2          | mouse        | 80    |
| 3          | screen       | 600   |
| 4          | hard disk    | 450   |
+------------+--------------+-------+

# result table:
+-------------+------------+--------------+
| customer_id | product_id | product_name |
+-------------+------------+--------------+
| 1           | 2          | mouse        |
| 2           | 1          | keyboard     |
| 2           | 2          | mouse        |
| 2           | 3          | screen       |
| 3           | 3          | screen       |
| 4           | 1          | keyboard     |
+-------------+------------+--------------+

Alice (customer 1) ordered the mouse three times and the keyboard one time, so the mouse is the most frquently ordered product for them.
Bob (customer 2) ordered the keyboard, the mouse, and the screen one time, so those are the most frquently ordered products for them.
Tom (customer 3) only ordered the screen (two times), so that is the most frquently ordered product for them.
Jerry (customer 4) only ordered the keyboard (one time), so that is the most frquently ordered product for them.
John (customer 5) did not order anything, so we do not include them in the result table.

详解:

本题再次考察了 window function 的使用,我们要找每个消费者最经常消费的商品,那么久对每个消费者购买商品的频数排序,取出排第一的所有商品。

正确答案如下:

with cte as
(select o.customer_id, p.product_id, p.product_name
from orders o
join products p
using (product_id)
order by 1)

select customer_id,product_id,product_name from
(select 
    customer_id, 
    product_id,
    product_name,
    count(product_name),
    dense_rank() over(partition by customer_id order by count(product_name) desc) rnk
from cte
group by 1,2
order by 1) t
where rnk = 1

45. The Most Recent Orders for Each Product

Table: Customers

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| name          | varchar |
+---------------+---------+

customer_id is the primary key for this table.
This table contains information about the customers.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| customer_id   | int     |
| product_id    | int     |
+---------------+---------+

order_id is the primary key for this table.
This table contains information about the orders made by customer_id.
There will be no product ordered by the same user more than once in one day.

Table: Products

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| product_name  | varchar |
| price         | int     |
+---------------+---------+

product_id is the primary key for this table.
This table contains information about the Products.

Write an SQL query to find the most recent order(s) of each product.

Return the result table sorted by product_name in ascending order and in case of a tie by the product_id in ascending order. If there still a tie, order them by the order_id in ascending order.

The query result format is in the following example:

# customers
+-------------+-----------+
| customer_id | name      |
+-------------+-----------+
| 1           | Winston   |
| 2           | Jonathan  |
| 3           | Annabelle |
| 4           | Marwan    |
| 5           | Khaled    |
+-------------+-----------+

# orders
+----------+------------+-------------+------------+
| order_id | order_date | customer_id | product_id |
+----------+------------+-------------+------------+
| 1        | 2020-07-31 | 1           | 1          |
| 2        | 2020-07-30 | 2           | 2          |
| 3        | 2020-08-29 | 3           | 3          |
| 4        | 2020-07-29 | 4           | 1          |
| 5        | 2020-06-10 | 1           | 2          |
| 6        | 2020-08-01 | 2           | 1          |
| 7        | 2020-08-01 | 3           | 1          |
| 8        | 2020-08-03 | 1           | 2          |
| 9        | 2020-08-07 | 2           | 3          |
| 10       | 2020-07-15 | 1           | 2          |
+----------+------------+-------------+------------+

# products
+------------+--------------+-------+
| product_id | product_name | price |
+------------+--------------+-------+
| 1          | keyboard     | 120   |
| 2          | mouse        | 80    |
| 3          | screen       | 600   |
| 4          | hard disk    | 450   |
+------------+--------------+-------+

# result table:
+--------------+------------+----------+------------+
| product_name | product_id | order_id | order_date |
+--------------+------------+----------+------------+
| keyboard     | 1          | 6        | 2020-08-01 |
| keyboard     | 1          | 7        | 2020-08-01 |
| mouse        | 2          | 8        | 2020-08-03 |
| screen       | 3          | 3        | 2020-08-29 |
+--------------+------------+----------+------------+

keyboard’s most recent order is in 2020-08-01, it was ordered two times this day.
mouse’s most recent order is in 2020-08-03, it was ordered only once this day.
screen’s most recent order is in 2020-08-29, it was ordered only once this day.
The hard disk was never ordered and we don’t include it in the result table.

详解:

我们要找每个产品最近的下单时间以及下单信息,那么同样使用 window function 来对每个产品的order_date 进行降序排序,然后筛选排序第一的信息。

正确答案如下:

select product_name, product_id, order_id, order_date from
(select
    p.product_name, p.product_id, o.order_id, o.order_date,
    dense_rank() over(partition by p.product_name order by order_date desc) rnk
from products p
join orders o
using (product_id)) t
where rnk = 1
order by 1,2,3;

46. The Most Recent Three Orders

Table: Customers

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| name          | varchar |
+---------------+---------+

customer_id is the primary key for this table.
This table contains information about customers.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| customer_id   | int     |
| cost          | int     |
+---------------+---------+

order_id is the primary key for this table.
This table contains information about the orders made by customer_id.
Each customer has one order per day.

Write an SQL query to find the most recent 3 orders of each user. If a user ordered less than 3 orders return all of their orders.

Return the result table sorted by customer_name in ascending order and in case of a tie by the customer_id in ascending order. If there still a tie, order them by the order_date in descending order.

The query result format is in the following example:

# customers
+-------------+-----------+
| customer_id | name      |
+-------------+-----------+
| 1           | Winston   |
| 2           | Jonathan  |
| 3           | Annabelle |
| 4           | Marwan    |
| 5           | Khaled    |
+-------------+-----------+

# orders
+----------+------------+-------------+------+
| order_id | order_date | customer_id | cost |
+----------+------------+-------------+------+
| 1        | 2020-07-31 | 1           | 30   |
| 2        | 2020-07-30 | 2           | 40   |
| 3        | 2020-07-31 | 3           | 70   |
| 4        | 2020-07-29 | 4           | 100  |
| 5        | 2020-06-10 | 1           | 1010 |
| 6        | 2020-08-01 | 2           | 102  |
| 7        | 2020-08-01 | 3           | 111  |
| 8        | 2020-08-03 | 1           | 99   |
| 9        | 2020-08-07 | 2           | 32   |
| 10       | 2020-07-15 | 1           | 2    |
+----------+------------+-------------+------+

# result table:
+---------------+-------------+----------+------------+
| customer_name | customer_id | order_id | order_date |
+---------------+-------------+----------+------------+
| Annabelle     | 3           | 7        | 2020-08-01 |
| Annabelle     | 3           | 3        | 2020-07-31 |
| Jonathan      | 2           | 9        | 2020-08-07 |
| Jonathan      | 2           | 6        | 2020-08-01 |
| Jonathan      | 2           | 2        | 2020-07-30 |
| Marwan        | 4           | 4        | 2020-07-29 |
| Winston       | 1           | 8        | 2020-08-03 |
| Winston       | 1           | 1        | 2020-07-31 |
| Winston       | 1           | 10       | 2020-07-15 |
+---------------+-------------+----------+------------+

Winston has 4 orders, we discard the order of “2020-06-10” because it is the oldest order.
Annabelle has only 2 orders, we return them.
Jonathan has exactly 3 orders.
Marwan ordered only one time.
We sort the result table by customer_name in ascending order, by customer_id in ascending order and by order_date in descending order in case of a tie.

详解:

我们要找每个用户最近的三个订单,如果用户的订单少于三个那么全部显示出来。我们同样使用 window function 求解。

select
    name as customer_name,
    o.customer_id,
    o.order_id,
    o.order_date,
    dense_rank() over(partition by customer_id order by order_date desc) rnk
from orders o
join customers c
using (customer_id) 
order by 1,2,4 desc

结果如下:

{"headers": ["customer_name", "customer_id", "order_id", "order_date", "rnk"], 
"values": 
[["Annabelle", 3, 7, "2020-08-01", 1], 
["Annabelle", 3, 3, "2020-07-31", 2],
["Jonathan", 2, 9, "2020-08-07", 1], 
["Jonathan", 2, 6, "2020-08-01", 2], 
["Jonathan", 2, 2, "2020-07-30", 3], 
["Marwan", 4, 4, "2020-07-29", 1], 
["Winston", 1, 8, "2020-08-03", 1], 
["Winston", 1, 1, "2020-07-31", 2], 
["Winston", 1, 10, "2020-07-15", 3], 
["Winston", 1, 5, "2020-06-10", 4]]}

那么我们只需要筛选 rnk <= 3 的情况即可。

正确答案如下:

select customer_name,customer_id,order_id,order_date from
(select
    name as customer_name,
    o.customer_id,
    o.order_id,
    o.order_date,
    dense_rank() over(partition by customer_id order by order_date desc) rnk
from orders o
join customers c
using (customer_id) 
order by 1,2,4 desc) t
where rnk <= 3

47. Tree Node

Given a table tree, id is identifier of the tree node and p_id is its parent node’s id.

+----+------+
| id | p_id |
+----+------+
| 1  | null |
| 2  | 1    |
| 3  | 1    |
| 4  | 2    |
| 5  | 2    |
+----+------+

Each node in the tree can be one of three types:

Write a query to print the node id and the type of the node. Sort your output by the node id. The result for the above sample is:

+----+------+
| id | Type |
+----+------+
| 1  | Root |
| 2  | Inner|
| 3  | Leaf |
| 4  | Leaf |
| 5  | Leaf |
+----+------+

Explanation

详解:

根据上面的图,我们可以知道,根据树的结构,我们来判断每个点的位置。

那么我们把这三种情况都筛选出来,然后 union 起来就可以了。

  1. root
SELECT id, 'Root' AS Type
FROM tree
WHERE p_id IS NULL
  1. leaf
SELECT id, 'Leaf' AS Type
FROM tree
WHERE
id NOT IN 
(SELECT DISTINCT p_id
FROM tree
WHERE p_id IS NOT NULL)
# 保证不是其他点的parent
AND p_id IS NOT NULL
# 保证不是根节点
  1. inner
SELECT id, 'Inner' AS Type
FROM tree
WHERE id IN 
(SELECT DISTINCT p_id
FROM tree
WHERE p_id IS NOT NULL)
# 保证是别的点的parent
AND p_id IS NOT NULL
# 保证不是根节点

最后把他们 union 起来。

正确答案如下:

SELECT
    id, 'Root' AS Type
FROM
    tree
WHERE
    p_id IS NULL

UNION

SELECT
    id, 'Leaf' AS Type
FROM
    tree
WHERE
    id NOT IN (SELECT DISTINCT
            p_id
        FROM
            tree
        WHERE
            p_id IS NOT NULL)
        AND p_id IS NOT NULL
        
UNION

SELECT
    id, 'Inner' AS Type
FROM
    tree
WHERE
    id IN (SELECT DISTINCT
            p_id
        FROM
            tree
        WHERE
            p_id IS NOT NULL)
        AND p_id IS NOT NULL
ORDER BY id;

48. Unpopular Books

Table: Books

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| book_id        | int     |
| name           | varchar |
| available_from | date    |
+----------------+---------+

book_id is the primary key of this table.

Table: Orders

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| order_id       | int     |
| book_id        | int     |
| quantity       | int     |
| dispatch_date  | date    |
+----------------+---------+

order_id is the primary key of this table.
book_id is a foreign key to the Books table.

Write an SQL query that reports the books that have sold less than 10 copies in the last year, excluding books that have been available for less than 1 month from today. Assume today is 2019-06-23.

The query result format is in the following example:

Books table:

+---------+--------------------+----------------+
| book_id | name               | available_from |
+---------+--------------------+----------------+
| 1       | "Kalila And Demna" | 2010-01-01     |
| 2       | "28 Letters"       | 2012-05-12     |
| 3       | "The Hobbit"       | 2019-06-10     |
| 4       | "13 Reasons Why"   | 2019-06-01     |
| 5       | "The Hunger Games" | 2008-09-21     |
+---------+--------------------+----------------+

Orders table:

+----------+---------+----------+---------------+
| order_id | book_id | quantity | dispatch_date |
+----------+---------+----------+---------------+
| 1        | 1       | 2        | 2018-07-26    |
| 2        | 1       | 1        | 2018-11-05    |
| 3        | 3       | 8        | 2019-06-11    |
| 4        | 4       | 6        | 2019-06-05    |
| 5        | 4       | 5        | 2019-06-20    |
| 6        | 5       | 9        | 2009-02-02    |
| 7        | 5       | 8        | 2010-04-13    |
+----------+---------+----------+---------------+

Result table:

+-----------+--------------------+
| book_id   | name               |
+-----------+--------------------+
| 1         | "Kalila And Demna" |
| 2         | "28 Letters"       |
| 5         | "The Hunger Games" |
+-----------+--------------------+

详解:

我们要找到去年卖的少于 10 本的书,排除从今天(2019-06-23)开始 available 少于一个月的书。

那么先找每本书的销量

select b.book_id, sum(quantity) as q, dispatch_date
from orders b
where dispatch_date between '2018-06-23' and '2019-06-23'
group by book_id

结果如下:

{"headers": ["book_id", "q", "dispatch_date"], 
"values": 
[[1, 3, "2018-07-26"], 
[3, 8, "2019-06-11"], 
[4, 11, "2019-06-05"]]}

最后再筛选时间,并把结果 join 到一起。

正确答案如下:

select a.book_id, a.name
from books a
left join 
(select b.book_id, sum(quantity) as q, dispatch_date
from orders b
where dispatch_date between '2018-06-23' and '2019-06-23'
group by book_id) tmp
using(book_id)
where available_from < '2019-05-23' and (q<10 or q is null)


附件

https://gist.github.com/zg104/7af3889ec1d72e11e5f26dd75414874a

小结

那么 SQL 中等难度部分的刷题就告一段落啦,先恭喜大家完成 SQL 简单类型题目的训练,相信大家对 SQL 的函数、语句以及解题关键点都有了更深层次的理解,并能把问题进行拆解,对一定的类型题足够敏感,对 window function 也非常的熟悉了,如果你做到了,那么恭喜你!

最后我要对你说: