加粗的题目里面含有follow up question!
斜体的题目是系列套题,可以一起做!
代码整理全部在附件里(文章最后)
这篇文章主要帮助大家开启对 SQL 的了解与学习,总结了 LeetCode 中的 Database 中 SQL(简单)里面代表性较高且值得去反复做的 47 道题,希望能对大家入门 SQL 有所帮助!
Now, Let’s get started!
注意到,这里的table: ActorDirector
包含三列,其中timestamp是主键。
我们需要找到所有的配对(actor_id, director_id)其中演员和导演合作过至少三次。
所以,我们要求actor_id
和director_id
配对的数量不小于三,
利用count
函数我们有:
select actor_id, director_id, count(*) as counts
from actordirector
group by actor_id, director_id
这一步的结果如下,我们对应每个演员、导演并且计算出他们合作的次数,分别为3,2,2。
{"headers": ["actor_id", "director_id", "c"], "values": [[1, 1, 3], [1, 2, 2], [2, 1, 2]]}
如果不清楚group by
可以看这里。
还有,我们可以用group by 1,2
来代替group by actor_id, director_id
,其实1和2代表的就是select
前两列的意思。
select actor_id, director_id, count(*) as counts
from actordirector
group by 1,2
接下来,我们只需在这个基础上找到所有的演员和导演满足counts >= 3
即可。
最终答案如下:
select actor_id, director_id from
(select actor_id, director_id, count(*) as counts
from actordirector
group by 1,2) tmp
where counts >= 3
这里使用了十分常见的subquery,其中注意要在table后面加上别名(alias),这里用tmp表示
注意,Ads
表里面ad_id
和user_id
是主键,我们需要按照给出的公式算出CTR。
所以根据每个ad_id
,我们可以计算出其对应的Clicked
和Viewed
的个数,在利用公式可求得CTR。
select ad_id,
# case when类似于if else语句,表示当action = 'Clicked'记为1,否则记为0,注意一定后面加上end!用sum函数计算每个不同的ad_id对应的Clicked的总数,同理用其除以后面的式子,得到ctr,并用round函数保留两位小数
round(100*sum(case when action = 'Clicked' then 1 else 0 end) / sum(case when action = 'Clicked' or action = 'Viewed' then 1 else 0 end),2) as ctr
from ads
group by ad_id # 对应每个不同的ad_id
order by ctr desc, ad_id asc; # 排序
结果如下:
{"headers": ["ad_id", "ctr"], "values": [[1, 66.67], [3, 50.00], [2, 33.33], [5, null]]}
我们发现,当ad_id
等于5
,ctr为null
而不是0.00
。所以我们需要对这种情况做些调整,用ifnull
函数把null
转换成0
。
最终答案如下:
# case when类似于if else语句,表示当action = 'Clicked'记为1,否则记为0,注意一定后面加上end!
# 用sum函数计算每个不同的ad_id对应的Clicked的总数,同理用其除以后面的式子,得到ctr,并用round函数保留两位小数,且用ifnull函数处理缺失值。
select ad_id,
ifnull(round(100*sum(case when action = 'Clicked' then 1 else 0 end) / sum(case when action = 'Clicked' or action = 'Viewed' then 1 else 0 end),2),0) as ctr
from ads
group by ad_id # 对应每个不同的ad_id
order by ctr desc, ad_id asc; # 排序
Table: Views
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| article_id | int |
| author_id | int |
| viewer_id | int |
| view_date | date |
+---------------+---------+
注意这里没有主键,说明可能会含有重复的行,每一行表示某个观看者在某个日期观看了某个作者写的某篇文章。
我们要找出所有的观看了至少一篇自己的文章的作者,并且按照他们的id的升序排序。
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+---+----+----+---------------+
Result table:
+------+
| id |
+------+
| 4 |
| 7 |
+------+
正确答案如下:
select distinct author_id as 'id'
from views
where author_id = viewer_id
order by 1
# 注意
# 1. distinct 函数筛选出了唯一的auther_id,因为可能有重复的行存在
# 2. order by 1 = order by author_id
Table: Prices
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| product_id | int |
| start_date | date |
| end_date | date |
| price | int |
+---------------+---------+
这里的 product_id
、start_date
和 end_date
都是主键,
每一行表示了在开始日期到结束日期内的每个 product_id
的价格,
并且对于一个 product_id
,不会有两个相交的时间区间存在。
Table: UnitsSold
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| product_id | int |
| purchase_date | date |
| units | int |
+---------------+---------+
这里没有主键,所以有重复的行,每一行表示对于每个卖出的产品的 product_id
、purchase_date
和 units
。
我们需要找到每个产品的平均售卖价格。
例如:
Prices table:
+------------+------------+------------+--------+
| product_id | start_date | end_date | price |
+------------+------------+------------+--------+
| 1 | 2019-02-17 | 2019-02-28 | 5 |
| 1 | 2019-03-01 | 2019-03-22 | 20 |
| 2 | 2019-02-01 | 2019-02-20 | 15 |
| 2 | 2019-02-21 | 2019-03-31 | 30 |
+------------+------------+------------+--------+
UnitsSold table:
+------------+---------------+-------+
| product_id | purchase_date | units |
+------------+---------------+-------+
| 1 | 2019-02-25 | 100 |
| 1 | 2019-03-01 | 15 |
| 2 | 2019-02-10 | 200 |
| 2 | 2019-03-22 | 30 |
+------------+---------------+-------+
Result table:
+------------+---------------+
| product_id | average_price |
+------------+---------------+
| 1 | 6.96 |
| 2 | 16.96 |
+------------+---------------+
Average selling price = Total Price of Product / Number of products sold.
Average selling price for product `1 = ((100 * 5) + (15 * 20)) / 115 = 6.96`
Average selling price for product `2 = ((200 * 15) + (30 * 30)) / 230 = 16.96`
正确答案如下:
select p.product_id, round(sum(price*units)/sum(units),2) as average_price
from prices as p
join unitssold as u
using (product_id)
where u.purchase_date between p.start_date and p.end_date
group by p.product_id
# 注意
# 1. 对于table要加上alias,并且对于公共的product_id要加上对应的table名
# 2. using (product_id) 等价于 on p.product_id = u.product_id
# 3. 要保证purchase_date在start_date和end_date中间,可能会被忽略,并且这个条件要在group by之前,因为在aggregate之前我们要筛选出满足时间范围的行,在进行计算!
Table my_numbers
contains many numbers in column num including duplicated ones.
Can you write a SQL query to find the biggest number, which only appears once.
+---+
|num|
+---+
| 8 |
| 8 |
| 3 |
| 3 |
| 1 |
| 4 |
| 5 |
| 6 |
For the sample data above, your query should return the following result:
+---+
|num|
+---+
| 6 |
Note:
If there is no such number, just output null.
那么我们要找的就是num里面只出现一次的最大的数
可能你会这么写
select max(num) as 'num'
from my_number
group by num
having count(*) = 1
但这样会返回错误的值,如下:
{"headers": ["num"], "values": [[1], [4], [5], [6]]}
我们是找出了出现次数为1的数字但是max函数貌似不起作用,因为这里的max(num)
针对的是num,相当于对每一个group by的数字都取了最大值,也就是本身,所以无效
正确答案如下:
select max(num) as num
from
(select
num
from
my_numbers
group by num
having count(num) = 1) as tmp
# 使用subquery对所有出现次数为1的数字取最大值
或者
select max(num) as num
from
(select
num,count(*) as counts
from
my_numbers
group by num) as tmp
where counts = 1
Table: Person
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| PersonId | int |
| FirstName | varchar |
| LastName | varchar |
+-------------+---------+
PersonId is the primary key column for this table.
Table: Address
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| AddressId | int |
| PersonId | int |
| City | varchar |
| State | varchar |
+-------------+---------+
AddressId is the primary key column for this table.
Write a SQL query for a report that provides the following information for each person in the Person table, regardless if there is an address for each of those people:
FirstName, LastName, City, State
这个题看似很简单,只需要把他们合并起来就可以了,但是要注意inner join
和left join
的区别(right join
其实和left join
是一回事)。
一图流!
这里题目要求我们找出每个人的各种信息,无论这个人是不是有地址,也就是哪怕没有地址,也要用null
表示,那自然要用left join
。
正确答案如下:
select FirstName, LastName, City, State
from Person left join Address
on Person.PersonId = Address.PersonId
# using (PersonId)
结果如下:
{"headers": ["FirstName", "LastName", "City", "State"], "values": [["Allen", "Wang", null, null]]}
Several friends at a cinema ticket office would like to reserve consecutive available seats.
Can you help to query all the consecutive available seats order by the seat_id using the following cinema
table?
seat_id | free |
---|---|
1 | 1 |
2 | 0 |
3 | 1 |
4 | 1 |
5 | 1 |
Your query should return the following result for the sample case above.
seat_id |
---|
3 |
4 |
5 |
Note:
我们要找到连续挨着的并且是可以坐的(available)的seat_id
,并按照其升序排序。
感觉不是那么简单对吧,这里要用到cross join
。
一图流!
所谓cross join
也就是两个表格的笛卡尔积,是我们很常用的手段之一。那么如果这个table有5行,那么在经过跟自己本身cross join
之后,就会变成一个5 x 5
的table,我们要找的就是 table A 和 table B(当然,他们其实是同一个table,只不过名称不同)的满足下列两个条件的seat_id
:
free = 1
,因为我们要满足 available。seat_id
是连续的,也就是他们差的绝对值是等于1的正确答案如下:
select distinct a.seat_id
from cinema a, cinema b
where abs(a.seat_id - b.seat_id) = 1
and a.free = 1
and b.free = 1
order by 1
# 直接使用 select from table A, table B 这种语法其实就是cross join,不要以为你没有用join,其实这就是和cross join等价的!
或者
select distinct a.seat_id
from cinema a
join cinema b
on abs(a.seat_id - b.seat_id) = 1
and a.free = 1
and b.free = 1
order by 1
# 这个就是self join的一般写法!
我估计你可能还是没太明白self join
的含义,这个题是属于cross join
的题,但也属于self join
,因为table是和自己本身cross join
的,但是cross join
不局限于自己和自己本身去 join,下面这个链接会帮你更好的理解self join
!
补充资料:
这个题目算是比较复杂的了,我会一步一步拆开讲解!
首先这里有三个 table,分别是 customers
、product
和 orders
,我们需要从中找出 customer_id
和 name
满足下面两个条件:
首先,我们从例子中可以看到,product_id
和 customer_id
可以作为连接表的媒介,并且是不存在其中一个媒介的元素一个表格里有,而另一个表格没有,这就说明我们可以放心使用 inner join
而不用担心 null
会存在的问题!
那么自然我们要把 product
和 orders
连接在一起,因为他们包含了产品的价格和数量,我们需要计算他们的乘积,得出某个人在某个时间在某个产品上花费的总钱数。
select 先空着
from product p
join orders o
on p.product_id = o.product_id
# using (product_id)
那么这样我们就把两个 table 连接起来了,我们要选取这个新表格的哪些列呢?
根据我们所需要满足的条件,肯定需要 customer_id
,因为我们需要知道是谁买的;也需要 order_date
,因为我们只要找在6月和7月有消费的人;最后还有 price * quantity
,因为我们要找到消费总额超过$100的人。
所以我们有
select customer_id, order_date price*quantity
from product p
join orders o
on p.product_id = o.product_id
然后,我们发现 order_date
是以年月日的形式出现的,但我们需要把他转换成以月为单位,因为我们只要找6或7月的消费的人,不关心具体是哪一天,这里使用了 date_format
,可以对日期进行转换。并且,我们需要针对每个消费者,计算出他们在6月和7月的总消费,所以用 sum
函数和 group by
函数;在这个基础上,我们要限定 order_date
在6到7月内,使用 between and
,最后,要保证每人月均总消费不小于$100。
select customer_id,
date_format(order_date, '%Y-%m') as 'order_date',
sum(price*quantity) as total
from product p
join orders o
on p.product_id = o.product_id
where order_date between '2020-06-01' and '2020-07-31'
group by 2,1 # 也就是group by order_date和customer_id
having total >= 100
结果如下:
{"headers": ["customer_id", "order_date", "total"], "values": [[2, "2020-06", 600], [1, "2020-06", 300], [3, "2020-06", 110], [1, "2020-07", 100]]}
那么到这里,这道题的雏形已经出来了!我们已经成功找到了6和7月的月均总消费不小于$100的所有人,但是我们要找的人一定要在6和7月都有消费,也就是对于每个消费者,他们在其中出现的次数要等于2!
所以上面的结果成为了我们即将要去搜索的新表格,接下来使用subquery,对其进行进一步搜索。
正确答案如下:
select customer_id, name
from customers
where customer_id in # 从customer表里面找到满足下面括号内容的customer_id
(
select customer_id
from
(select customer_id,
date_format(order_date, '%Y-%m') as 'order_date',
sum(price*quantity) as total
from product p
join orders o
on p.product_id = o.product_id
where order_date between '2020-06-01' and '2020-07-31'
group by 2,1
having total >= 100) tmp # tmp就是我们之前建立的新表格
group by customer_id
having count(customer_id) = 2
# 从新表格里面选出在其中出现次数等于2的消费者
)
看着是不是很复杂,其实并没有,真正工作当中,应用当中比这个复杂的要多的多。所以也不要感觉SQL特别简单,一学就会,其实也不尽然(这道题被归在简单的题集里)。还有我要建议你要把代码写的稍微“好看”一点,方便别人读懂,如果像下面这么写,估计别人很难看懂你要干什么吧,哪怕这仅仅是20行不到的代码。
select customer_id from (select customer_id, date_format(order_date, '%Y-%m') as 'order_date', sum(price*quantity) as total
from product p join orders o on p.product_id = o.product_id
where order_date between '2020-06-01' and '2020-07-31' group by 2,1
having total >= 100) tmp group by customer_id having count(customer_id) = 2)
看了这个,别人可能会想说。。。
Query the customer_number from the orders table for the customer who has placed the largest number of orders.
It is guaranteed that exactly one customer will have placed more orders than any other customer.
The orders table is defined as follows:
Column | Type |
---|---|
order_number (PK) | int |
customer_number | int |
order_date | date |
required_date | date |
shipped_date | date |
status | char(15) |
comment | char(200) |
Sample Input
order_number | customer_number | order_date | required_date | shipped_date | status | comment |
---|---|---|---|---|---|---|
1 | 1 | 2017-04-09 | 2017-04-13 | 2017-04-12 | Closed | |
2 | 2 | 2017-04-15 | 2017-04-20 | 2017-04-18 | Closed | |
3 | 3 | 2017-04-16 | 2017-04-25 | 2017-04-20 | Closed | |
4 | 3 | 2017-04-18 | 2017-04-28 | 2017-04-25 | Closed |
Sample Output
customer_number |
---|
3 |
Explanation
The customer with number ‘3’ has two orders, which is greater than either customer ‘1’ or ‘2’ because each of them only has one order.
So the result is customer_number ‘3’.
要注意,我们保证了只有一个消费者会比其他消费者下更多的单,也就是说我们不必担心会有多个消费者下了同样多的单,并且是最多的。
所以我们可以使用limit
函数和order by
函数对每个客户下单次数进行降序排序,并且取第一个,也就是最大的那个。
正确答案如下:
select customer_number
from orders
group by 1
order by count(order_number) desc
# 针对每个消费者查找其下单次数,并降序排列
limit 1 # 取第一个
Follow up: What if more than one customer have the largest number of orders, can you find all the customer_number in this case?
如果面试官抛给了你这样的问题,你还会吗?
首先,降序排列找第一个这种策略已经不适用了,因为有可能出现好几个消费者都下了一样多的单,并且都是最多的,我们也不知道取前几个,这时会用到我们很常用的window function来辅助我们进行排序。
先来说一下window function的种类,我想下面这个图可以让你理解这三种排序函数的区别。
row_number
返回从1开始的序列rank
返回当前结果中每一个记录的排名dense_rank
和rank
很像,但是没有gap
那么我们先对每个消费者的下单数进行降序排序
select customer_number,
rank() over (order by count(*) desc) as rnk
# 这里我们使用rank()函数,其实在查找排名第一的情况时,
# dense_rank()和rank()其实没区别的
from orders
先重新定义一下我们的输入:
{"headers":{"orders":["order_number","customer_number"]},"rows":{"orders":[[1,1],[2,2],[3,3],[4,3],[5,4],[6,4]]}}
可以看出,编号为3和4的消费者都下单了两次,并且是最多的。
根据上面的代码,我们得到结果如下:
{"headers": ["customer_number", "rnk"], "values": [[3, 1], [4, 1], [1, 3], [2, 3]]}
意思是customer_id
为3和4的消费者下单量排名第一,那么其他的我们就不考虑了,因为我们只考虑排名第一的。
那么在这个基础上,我们要再从这个新表格里面筛选customer_id
满足rnk = 1
即可,使用subquery结果如下:
select customer_number
from
(
select customer_number,
rank() over (order by count(*) desc) as rnk
from orders
group by customer_number
) tmp
where rnk = 1 # 这里要记得用alias哦!
那么,我现在有一个很欠揍的问题,如果我不想找排名第一的了,我想找排名第二的,而且我也不保证会出现排名第二的消费者下单数都相同的情况哦!
那么,拯救世界的任务就拜托你了!
咳咳,那么来点简单的划划水吧!
这个题非常的straightforward,就是让我们找所有一毛不拔的顾客,什么都不买的人。
有人说:“见到两个table我就手痒,就想join怎么办?”
join归join,您先想清楚怎么join好伐,之前说过了 inner join
和 left join
的区别。那么既然我们要找没买东西的人,那就是要找两个table连到一起后,orders表格中的Id
为null
的人嘛!
正确答案如下:
select Name as 'Customers'
from customers as 'c'
left join orders as 'o'
on c.Id = o.CustomerId
where o.Id is null
但是毕竟我们用了join,在跑大型数据集时时间还是会有些长,那么我又变成了坏人,能不能不用join呢?
那么便是筛选Customers
表格中的不在Orders
表格中的CustomerId
中的Id
呗。
Emmmmm我想你应该明白我的意思…
select name as 'Customers'
from customers
where id not in # not in 作为筛选条件也真的很常见了!
(
select customerid from orders
);
可以看到在不用join的情况下,速度还是快很多的!
我们要删除所有重复的邮件,并按照只保留Id
的结果(如果重复的话)
首先,我们使用self join来满足下列两个条件:
Email
相同Id
大的select p1.*
FROM Person p1,
Person p2
WHERE
p1.Email = p2.Email
# Email相同
AND
p1.Id > p2.Id
# 这样保证我们找到了所有重复的email并且找出所有大于最小Id的结果
;
上面的结果就是我们要删除的
最终答案如下:
delete p1.*
FROM Person p1,
Person p2
WHERE
p1.Email = p2.Email
AND
p1.Id > p2.Id
;
这个题是不是很简单,就是要删除重复的邮件,直接上答案了,不会做的要加油啦!
正确答案如下:
select Email
from Person
group by Email
# 这个Email是Person表的Email这一列
having count(Email) > 1;
这个题比较简单,依旧考察大家对 join 和缺失值的运用,首先使用left join
连接两表,在使用where
设定题中所给的条件即可。
正确答案如下:
select name, bonus
from Employee
left join Bonus
using (empId)
where bonus < 1000 or bonus is null
# 要注意缺失值也要体现出来,如果没有写
# bonus is null的话,是不会显示null的行的
The Employee
table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.
+----+-------+--------+-----------+
| Id | Name | Salary | ManagerId |
+----+-------+--------+-----------+
| 1 | Joe | 70000 | 3 |
| 2 | Henry | 80000 | 4 |
| 3 | Sam | 60000 | NULL |
| 4 | Max | 90000 | NULL |
+----+-------+--------+-----------+
Given the Employee
table, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.
+----------+
| Employee |
+----------+
| Joe |
+----------+
这个题算是比较经典的self join
面试题了,目的是要找所有比自己经理工资更多的雇员。
我们看到了不是每个雇员都对应着经理,要注意这一点!
接下来我们用self join
把表格自连接
# cross join相当于求两个table的笛卡尔积
select
a.Name as Employee
from
Employee a, Employee b
前四列来自表格a,后四列来自表格b。
那么接下来就要筛选满足下面两个条件的行:
a.ManagerId = b.Id
a.Salary > b.Salary
你可能有疑问了,为啥不是a.Id = b.ManagerId
呢?那不差不多么,其实差的十万八千里了,self join 是有顺序的就跟left join
和right join
一样,本质是一回事,但是哪么表格在前哪个表格在后,决定了连接方式。
首先,我们要取的Name
这一列是来自于表格 a 的,那么这种情况下,表格 a self join
表格 b 说白了就是先取出表格 a 的第一行,然后跟表格 b 的每一行做比对,对这个Employee
表格来说就有了四行,再取出表格 a 的第二行,再跟表格 b 的每一行做比对,又有四行,依次类推,总共就是 4 x 4 = 16
行。
所以,以这种比对方式,我们自然要筛选的是a.ManagerId = b.Id
,ManagerId
的缺失值可以因此被排除掉,我们只筛选a.ManagerId = b.Id
且a.ManagerId
存在的情况,反之,则没排除a.ManagerId is null
的情况。
其实,如果我们要取的Name
这一列是来自于表格 b 的,那么一切就都反过来了,条件自然是a.Id = b.ManagerId
。
正确答案如下:
select
a.Name as Employee
from
Employee a, Employee b
where a.ManagerId = b.Id
and a.Salary > b.Salary
或者
select
a.Name as Employee
from
Employee as a
JOIN
Employee as b
ON a.ManagerId = b.Id
where a.Salary > b.Salary
Given a table customer
holding customers information and the referee.
+------+------+-----------+
| id | name | referee_id|
+------+------+-----------+
| 1 | Will | NULL |
| 2 | Jane | NULL |
| 3 | Alex | 2 |
| 4 | Bill | NULL |
| 5 | Zack | 1 |
| 6 | Mark | 2 |
+------+------+-----------+
Write a query to return the list of customers NOT referred by the person with id ‘2’.
For the sample data above, the result is:
+------+
| name |
+------+
| Will |
| Jane |
| Bill |
| Zack |
+------+
我们要找到所有没有被id = 2
refer的消费者。
那么所筛选的消费者要满足下列两个条件其中之一:
referee_id != 2
referee_id is null
正确答案如下:
select name
from customer
where
referee_id != 2
or
referee_id is NULL
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
| mail | varchar |
+---------------+---------+
user_id is the primary key for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.
Write an SQL query to find the users who have valid emails.
A valid e-mail has a prefix name and a domain where:
'_'
, period '.'
and/or dash '-'
. The prefix name must start with a letter.'@leetcode.com'
.Return the result table in any order.
The query result format is in the following example.
# Users
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | winston@leetcode.com |
| 2 | Jonathan | jonathanisgreat |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
| 5 | Marwan | quarz#2020@leetcode.com |
| 6 | David | david69@gmail.com |
| 7 | Shapiro | .shapo@leetcode.com |
+---------+-----------+-------------------------+
# Result table
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | winston@leetcode.com |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
+---------+-----------+-------------------------+
# The mail of user 2 doesn't have a domain.
# The mail of user 5 has # sign which is not allowed.
# The mail of user 6 doesn't have leetcode domain.
# The mail of user 7 starts with a period.
这里我们使用正则表达式来寻找特殊字符和结构来进行字符串匹配
正确答案如下:
SELECT *
FROM Users
WHERE
mail REGEXP '^[A-Za-z][A-Za-z0-9_.-]*@leetcode.com$'
^[A-Za-z]
代表以 A 到 Z 或者 a 到 z 的字母开头的字符串*
表示其前面必须有内容$
表示以某字符串结尾所以最后 '^[A-Za-z][A-Za-z0-9_.-]*@leetcode.com$'
表示
_
、.
、-
。关于正则表达式在sql的使用,请看
Table: Employee
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| employee_id | int |
| team_id | int |
+---------------+---------+
employee_id is the primary key for this table.
Each row of this table contains the ID of each employee and their respective team.
Write an SQL query to find the team size of each of the employees.
Return result table in any order.
The query result format is in the following example:
# Employee Table:
+-------------+------------+
| employee_id | team_id |
+-------------+------------+
| 1 | 8 |
| 2 | 8 |
| 3 | 8 |
| 4 | 7 |
| 5 | 9 |
| 6 | 9 |
+-------------+------------+
# Result table:
+-------------+------------+
| employee_id | team_size |
+-------------+------------+
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
| 4 | 1 |
| 5 | 2 |
| 6 | 2 |
+-------------+------------+
# Employees with Id 1,2,3 are part of a team with team_id = 8.
# Employees with Id 4 is part of a team with team_id = 7.
# Employees with Id 5,6 are part of a team with team_id = 9.
我们要找到每个人对应的队伍里含有的总人数是多少,那么首先要根据每个team_id
计算出各自队伍的人数是多少
select team_id, count(*) as c
from Employee
group by team_id
结果如下:
{"headers": ["team_id", "c"], "values": [[8, 3], [7, 1], [9, 2]]}
意味着我们得到了每个team_id
对应的人数,那么我们只需要把这个结果和Employee
表格再连接一下即可,就可以把employee_id
和c
(c
作为count(*)
的别名)联系起来。
正确答案如下:
select
employee_id, c as team_size
from
Employee
join
(
select
team_id, count(*) as c
from
Employee
group by
team_id
) tmp
using (team_id)
Table: Sales
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| sale_id | int |
| product_name | varchar |
| sale_date | date |
+--------------+---------+
sale_id
is the primary key for this table.
Each row of this table contains the product name and the date it was sold.
Since table Sales was filled manually in the year 2000, product_name
may contain leading and/or trailing white spaces, also they are case-insensitive.
Write an SQL query to report
product_name
in lowercase without leading or trailing white spaces.sale_date
in the format ('YYYY-MM')
total
the number of times the product was sold in this month.Return the result table ordered by product_name
in ascending order, in case of a tie order it by sale_date
in ascending order.
The query result format is in the following example.
# Sales
+------------+------------------+--------------+
| sale_id | product_name | sale_date |
+------------+------------------+--------------+
| 1 | LCPHONE | 2000-01-16 |
| 2 | LCPhone | 2000-01-17 |
| 3 | LcPhOnE | 2000-02-18 |
| 4 | LCKeyCHAiN | 2000-02-19 |
| 5 | LCKeyChain | 2000-02-28 |
| 6 | Matryoshka | 2000-03-31 |
+------------+------------------+--------------+
# Result table:
+--------------+--------------+----------+
| product_name | sale_date | total |
+--------------+--------------+----------+
| lcphone | 2000-01 | 2 |
| lckeychain | 2000-02 | 2 |
| lcphone | 2000-02 | 1 |
| matryoshka | 2000-03 | 1 |
+--------------+--------------+----------+
In January, 2 LcPhones were sold, please note that the product names are not case sensitive and may contain spaces.
In Februery, 2 LCKeychains and 1 LCPhone were sold.
In March, 1 matryoshka was sold.
这个题看似很复杂,其实我们只需要对基本的列进行转化再使用简单的group by
就能得到答案。
我们筛选的结果要满足:
product_name
是小写的,并且前面后面都没有空格sale_date
以('YYYY-MM')
出现total
表示每个产品在这个月的销量这里我们用了lower
函数使product_name
小写,并用trim
函数去掉字符串前面后面的空格;并且使用date_format
函数对日期进行转化,之前讲过了不再赘述。
正确答案如下:
select
lower(trim(product_name)) as 'product_name',
date_format(sale_date, '%Y-%m') as 'sale_date',
count(*) as 'total'
from
sales
group by 1,2
order by 1,2;
# 这里的1和2指的是select里的第一列第二列
# 不能用alias代替
In social network like Facebook or Twitter, people send friend requests and accept others’ requests as well. Now given two tables as below:
Table: friend_request
sender_id | send_to_id | request_date |
---|---|---|
1 | 2 | 2016-06-01 |
1 | 3 | 2016-06-01 |
1 | 4 | 2016-06-01 |
2 | 3 | 2016-06-02 |
3 | 4 | 2016-06-09 |
Table: request_accepted
requester_id | accepter_id | accept_date |
---|---|---|
1 | 2 | 2016-06-03 |
1 | 3 | 2016-06-08 |
2 | 3 | 2016-06-08 |
3 | 4 | 2016-06-09 |
3 | 4 | 2016-06-10 |
Write a query to find the overall acceptance rate of requests rounded to 2 decimals, which is the number of acceptance divide the number of requests.
For the sample data above, your query should return the following result.
accept_rate |
---|
0.80 |
Note:
friend_request
. In this case, you just need to simply count the total accepted requests (no matter whether they are in the original requests), and divide it by the number of requests to get the acceptance rate.Explanation: There are 4 unique accepted requests, and there are 5 requests in total. So the rate is 0.80.
Follow-up:
这个题很有意思,我会细讲这一题,这题考察我们以下几个点:
题中这种情况,我们发现有4条唯一被接受的申请,而总共5条申请,所以接受率为0.80
。
所以,我们要用count
和distinct
函数分别数两个表中唯一存在的(sender_id,send_to_id)
和(requester_id,accepter_id)
的出现次数,并用后者除以前者即可,并且使用ifnull
函数处理缺失值,round
函数保留小数点后两位。
正确答案如下:
select
ifnull(round(b.c / a.c,2),0) as accept_rate
from
(select
count(distinct sender_id, send_to_id) as c
from
friend_request) a,
(select
count(distinct requester_id, accepter_id) as c
from
request_accepted) b;
# a和b作为两个table存在,分别数两个表中唯一存在的(sender_id,send_to_id)和(requester_id,accepter_id)的出现次数
这种写法能清晰的体现出表格 a 和表格 b 都分别代表什么,比较推荐!
也有以下的写法:
with
a as
(select count(distinct requester_id, accepter_id) as accepts
from request_accepted),
b as
(select count(distinct sender_id, send_to_id) as total
from friend_request)
select round(ifnull(accepts/total,0),2) as accept_rate from a, b;
# a和b作为两个table存在,分别数两个表中唯一存在的(sender_id,send_to_id)和(requester_id,accepter_id)的出现次数
select
round(
ifnull(
(select count(*) from (select distinct requester_id, accepter_id from request_accepted) as A)
/
(select count(*) from (select distinct sender_id, send_to_id from friend_request) as B),
0)
, 2) as accept_rate;
那么面试官看你做到这里微微一笑,你此时意识到事情并没这么简单,第一个 follow up 来了!
Follow up Question 1
你能找出每个月对应的accept_rate
吗?
你想了一想那不就group by
一下就完事了吗?
在刚刚的代码基础上:
select
ifnull(round(b.c / a.c,2),0) as accept_rate, a.month
from
(select
count(distinct sender_id, send_to_id) as c,
month(accept_date) as 'month' # month函数可以将日期转换成月份
from
friend_request
group by 2) a, # 这里group by是按每个月来数次数
(select
count(distinct requester_id, accepter_id) as c, month(accept_date)
from
request_accepted
group by 2) b;
where a.month = b.month # 这里默认了send和request在同一个月发生,否则分母为0
但是可能出现假如只有一个人 A 给 B 在5月份发了邀请,但是 B 在 6 月份接受了邀请,那么这两个月的接受率都应该为 0;前者因为在5月没有人接受,后者因为6月没人发邀请。
那么我们改进后:
with date_table
as (SELECT month(request_date) AS 'month' FROM friend_request
UNION
SELECT month(accept_date) FROM request_accepted
ORDER BY month) # union 函数去除了重复值,return两列合并成一列的所有不重复月份
select ifnull(accept_rate,0), month from
date_table
left join # left join为了找到上述accept_rate可能为0的情况
(select
ifnull(round(b.c / a.c,2),0) as accept_rate, a.month
from
(select
count(distinct sender_id, send_to_id) as c,
month(accept_date) as 'month' # month函数可以将日期转换成月份
from
friend_request
group by 2) a, # 这里group by是按每个月来数次数
(select
count(distinct requester_id, accepter_id) as c, month(accept_date)
from
request_accepted
group by 2) b;
where a.month = b.month) tmp # 这里默认了send和request在同一个月发生,否则分母为0
using (month)
我们用以下数据实验一下:
{"headers":{"friend_request":["sender_id","send_to_id","request_date"],"request_accepted":["requester_id","accepter_id","accept_date"]},"rows":{"friend_request":[[1,2,"2016/05/01"],[1,3,"2016/06/01"],[1,4,"2016/06/01"],[2,3,"2016/07/02"],[3,4,"2016/06/09"]],"request_accepted":[[1,2,"2016/08/03"],[1,3,"2016/06/08"],[2,3,"2016/07/08"],[3,4,"2016/06/09"],[3,4,"2016/06/10"]]}}
Table: friend_request
sender_id | send_to_id | request_date |
---|---|---|
1 | 2 | 2016-05-01 |
1 | 3 | 2016-06-01 |
1 | 4 | 2016-06-01 |
2 | 3 | 2016-07-02 |
3 | 4 | 2016-06-09 |
Table: request_accepted
requester_id | accepter_id | accept_date |
---|---|---|
1 | 2 | 2016-08-03 |
1 | 3 | 2016-06-08 |
2 | 3 | 2016-07-08 |
3 | 4 | 2016-06-09 |
3 | 4 | 2016-06-10 |
红色部分是跟原测试数据不一样,做了改动的,那么我们期待的是:
accept_rate = 0.00
,因为5月只有人发送邀请但没人接受accept_rate = 0.67
,因为有3个人发邀请,2个人接受accept_rate = 1.00
,因为有1个人发邀请,1个人接受accept_rate = 0.00
,因为8月虽然有1个人接受,但是没有人发邀请{"headers": ["ifnull(accept_rate,0)", "month"], "values": [[0.00, 5], [0.67, 6], [1.00, 7], [0.00, 8]]}
奈斯,结果正是我们预想的!
你以为结束了,这时面试官又微微一笑,又问了第二个问题。
Follow up Question 2
那么你能算出每一天的累计accept_rate
吗?
区别在于我们要计算的是累计值
with date_table as
(SELECT request_date AS dates FROM friend_request
UNION
SELECT accept_date FROM request_accepted
ORDER BY dates),
tmp as
(SELECT
coalesce(ROUND(COUNT(DISTINCT requester_id, accepter_id) / COUNT(DISTINCT sender_id, send_to_id), 2),0) as rate,
date_table.dates
FROM
request_accepted b, friend_request a, date_table
where b.accept_date <= date_table.dates
and a.request_date <= date_table.dates
GROUP BY 2)
select ifnull(rate,0) as 'rates', dates
from date_table
left join tmp
using (dates)
这里要注意不能直接把两个表格直接group by
数次数,因为我们要考虑当前时间下,我们的累计接受率,所以要加上条件
where b.accept_date <= date_table.dates
and a.request_date <= date_table.dates
在这样筛选所需的信息之后再根据dates
来group by
。
最后改过后的数据试验一下:
Table: friend_request
sender_id | send_to_id | request_date |
---|---|---|
1 | 2 | 2016-05-01 |
1 | 3 | 2016-06-01 |
1 | 4 | 2016-06-01 |
2 | 3 | 2016-07-02 |
3 | 4 | 2016-06-09 |
Table: request_accepted
requester_id | accepter_id | accept_date |
---|---|---|
1 | 2 | 2016-08-03 |
1 | 3 | 2016-06-08 |
2 | 3 | 2016-07-08 |
3 | 4 | 2016-06-09 |
3 | 4 | 2016-06-10 |
2016-05-01
:accept_rate = 0.00
,因为2016-05-01
只有1人发送邀请但没人接受2016-06-01
:accept_rate = 0.00
,虽然累计有3个人发邀请,但没人接受2016-06-08
:accept_rate = 0.33
,因为有3个人发邀请,1个人接受2016-06-09
:accept_rate = 0.50
,因为有4个人发邀请,2个人接受2016-06-10
:accept_rate = 0.50
,因为有4个人发邀请,2个人接受(2016-06-09
和2016-06-10
重复接受了)2016-07-02
:accept_rate = 0.40
,因为有5个人发邀请,2个人接受2016-07-08
:accept_rate = 0.60
,因为有5个人发邀请,3个人接受2016-08-03
:accept_rate = 0.80
,因为有5个人发邀请,4个人接受{"headers": ["rates", "dates"], "values": [[0.00, "2016-05-01"], [0.00, "2016-06-01"], [0.33, "2016-06-08"], [0.50, "2016-06-09"], [0.50, "2016-06-10"], [0.40, "2016-07-02"], [0.60, "2016-07-08"], [0.80, "2016-08-03"]]}
完美!
Table: Orders
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| order_id | int |
| order_date | date |
| customer_id | int |
| invoice | int |
+---------------+---------+
# order_id is the primary key for this table.
# This table contains information about the orders made by customer_id.
Write an SQL query to find the number of unique orders and the number of unique customers with invoices > $20 for each different month.
Return the result table sorted in any order.
The query result format is in the following example:
# orders
+----------+------------+-------------+------------+
| order_id | order_date | customer_id | invoice |
+----------+------------+-------------+------------+
| 1 | 2020-09-15 | 1 | 30 |
| 2 | 2020-09-17 | 2 | 90 |
| 3 | 2020-10-06 | 3 | 20 |
| 4 | 2020-10-20 | 3 | 21 |
| 5 | 2020-11-10 | 1 | 10 |
| 6 | 2020-11-21 | 2 | 15 |
| 7 | 2020-12-01 | 4 | 55 |
| 8 | 2020-12-03 | 4 | 77 |
| 9 | 2021-01-07 | 3 | 31 |
| 10 | 2021-01-15 | 2 | 20 |
+----------+------------+-------------+------------+
# result
+---------+-------------+----------------+
| month | order_count | customer_count |
+---------+-------------+----------------+
| 2020-09 | 2 | 2 |
| 2020-10 | 1 | 1 |
| 2020-12 | 2 | 1 |
| 2021-01 | 1 | 1 |
+---------+-------------+----------------+
Explanation:
In September 2020 we have two orders from 2 different customers with invoices > $20.
In October 2020 we have two orders from 1 customer, and only one of the two orders has invoice > $20.
In November 2020 we have two orders from 2 different customers but invoices < $20, so we don’t include that month.
In December 2020 we have two orders from 1 customer both with invoices > $20.
In January 2021 we have two orders from 2 different customers, but only one of them with invoice > $20.
这个题是个新题,虽然难度不高,但是对基本的语法和函数都起到了很好的练习作用,在简单类型题里面算是比较全面啦!
正确答案如下:
select
date_format(order_date, '%Y-%m') as 'month',
count(distinct order_id) as 'order_count',
count(distinct customer_id) as 'customer_count'
from orders
where invoice > 20
group by 1
Table: TVProgram
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| program_date | date |
| content_id | int |
| channel | varchar |
+---------------+---------+
(program_date, content_id)
is the primary key for this table.
This table contains information of the programs on the TV.
content_id
is the id of the program in some channel on the TV.
Table: Content
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| content_id | varchar |
| title | varchar |
| Kids_content | enum |
| content_type | varchar |
+------------------+---------+
content_id
is the primary key for this table.
Kids_content
is an enum that takes one of the values (‘Y’, ‘N’) where:
content_type
is the category of the content as movies, series, etc.
Write an SQL query to report the distinct titles of the kid-friendly movies streamed in June 2020.
Return the result table in any order.
The query result format is in the following example.
TVProgram
+--------------------+--------------+-------------+
| program_date | content_id | channel |
+--------------------+--------------+-------------+
| 2020-06-10 08:00 | 1 | LC-Channel |
| 2020-05-11 12:00 | 2 | LC-Channel |
| 2020-05-12 12:00 | 3 | LC-Channel |
| 2020-05-13 14:00 | 4 | Disney Ch |
| 2020-06-18 14:00 | 4 | Disney Ch |
| 2020-07-15 16:00 | 5 | Disney Ch |
+--------------------+--------------+-------------+
Content
+------------+----------------+---------------+---------------+
| content_id | title | Kids_content | content_type |
+------------+----------------+---------------+---------------+
| 1 | Leetcode Movie | N | Movies |
| 2 | Alg. for Kids | Y | Series |
| 3 | Database Sols | N | Series |
| 4 | Aladdin | Y | Movies |
| 5 | Cinderella | Y | Movies |
+------------+----------------+---------------+---------------+
Result
+--------------+
| title |
+--------------+
| Aladdin |
+--------------+
Explaination:
“Leetcode Movie” is not a content for kids.
“Alg. for Kids” is not a movie.
“Database Sols” is not a movie.
“Alladin” is a movie, content for kids and was streamed in June 2020.
“Cinderella” was not streamed in June 2020.
这个题比较常规,考察的是对表格连接以及筛选条件的使用,需要注意的是,我们可以在不select
某一列的情况下,把其用在where
里面作为筛选条件。
正确答案如下:
select distinct title
from TVProgram a
join Content b
on a.content_id = b.content_id
and content_type = 'Movies'
and Kids_content = 'Y'
and program_date like '2020-06%'; # like函数在字符串查找有很多的应用
# and program_date between '2020-06-01' and '2020-06-30'
# and date_format(program_date, '%Y-%m') = '2020-06'
当然,写法不唯一,我们还有:
select distinct title
from TVProgram
join Content
using (content_id)
where content_type = 'Movies'
and Kids_content = 'Y'
and program_date like '2020-06%';
Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date)
is the primary key of this table.
This table shows the activity of players of some game.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
Write an SQL query that reports the first login date for each player.
The query result format is in the following example:
# Activity table
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
# Result table
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+
我们要找每个人的第一次登录日期,那么自然而然我们想到用 window function
,rank
或者dense_rank
函数都是可以的,我们使用rank
函数,这里注意我们用了partition by
这一语句(有点像group by
),之前我们仅仅用了order by
这一语句,稍微说一下他们的作用:
PARTITION BY
clause divides the rows of the result set partitions to which the function is applied.ORDER BY
clause specifies the logical sort order of the rows in each a partition to which the function is applied.强调一点,window function
不能作为筛选条件直接使用到where
等环境中,例如下面这种:
select
player_id,
event_date as first_login
from
activity
where rank() over (partition by player_id order by event_date desc) = 1
这种写法会返回:
You cannot use the window function ‘rank’ in this context.
所以我们可以用 subquery 对把 rank
函数所形成的这一列所对应的别名(alias)进行筛选。
正确答案如下:
select
player_id,
event_date as first_login
from
(select
player_id,
event_date,
rank() over (partition by player_id order by event_date) as rnk
from activity) tmp
where rnk = 1
Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date)
is the primary key of this table.
This table shows the activity of players of some game.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
Write a SQL query that reports the device that is first logged in for each player.
The query result format is in the following example:
# Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
# Result table:
+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
+-----------+-----------+
这个题和上一题基本是一样的思路,都是使用了 window function,所以直接上答案了!
正确答案如下:
select
player_id, device_id
from
(select
player_id,
device_id,
rank() over (partition by player_id order by event_date asc) as rnk
from activity) t
where rnk = 1
Table Activities
:
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| sell_date | date |
| product | varchar |
+-------------+---------+
There is no primary key for this table, it may contains duplicates.
Each row of this table contains the product name and the date it was sold in a market.
Write an SQL query to find for each date, the number of distinct products sold and their names.
The sold-products names for each date should be sorted lexicographically.
Return the result table ordered by sell_date
.
The query result format is in the following example.
Activities table:
+------------+-------------+
| sell_date | product |
+------------+-------------+
| 2020-05-30 | Headphone |
| 2020-06-01 | Pencil |
| 2020-06-02 | Mask |
| 2020-05-30 | Basketball |
| 2020-06-01 | Bible |
| 2020-06-02 | Mask |
| 2020-05-30 | T-Shirt |
+------------+-------------+
Result table:
+------------+----------+------------------------------+
| sell_date | num_sold | products |
+------------+----------+------------------------------+
| 2020-05-30 | 3 | Basketball,Headphone,T-shirt |
| 2020-06-01 | 2 | Bible,Pencil |
| 2020-06-02 | 1 | Mask |
+------------+----------+------------------------------+
For 2020-05-30, Sold items were (Headphone, Basketball, T-shirt), we sort them lexicographically and separate them by comma.
For 2020-06-01, Sold items were (Pencil, Bible), we sort them lexicographically and separate them by comma.
For 2020-06-02, Sold item is (Mask), we just return it.
这里我要向大家介绍一个新的函数,就是 group_concat
。这个函数是一个 aggregation
函数(聚合函数),是可以配合 group by
一起使用的,我们在这道题里不光要使用 count
函数,也要使用 group_concat
来将组中的字符串连接成单个字符串。
语法如下:
GROUP_CONCAT(
DISTINCT expression
ORDER BY expression
SEPARATOR sep );
Explanation:
The DISTINCT
clause allows you to eliminate duplicate values in the group before concatenating them.
The ORDER BY
clause allows you to sort the values in ascending or descending order before concatenating. By default, it sorts the values in ascending order. If you want to sort the values in the descending order, you need to specify explicitly the DESC
option.
The SEPARATOR
specifies a literal value inserted between values in the group. If you do not specify a separator, the GROUP_CONCAT
function uses a comma (,) as the default separator.
The GROUP_CONCAT
function ignores NULL
values. It returns NULL
if there was no matching row found or all arguments are NULL
values.
The GROUP_CONCAT
function returns a binary or non-binary string, which depends on the arguments. by default, the maximum length of the return string is 1024. In case you need more than this, you can extend the maximum length by setting the group_concat_max_len
system variable at SESSION
or GLOBAL
level.
那么这道题我们只需要使用 count
和 group_concat
在加上 group by
就可以解决了!
正确答案如下:
select
sell_date,
count(distinct product) as num_sold,
group_concat(distinct product order by product) as products
# group_concat取唯一的product,并且按product排序
from activities
group by sell_date
Table: Delivery
+-----------------------------+---------+
| Column Name | Type |
+-----------------------------+---------+
| delivery_id | int |
| customer_id | int |
| order_date | date |
| customer_pref_delivery_date | date |
+-----------------------------+---------+
delivery_id
is the primary key of this table.
The table holds information about food delivery to customers that make orders at some date and specify a preferred delivery date (on the same order date or after it).
If the preferred delivery date of the customer is the same as the order date then the order is called immediate otherwise it’s called scheduled.
Write an SQL query to find the percentage of immediate orders in the table, rounded to 2 decimal places.
The query result format is in the following example:
# Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 5 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-11 |
| 4 | 3 | 2019-08-24 | 2019-08-26 |
| 5 | 4 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
+-------------+-------------+------------+-----------------------------+
# Result table:
+----------------------+
| immediate_percentage |
+----------------------+
| 33.33 |
+----------------------+
The orders with delivery id 2
and 3
are immediate while the others are scheduled.
这个题和第二题很相似,回顾一下第二题,我们通过遵循某个限定条件来计算某个值,这一类的题都很类似,在这里,我们要计算 immediate order 占所有 order 的比例,那么 immediate 的定义是什么呢?很简单,就是当 order_date
和 customer_pref_delivery_date
相等的时候对应的行。
那么思路就是把这些行数出来一共多少行,再除以总的行数,所以我们需要一种 if
函数来筛选并计数。
正确答案如下:
select
round( # round函数取小数点后两位
100* # 100*转化成百分数
sum(case when order_date = customer_pref_delivery_date then 1 else 0 end)
# case when函数当条件成立时候,对目标进行操作(赋值、转换),注意end结尾!
# 这里我们一旦满足条件,次数就加1,sum起来就查出了一共满足条件有多少个
/ count(*)
,2) as immediate_percentage
from delivery
也可写成:
select
round( # round函数取小数点后两位
100* # 100*转化成百分数
sum(if(order_date = customer_pref_delivery_date, 1, 0))
# if 和 case when 差不多,看个人喜好
# 我喜欢用if,因为不用担心最后没加end
/ count(*)
,2) as immediate_percentage
from delivery
Table: Products
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| product_id | int |
| product_name | varchar |
| product_category | varchar |
+------------------+---------+
product_id
is the primary key for this table.
This table contains data about the company’s products.
Table: Orders
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| product_id | int |
| order_date | date |
| unit | int |
+---------------+---------+
There is no primary key for this table. It may have duplicate rows.
product_id
is a foreign key to Products table.
unit
is the number of products ordered in order_date
.
Write an SQL query to get the names of products with greater than or equal to 100 units ordered in February 2020 and their amount.
Return result table in any order.
The query result format is in the following example:
# Products table:
+-------------+-----------------------+------------------+
| product_id | product_name | product_category |
+-------------+-----------------------+------------------+
| 1 | Leetcode Solutions | Book |
| 2 | Jewels of Stringology | Book |
| 3 | HP | Laptop |
| 4 | Lenovo | Laptop |
| 5 | Leetcode Kit | T-shirt |
+-------------+-----------------------+------------------+
# Orders table:
+--------------+--------------+----------+
| product_id | order_date | unit |
+--------------+--------------+----------+
| 1 | 2020-02-05 | 60 |
| 1 | 2020-02-10 | 70 |
| 2 | 2020-01-18 | 30 |
| 2 | 2020-02-11 | 80 |
| 3 | 2020-02-17 | 2 |
| 3 | 2020-02-24 | 3 |
| 4 | 2020-03-01 | 20 |
| 4 | 2020-03-04 | 30 |
| 4 | 2020-03-04 | 60 |
| 5 | 2020-02-25 | 50 |
| 5 | 2020-02-27 | 50 |
| 5 | 2020-03-01 | 50 |
+--------------+--------------+----------+
# Result table:
+--------------------+---------+
| product_name | unit |
+--------------------+---------+
| Leetcode Solutions | 130 |
| Leetcode Kit | 100 |
+--------------------+---------+
Explanation:
Products with product_id = 1
is ordered in February a total of (60 + 70) = 130
.
Products with product_id = 2
is ordered in February a total of 80
.
Products with product_id = 3
is ordered in February a total of (2 + 3) = 5
.
Products with product_id = 4
was not ordered in February 2020.
Products with product_id = 5
is ordered in February a total of (50 + 50) = 100
.
这道题不难,但是需要慢慢一步步的拆解,稍微复杂一点,细心一点肯定可以做出来,刷 sql 其实并不是要刷多难的题,而是在你拿到题的那一刻,你能快速的反应,找到解题思路,并有逻辑的表述出来。
我们要找到所有同时满足下列条件的产品名称和其数量
正确答案如下:
select product_name, sum(unit) as 'unit'
from products p
join orders o
using (product_id)
where order_date like '2020-02%'
group by 1
having sum(unit) >= 100
其实顺着思路走,还是比较简单的,先连接表格,在筛选时间,最后 group by
加 having
筛选 sum(unit) >= 100
的信息,结束!
X city opened a new cinema, many people would like to go to this cinema. The cinema also gives out a poster indicating the movies’ ratings and descriptions.
Please write a SQL query to output movies with an odd numbered ID and a description that is not ‘boring’. Order the result by rating.
For example, table cinema
:
+---------+-----------+--------------+-----------+
| id | movie | description | rating |
+---------+-----------+--------------+-----------+
| 1 | War | great 3D | 8.9 |
| 2 | Science | fiction | 8.5 |
| 3 | irish | boring | 6.2 |
| 4 | Ice song | Fantacy | 8.6 |
| 5 | House card| Interesting| 9.1 |
+---------+-----------+--------------+-----------+
For the example above, the output should be:
+---------+-----------+--------------+-----------+
| id | movie | description | rating |
+---------+-----------+--------------+-----------+
| 5 | House card| Interesting| 9.1 |
| 1 | War | great 3D | 8.9 |
+---------+-----------+--------------+-----------+
这个题比较简单,用的都是基本的语法,其中筛选奇数的id
我们使用了mod
函数。
正确答案如下:
select *
from cinema
where description != 'boring'
and mod(id,2) = 1
order by rating desc;
Table: Project
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| project_id | int |
| employee_id | int |
+-------------+---------+
(project_id, employee_id)
is the primary key of this table.
employee_id
is a foreign key to Employee
table.
Table: Employee
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| employee_id | int |
| name | varchar |
| experience_years | int |
+------------------+---------+
employee_id
is the primary key of this table.
Write an SQL query that reports all the projects that have the most employees.
The query result format is in the following example:
# Project table
+-------------+-------------+
| project_id | employee_id |
+-------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 4 |
+-------------+-------------+
# Employee table
+-------------+--------+------------------+
| employee_id | name | experience_years |
+-------------+--------+------------------+
| 1 | Khaled | 3 |
| 2 | Ali | 2 |
| 3 | John | 1 |
| 4 | Doe | 2 |
+-------------+--------+------------------+
# result table
+-------------+
| project_id |
+-------------+
| 1 |
+-------------+
# The first project has 3 employees while the second one has 2.
这种类型的题已经做过很多次了,还是使用 window function 来解决,注意使用 subquery 加上 where 条件来解。
正确答案如下:
select
project_id
from
(select
project_id,
dense_rank() over (order by count(*) desc) as rnk
# 一定要记住 rank / dense_rank 的语法!
from project
group by project_id) as t
where rnk = 1
Table: Visits
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| visit_id | int |
| customer_id | int |
+-------------+---------+
visit_id
is the primary key for this table.
This table contains information about the customers who visited the mall.
Table: Transactions
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| transaction_id | int |
| visit_id | int |
| amount | int |
+----------------+---------+
transaction_id
is the primary key for this table.
This table contains information about the customers who visited the mall.
Write an SQL query to find the IDs of the users who visited without making any transactions and the number of times they made these types of visits.
Return the result table sorted in any order.
The query result format is in the following example:
# Visits
+----------+-------------+
| visit_id | customer_id |
+----------+-------------+
| 1 | 23 |
| 2 | 9 |
| 4 | 30 |
| 5 | 54 |
| 6 | 96 |
| 7 | 54 |
| 8 | 54 |
+----------+-------------+
# Transactions
+----------------+----------+--------+
| transaction_id | visit_id | amount |
+----------------+----------+--------+
| 2 | 5 | 310 |
| 3 | 5 | 300 |
| 9 | 5 | 200 |
| 12 | 1 | 910 |
| 13 | 2 | 970 |
+----------------+----------+--------+
# Result table
+-------------+----------------+
| customer_id | count_no_trans |
+-------------+----------------+
| 54 | 2 |
| 30 | 1 |
| 96 | 1 |
+-------------+----------------+
Explainantion:
Customer with id = 23 visited the mall once and made one transaction during the visit with id = 12.
Customer with id = 9 visited the mall once and made one transaction during the visit with id = 13.
Customer with id = 30 visited the mall once and did not make any transactions.
Customer with id = 54 visited the mall three times. During 2 visits they did not make any transactions, and during one visit they made 3 transactions.
Customer with id = 96 visited the mall once and did not make any transactions.
As we can see, users with IDs 30 and 96 visited the mall one time without making any transactions. Also user 54 visited the mall twice and did not make any transactions.
本题是新题,但是比较简单,不过多讲解了,不过要注意一下几点:
正确答案如下:
select
customer_id,
count(*) as count_no_trans
from visits v
left join transactions t
using (visit_id)
where t.visit_id is null
# 这里要注意是筛选哪一个表中的visit_id为null
group by customer_id
Table: Queries
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| query_name | varchar |
| result | varchar |
| position | int |
| rating | int |
+-------------+---------+
There is no primary key for this table, it may have duplicate rows.
This table contains information collected from some queries on a database.
The position
column has a value from 1 to 500.
The rating
column has a value from 1 to 5. Query with rating
less than 3 is a poor query.
We define query quality
as:
The average of the ratio between query rating and its position.
We also define poor query percentage
as:
The percentage of all queries with rating less than 3.
Write an SQL query to find each query_name
, the quality
and poor_query_percentage
.
Both quality
and poor_query_percentage
should be rounded to 2 decimal places.
The query result format is in the following example:
# queries table
+------------+-------------------+----------+--------+
| query_name | result | position | rating |
+------------+-------------------+----------+--------+
| Dog | Golden Retriever | 1 | 5 |
| Dog | German Shepherd | 2 | 5 |
| Dog | Mule | 200 | 1 |
| Cat | Shirazi | 5 | 2 |
| Cat | Siamese | 3 | 3 |
| Cat | Sphynx | 7 | 4 |
+------------+-------------------+----------+--------+
# Result table
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog | 2.50 | 33.33 |
| Cat | 0.66 | 33.33 |
+------------+---------+-----------------------+
Explaination:
Dog queries quality is ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50
Dog queries poor_ query_percentage is (1 / 3) * 100 = 33.33
Cat queries quality equals ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66
Cat queries poor_ query_percentage is (1 / 3) * 100 = 33.33
根据描述,我们知道要去求出每个query_name
对应的rating/position
的平均值,并命名为quality
,再求出每个query_name
对应的rating < 3
的比率。
正确答案如下:
select
query_name,
round(AVG(rating/position),2) as quality,
round(AVG(if(rating < 3, 1, 0)*100),2) as poor_query_percentage
# if 也可以用 case when
from
queries
group by query_name
Table: Department
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| revenue | int |
| month | varchar |
+---------------+---------+
(id, month)
is the primary key of this table.
The table has information about the revenue of each department per month.
The month has values in ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
.
Write an SQL query to reformat the table such that there is a department id column and a revenue column for each month.
The query result format is in the following example:
# Department table:
+------+---------+-------+
| id | revenue | month |
+------+---------+-------+
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 3 | 10000 | Feb |
| 1 | 7000 | Feb |
| 1 | 6000 | Mar |
+------+---------+-------+
# Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1 | 8000 | 7000 | 6000 | ... | null |
| 2 | 9000 | null | null | ... | null |
| 3 | null | 10000 | null | ... | null |
+------+-------------+-------------+-------------+-----+-------------+
Note that the result table has 13 columns (1 for the department id + 12 for the months).
这个题比较新颖,并不是简单的筛选条件,而是对表格做转换,类似于 pivot table。
那么我们要做的就是针对每个 id,对应某个月份,找到对应的 revenue。
正确答案如下:
select id, sum(case when month = 'Jan' then revenue else NULL end) as Jan_Revenue ,
sum(case when month = 'Feb' then revenue else NULL end) as Feb_Revenue ,
sum(case when month = 'Mar' then revenue else NULL end) as Mar_Revenue ,
sum(case when month = 'Apr' then revenue else NULL end) as Apr_Revenue ,
sum(case when month = 'May' then revenue else NULL end) as May_Revenue ,
sum(case when month = 'Jun' then revenue else NULL end) as Jun_Revenue ,
sum(case when month = 'Jul' then revenue else NULL end) as Jul_Revenue ,
sum(case when month = 'Aug' then revenue else NULL end) as Aug_Revenue ,
sum(case when month = 'Sep' then revenue else NULL end) as Sep_Revenue ,
sum(case when month = 'Oct' then revenue else NULL end) as Oct_Revenue ,
sum(case when month = 'Nov' then revenue else NULL end) as Nov_Revenue ,
sum(case when month = 'Dec' then revenue else NULL end) as Dec_Revenue
from Department
group by id
注意这里的 sum 其实可以换成其他的 aggregation 函数,因为此时我们只是为了使用 group by 函数。
此时,每个月每个 id 对应的 revenue 是唯一的,所以加上任何的 aggregation 函数都不会影响结果,当然这里的 case when 可以换成 if 函数。
Table: Weather
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| recordDate | date |
| temperature | int |
+---------------+---------+
id
is the primary key for this table.
This table contains information about the temperature in a certain day.
Write an SQL query to find all dates’ id
with higher temperature compared to its previous dates (yesterday).
Return the result table in any order.
The query result format is in the following example:
# Weather
+----+------------+-------------+
| id | recordDate | Temperature |
+----+------------+-------------+
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
+----+------------+-------------+
# Result table:
+----+
| id |
+----+
| 2 |
| 4 |
+----+
In 2015-01-02, temperature was higher than the previous day (10 -> 25).
In 2015-01-04, temperature was higher than the previous day (30 -> 20).
这个题目有点难度,在于以下两点:
其实我们已经接触并熟悉了 self join 的使用了,作用就是自己跟自己进行比对,再根据一定的条件进行筛选。
正确答案如下:
select
tb2.Id # 在使用self join时,我们一定要注意取的是哪个表里的数据
from
Weather tb1
join
Weather tb2
on datediff(tb2.RecordDate, tb1.RecordDate) = 1
# The DATEDIFF() function returns the difference between two dates.
# 请参考:https://www.w3schools.com/sql/func_sqlserver_datediff.asp
and tb2.Temperature > tb1.Temperature
Table: Product
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| product_id | int |
| product_name | varchar |
| unit_price | int |
+--------------+---------+
product_id
is the primary key of this table.
Table: Sales
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| seller_id | int |
| product_id | int |
| buyer_id | int |
| sale_date | date |
| quantity | int |
| price | int |
+------ ------+---------+
This table has no primary key, it can have repeated rows.
product_id
is a foreign key to Product table.
Write an SQL query that reports the best seller by total sales price, If there is a tie, report them all.
The query result format is in the following example:
# Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1 | S8 | 1000 |
| 2 | G4 | 800 |
| 3 | iPhone | 1400 |
+------------+--------------+------------+
# Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1 | 1 | 1 | 2019-01-21 | 2 | 2000 |
| 1 | 2 | 2 | 2019-02-17 | 1 | 800 |
| 2 | 2 | 3 | 2019-06-02 | 1 | 800 |
| 3 | 3 | 4 | 2019-05-13 | 2 | 2800 |
+-----------+------------+----------+------------+----------+-------+
# Result table:
+-------------+
| seller_id |
+-------------+
| 1 |
| 3 |
+-------------+
Both sellers with id 1 and 3 sold products with the most total price of 2800.
通过解释,我们知道要找每个 seller_id
对应的总销售额,并筛选出总销售额最大的 seller_id
,如果有一样的,就都筛选出来。
所以我们需要使用:
正确答案如下:
select seller_id from
(
select
seller_id,
rank() over (order by sum(price) desc) as rnk
# 注意 rank 函数语法,以及在subquery中的使用
from
product
join
sales
using (product_id)
group by 1
) tmp
where rnk = 1;
Table: Product
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| product_id | int |
| product_name | varchar |
| unit_price | int |
+--------------+---------+
product_id
is the primary key of this table.
Table: Sales
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| seller_id | int |
| product_id | int |
| buyer_id | int |
| sale_date | date |
| quantity | int |
| price | int |
+------ ------+---------+
This table has no primary key, it can have repeated rows.
product_id
is a foreign key to Product table.
Write an SQL query that reports the buyers who have bought S8 but not iPhone. Note that S8 and iPhone are products present in the Product
table.
The query result format is in the following example:
# Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1 | S8 | 1000 |
| 2 | G4 | 800 |
| 3 | iPhone | 1400 |
+------------+--------------+------------+
# Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1 | 1 | 1 | 2019-01-21 | 2 | 2000 |
| 1 | 2 | 2 | 2019-02-17 | 1 | 800 |
| 2 | 1 | 3 | 2019-06-02 | 1 | 800 |
| 3 | 3 | 3 | 2019-05-13 | 2 | 2800 |
+-----------+------------+----------+------------+----------+-------+
# Result table:
+-------------+
| buyer_id |
+-------------+
| 1 |
+-------------+
The buyer with id 1 bought an S8 but didn’t buy an iPhone. The buyer with id 3 bought both.
通过解释,我们要找到所有的 buyer_id
同时满足:
那么我们可以找出每个 buyer_id
对应的 S8 和 iPhone 的购买量分别是多少,并满足购买 S8 的数量大于 0,并且购买 iPhone 的数量等于 0。
正确答案如下:
select buyer_id from
(
select
buyer_id,
sum(if(product_name = 'S8', 1, 0)) as 's8',
sum(if(product_name = 'iPhone',1,0)) as 'iPhone'
# 以上两个sum if语句计算出每个buyer_id对应购买两种产品的数量
from product
join sales
using (product_id)
group by 1
) tmp
where s8 > 0
and iPhone = 0
Table: Product
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| product_id | int |
| product_name | varchar |
| unit_price | int |
+--------------+---------+
product_id
is the primary key of this table.
Table: Sales
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| seller_id | int |
| product_id | int |
| buyer_id | int |
| sale_date | date |
| quantity | int |
| price | int |
+------ ------+---------+
This table has no primary key, it can have repeated rows.
product_id
is a foreign key to Product table.
Write an SQL query that reports the products that were only sold in spring 2019. That is, between 2019-01-01 and 2019-03-31 inclusive.
The query result format is in the following example:
# Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1 | S8 | 1000 |
| 2 | G4 | 800 |
| 3 | iPhone | 1400 |
+------------+--------------+------------+
# Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1 | 1 | 1 | 2019-01-21 | 2 | 2000 |
| 1 | 2 | 2 | 2019-02-17 | 1 | 800 |
| 2 | 2 | 3 | 2019-06-02 | 1 | 800 |
| 3 | 3 | 4 | 2019-05-13 | 2 | 2800 |
+-----------+------------+----------+------------+----------+-------+
# Result table:
+-------------+--------------+
| product_id | product_name |
+-------------+--------------+
| 1 | S8 |
+-------------+--------------+
The product with id 1 was only sold in spring 2019 while the other two were sold after.
通过解释,我们要找到只在某一时间段卖出的商品信息,很简单,直接上答案!
TOTALLY WRONG!!!
我们要找的是仅在某一段时间出售的产品,会存在一个产品确实在此时间段出售,但也在别的时间段出售,我们要排除掉!
所以,我们用排除法,使得 product_id
不在 2019 年的夏秋冬所对应的的 product_id
,也就是不在 2019-04-01
到 2019-12-31
。
正确答案如下:
select
product_id,
product_name
from
product
where product_id not in
(select product_id
from sales
where sale_date not between '2019-01-01' and '2019-03-31')
# 注意不能只单纯限制时间范围
# 通过subquery来限制product_id不在
# 2019-04-01 - 2019-12-31对应的product_id
Description
Given three tables: salesperson
, company
, orders
.
Output all the names in the table salesperson
, who didn’t have sales to company ‘RED’.
Example
Input
Table: salesperson
+----------+------+--------+-----------------+-----------+
| sales_id | name | salary | commission_rate | hire_date |
+----------+------+--------+-----------------+-----------+
| 1 | John | 100000 | 6 | 4/1/2006 |
| 2 | Amy | 120000 | 5 | 5/1/2010 |
| 3 | Mark | 65000 | 12 | 12/25/2008|
| 4 | Pam | 25000 | 25 | 1/1/2005 |
| 5 | Alex | 50000 | 10 | 2/3/2007 |
+----------+------+--------+-----------------+-----------+
The table salesperson
holds the salesperson information. Every salesperson has a sales_id and a name.
Table: company
+---------+--------+------------+
| com_id | name | city |
+---------+--------+------------+
| 1 | RED | Boston |
| 2 | ORANGE | New York |
| 3 | YELLOW | Boston |
| 4 | GREEN | Austin |
+---------+--------+------------+
The table company
holds the company information. Every company has a com_id and a name.
Table: orders
+----------+------------+---------+----------+--------+
| order_id | order_date | com_id | sales_id | amount |
+----------+------------+---------+----------+--------+
| 1 | 1/1/2014 | 3 | 4 | 100000 |
| 2 | 2/1/2014 | 4 | 5 | 5000 |
| 3 | 3/1/2014 | 1 | 1 | 50000 |
| 4 | 4/1/2014 | 1 | 4 | 25000 |
+----------+----------+---------+----------+--------+
The table orders
holds the sales record information, salesperson and customer company are represented by sales_id and com_id.
output
+------+
| name |
+------+
| Amy |
| Mark |
| Alex |
+------+
Explanation
According to order ‘3’ and ‘4’ in table orders
, it is easy to tell only salesperson ‘John’ and ‘Pam’ have sales to company ‘RED’,
so we need to output all the other names in the table salesperson
.
这道题看着比较复杂,有三个表格需要去处理,我们可以利用 salesperson
表格里的 sales_id
作为筛选条件,找到所有的 sales_id
不在 company
和 orders
合并后的 company.name = 'RED'
所对应的 sales_id
。
正确答案如下:
select name
from salesperson
where sales_id not in
(select sales_id
from orders
join company
using (com_id)
where company.name = 'RED')
Write a SQL query to get the second highest salary from the Employee
table.
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
For example, given the above Employee table, the query should return 200
as the second highest salary. If there is no second highest salary, then the query should return null
.
+---------------------+
| SecondHighestSalary |
+---------------------+
| 200 |
+---------------------+
以下是官方答案:
LIMIT
clause [Accepted]Algorithm
Sort the distinct salary in descend order and then utilize the LIMIT
clause to get the second highest salary.
SELECT DISTINCT
Salary AS SecondHighestSalary
FROM
Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1
However, this solution will be judged as ‘Wrong Answer’ if there is no such second highest salary since there might be only one record in this table. To overcome this issue, we can take this as a temp table.
MySQL
SELECT
(SELECT DISTINCT
Salary
FROM
Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1) AS SecondHighestSalary
;
IFNULL
and LIMIT
clause [Accepted]Another way to solve the ‘NULL’ problem is to use IFNULL
funtion as below.
MySQL
SELECT
IFNULL(
(SELECT DISTINCT Salary
FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1),
NULL) AS SecondHighestSalary
其他答案
select max(salary) as SecondHighestSalary
from employee
where salary < (select max(salary) from employee);
# 利用subquery来先筛出最高的salary,那么小于最高的所有中的
# 最高的就是第二高的salary了!
# 这样也保证了如果没有第二高存在时返回null的情况
错误答案
select salary as `SecondHighestSalary` from
(select
salary,
dense_rank() over (order by salary desc) as rnk
from employee) tmp
where rnk = 2
# 这种写法不能再没有第二高salary时返回null
Table point
holds the x coordinate of some points on x-axis in a plane, which are all integers.
Write a query to find the shortest distance between two points in these points.
x |
---|
-1 |
0 |
2 |
The shortest distance is ‘1’ obviously, which is from point ‘-1’ to ‘0’. So the output is as below:
shortest |
---|
1 |
Note: Every point is unique, which means there is no duplicates in table point
.
Follow-up: What if all these points have an id and are arranged from the left most to the right most of x axis?
本题依旧考察大家对 self join 的使用,我们只有一个表格,里面包含了点的坐标,那么只有把每个坐标和其他的做比对,来找出最小距离。
正确答案如下:
select min(abs(a.x - b.x)) as shortest
from point a, point b
where a.x != b.x
Follow-up Answer:
select min(b.x - a.x)) as shortest
from point a, point b
where b.x > a.x
limit 1
#since all the x is sorted from smallest to the largest, we only need to consider the adjcent one from the larger side.
Table: Students
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| student_name | varchar |
+---------------+---------+
student_id
is the primary key for this table.
Each row of this table contains the ID and the name of one student in the school.
Table: Subjects
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+
subject_name
is the primary key for this table.
Each row of this table contains the name of one subject in the school.
Table: Examinations
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| student_id | int |
| subject_name | varchar |
+--------------+---------+
There is no primary key for this table. It may contain duplicates.
Each student from the Students table takes every course from Subjects table.
Each row of this table indicates that a student with ID student_id
attended the exam of subject_name
.
Write an SQL query to find the number of times each student attended each exam.
Order the result table by student_id
and subject_name
.
The query result format is in the following example:
# Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |
+------------+--------------+
# Subjects table:
+--------------+
| subject_name |
+--------------+
| Math |
| Physics |
| Programming |
+--------------+
# Examination table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |
+------------+--------------+
# Result table:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1 | Alice | Math | 3 |
| 1 | Alice | Physics | 2 |
| 1 | Alice | Programming | 1 |
| 2 | Bob | Math | 1 |
| 2 | Bob | Physics | 0 |
| 2 | Bob | Programming | 1 |
| 6 | Alex | Math | 0 |
| 6 | Alex | Physics | 0 |
| 6 | Alex | Programming | 0 |
| 13 | John | Math | 1 |
| 13 | John | Physics | 1 |
| 13 | John | Programming | 1 |
+------------+--------------+--------------+----------------+
Explaination:
The result table should contain all students and all subjects.
Alice attended Math exam 3 times, Physics exam 2 times and Programming exam 1 time.
Bob attended Math exam 1 time, Programming exam 1 time and didn’t attend the Physics exam.
Alex didn’t attend any exam.
John attended Math exam 1 time, Physics exam 1 time and Programming exam 1 time.
这个题稍微复杂一些,有三个表格需要处理,但是在我们分析最后想得到的结果时,便可找到切入点。
subject_name
,每个 subject_name
又对应参加的考试场次;这就需要我们用 group by student_id, subject_name
加上 count(subject_name)
来实现student_id = 6
这个同学很特殊,他没有参加哪怕一场考试 。自然我们再连接表格时候要注意了!正确答案如下:
select
a.student_id,
a.student_name,
a.subject_name,
count(b.subject_name) attended_exams
# 这里注意不能换成 count(*) 因为这样是不能数出缺失值的!
from
(select * from Students,Subjects) a
# 这里我们使用 cross join 来把两个表格合并,构造表格中的结构(笛卡尔积)
left join
examinations b
# left join 是因为有学生没考试,但是不能忽略
on a.student_id = b.student_id
and a.subject_name = b.subject_name
# 这里 join 的条件是两个
group by 1,3
# 同理,group by 的条件也是两个,先 student 后 subject_name
order by 1,3
Given a table salary
, such as the one below, that has m=male and f=female values. Swap all f and m values (i.e., change all f values to m and vice versa) with a single update statement and no intermediate temp table.
Note that you must write a single update statement, DO NOT write any select statement for this problem.
Example:
id | name | sex | salary |
---|---|---|---|
1 | A | m | 2500 |
2 | B | f | 1500 |
3 | C | m | 5500 |
4 | D | f | 500 |
After running your update statement, the above salary table should have the following rows:
id | name | sex | salary |
---|---|---|---|
1 | A | f | 2500 |
2 | B | m | 1500 |
3 | C | f | 5500 |
4 | D | m | 500 |
这里要求我们不能用 select 语句,只可以使用 update 来完成,那么这里我们来讲一下 update 的用法!
SQL UPDATE Statement
The UPDATE statement is used to modify the existing records in a table.
update 是用来修改已经存在的信息
基本语法如下;
UPDATE
table_name
SET
column1 = value1,
column2 = value2, ...
WHERE condition;
# 注意一般情况下,一定要使用where
# 否则很可能会把所有的信息全部更新!
那么回到本题来,我们需要交换 sex
中的 f
和 m
。那可以用 if 函数来实现!
正确答案如下:
update salary
set sex = if(sex = 'm','f','m');
# Runtime: 252 ms, faster than 59.34% of MySQL online submissions for Swap Salary.
# Memory Usage: 0B, less than 100.00% of MySQL online submissions for Swap Salary.
但是下面这个要更快一点!
update salary
set sex = case when sex = 'm' then 'f' else 'm' end
# Runtime: 213 ms, faster than 97.22% of MySQL online submissions for Swap Salary.
# Memory Usage: 0B, less than 100.00% of MySQL online submissions for Swap Salary.
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| name | varchar |
+---------------+---------+
id
is the primary key for this table.
name is the name of the user.
Table: Rides
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| user_id | int |
| distance | int |
+---------------+---------+
id
is the primary key for this table.
user_id is the id of the user who travelled the distance “distance”.
Write an SQL query to report the distance travelled by each user.
Return the result table ordered by travelled_distance
in descending order, if two or more users travelled the same distance, order them by their name
in ascending order.
The query result format is in the following example.
# Users table:
+------+-----------+
| id | name |
+------+-----------+
| 1 | Alice |
| 2 | Bob |
| 3 | Alex |
| 4 | Donald |
| 7 | Lee |
| 13 | Jonathan |
| 19 | Elvis |
+------+-----------+
# Rides table:
+------+----------+----------+
| id | user_id | distance |
+------+----------+----------+
| 1 | 1 | 120 |
| 2 | 2 | 317 |
| 3 | 3 | 222 |
| 4 | 7 | 100 |
| 5 | 13 | 312 |
| 6 | 19 | 50 |
| 7 | 7 | 120 |
| 8 | 19 | 400 |
| 9 | 7 | 230 |
+------+----------+----------+
# Result table:
+----------+--------------------+
| name | travelled_distance |
+----------+--------------------+
| Elvis | 450 |
| Lee | 450 |
| Bob | 317 |
| Jonathan | 312 |
| Alex | 222 |
| Alice | 120 |
| Donald | 0 |
+----------+--------------------+
Explaination:
Elvis and Lee travelled 450 miles, Elvis is the top traveller as his name is alphabetically smaller than Lee.
Bob, Jonathan, Alex and Alice have only one ride and we just order them by the total distances of the ride.
Donald didn’t have any rides, the distance travelled by him is 0.
本题比较简单,只需要按部就班去做就可以了!
请注意:
正确答案如下:
select
name,
ifnull(sum(distance),0) as 'travelled_distance'
from
users a
left join
rides b
on a.id = b.user_id
group by 1
order by 2 desc, 1 asc;
A pupil Tim gets homework to identify whether three line segments could possibly form a triangle.
However, this assignment is very heavy because there are hundreds of records to calculate.
Could you help Tim by writing a query to judge whether these three sides can form a triangle, assuming table triangle
holds the length of the three sides x, y and z.
x | y | z |
---|---|---|
13 | 15 | 30 |
10 | 20 | 15 |
For the sample data above, your query should return the follow result:
x | y | z | triangle |
---|---|---|---|
13 | 15 | 30 | No |
10 | 20 | 15 | Yes |
这个题考查我们对三角形的理解,大家都知道三角形成立的条件等价于任意两边之和大于第三边。
所以,我们应该利用条件语句,使 x,y,z
满足条件即可。
正确答案如下:
select
x,y,z,
(case when
x+y>z and x+z>y and y+z>x
then
'Yes' else 'No' end) as 'triangle'
from triangle
Table: Orders
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| order_id | int |
| order_date | date |
| customer_id | int |
| invoice | int |
+---------------+---------+
order_id
is the primary key for this table.
This table contains information about the orders made by customer_id
.
Write an SQL query to find the number of unique orders and the number of unique customers with invoices > $20 for each different month.
Return the result table sorted in any order.
The query result format is in the following example:
# orders
+----------+------------+-------------+------------+
| order_id | order_date | customer_id | invoice |
+----------+------------+-------------+------------+
| 1 | 2020-09-15 | 1 | 30 |
| 2 | 2020-09-17 | 2 | 90 |
| 3 | 2020-10-06 | 3 | 20 |
| 4 | 2020-10-20 | 3 | 21 |
| 5 | 2020-11-10 | 1 | 10 |
| 6 | 2020-11-21 | 2 | 15 |
| 7 | 2020-12-01 | 4 | 55 |
| 8 | 2020-12-03 | 4 | 77 |
| 9 | 2021-01-07 | 3 | 31 |
| 10 | 2021-01-15 | 2 | 20 |
+----------+------------+-------------+------------+
# Result table:
+---------+-------------+----------------+
| month | order_count | customer_count |
+---------+-------------+----------------+
| 2020-09 | 2 | 2 |
| 2020-10 | 1 | 1 |
| 2020-12 | 2 | 1 |
| 2021-01 | 1 | 1 |
+---------+-------------+----------------+
Explaination:
In September 2020 we have two orders from 2 different customers with invoices > $20.
In October 2020 we have two orders from 1 customer, and only one of the two orders has invoice > $20.
In November 2020 we have two orders from 2 different customers but invoices < $20, so we don’t include that month.
In December 2020 we have two orders from 1 customer both with invoices > $20.
In January 2021 we have two orders from 2 different customers, but only one of them with invoice > $20.
这个题看似比较复杂,但是考查的点都比较基础,请注意:
orders
和 customers
需要筛选 unique正确答案如下:
select
date_format(order_date, '%Y-%m') as 'month',
count(distinct order_id) as 'order_count',
count(distinct customer_id) as 'customer_count'
from orders
where invoice > 20
group by 1
Table: Activity
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| session_id | int |
| activity_date | date |
| activity_type | enum |
+---------------+---------+
There is no primary key for this table, it may have duplicate rows.
The activity_type column is an ENUM of type ('open_session', 'end_session', 'scroll_down', 'send_message')
.
The table shows the user activities for a social media website.
Note that each session belongs to exactly one user.
Write an SQL query to find the daily active user count for a period of 30 days ending 2019-07-27 inclusively. A user was active on some day if he/she made at least one activity on that day.
The query result format is in the following example:
# Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1 | 1 | 2019-07-20 | open_session |
| 1 | 1 | 2019-07-20 | scroll_down |
| 1 | 1 | 2019-07-20 | end_session |
| 2 | 4 | 2019-07-20 | open_session |
| 2 | 4 | 2019-07-21 | send_message |
| 2 | 4 | 2019-07-21 | end_session |
| 3 | 2 | 2019-07-21 | open_session |
| 3 | 2 | 2019-07-21 | send_message |
| 3 | 2 | 2019-07-21 | end_session |
| 4 | 3 | 2019-06-25 | open_session |
| 4 | 3 | 2019-06-25 | end_session |
+---------+------------+---------------+---------------+
# Result table:
+------------+--------------+
| day | active_users |
+------------+--------------+
| 2019-07-20 | 2 |
| 2019-07-21 | 2 |
+------------+--------------+
Note that we do not care about days with zero active users.
正确答案如下:
select
activity_date as day,count(distinct user_id) active_users
from activity
where activity_date between ('2019-07-27' - INTERVAL 29 day)
and '2019-07-27'
group by 1
order by 1
也可以用 CTE (common table expression) 来写
with t as
(select
activity_date, user_id, count(distinct user_id) as counts
from activity
where activity_date between '2019-06-28' and '2019-07-27'
group by activity_date)
select
activity_date as 'day',
counts as 'active_users'
from t
Table: Activity
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| session_id | int |
| activity_date | date |
| activity_type | enum |
+---------------+---------+
There is no primary key for this table, it may have duplicate rows.
The activity_type
column is an ENUM of type ('open_session', 'end_session', 'scroll_down', 'send_message')
.
The table shows the user activities for a social media website.
Note that each session belongs to exactly one user.
Write an SQL query to find the average number of sessions per user for a period of 30 days ending 2019-07-27 inclusively, rounded to 2 decimal places. The sessions we want to count for a user are those with at least one activity in that time period.
The query result format is in the following example:
Activity table:
# Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1 | 1 | 2019-07-20 | open_session |
| 1 | 1 | 2019-07-20 | scroll_down |
| 1 | 1 | 2019-07-20 | end_session |
| 2 | 4 | 2019-07-20 | open_session |
| 2 | 4 | 2019-07-21 | send_message |
| 2 | 4 | 2019-07-21 | end_session |
| 3 | 2 | 2019-07-21 | open_session |
| 3 | 2 | 2019-07-21 | send_message |
| 3 | 2 | 2019-07-21 | end_session |
| 3 | 5 | 2019-07-21 | open_session |
| 3 | 5 | 2019-07-21 | scroll_down |
| 3 | 5 | 2019-07-21 | end_session |
| 4 | 3 | 2019-06-25 | open_session |
| 4 | 3 | 2019-06-25 | end_session |
+---------+------------+---------------+---------------+
# Result table:
+---------------------------+
| average_sessions_per_user |
+---------------------------+
| 1.33 |
+---------------------------+
User 1 and 2 each had 1 session in the past 30 days while user 3 had 2 sessions so the average is (1 + 1 + 2) / 3 = 1.33.
这个题属实没啥营养,都是很熟悉的操作了,需要注意:
average_sessions_per_user
的计算正确答案如下:
select
ifnull(round(count(distinct session_id)/count(distinct(user_id)),2),0)
as average_sessions_per_user
from activity
where activity_date between '2019-06-28' and '2019-07-27';
Table: Warehouse
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| name | varchar |
| product_id | int |
| units | int |
+--------------+---------+
(name, product_id)
is the primary key for this table.
Each row of this table contains the information of the products in each warehouse.
Table: Products
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| product_id | int |
| product_name | varchar |
| Width | int |
| Length | int |
| Height | int |
+---------------+---------+
product_id
is the primary key for this table.
Each row of this table contains the information about the product dimensions (Width, Lenght and Height) in feets of each product.
Write an SQL query to report, How much cubic feet of volume does the inventory occupy in each warehouse.
Return the result table in any order.
The query result format is in the following example.
# Warehouse table:
+------------+--------------+-------------+
| name | product_id | units |
+------------+--------------+-------------+
| LCHouse1 | 1 | 1 |
| LCHouse1 | 2 | 10 |
| LCHouse1 | 3 | 5 |
| LCHouse2 | 1 | 2 |
| LCHouse2 | 2 | 2 |
| LCHouse3 | 4 | 1 |
+------------+--------------+-------------+
# Products table:
+------------+--------------+------------+----------+-----------+
| product_id | product_name | Width | Length | Height |
+------------+--------------+------------+----------+-----------+
| 1 | LC-TV | 5 | 50 | 40 |
| 2 | LC-KeyChain | 5 | 5 | 5 |
| 3 | LC-Phone | 2 | 10 | 10 |
| 4 | LC-T-Shirt | 4 | 10 | 20 |
+------------+--------------+------------+----------+-----------+
# Result table:
+----------------+------------+
| `warehouse_name` | `volume ` |
+----------------+------------+
| LCHouse1 | 12250 |
| LCHouse2 | 20250 |
| LCHouse3 | 800 |
+----------------+------------+
Explaination:
题看着花里胡哨,其实一步一步来就行了,要注意volume 的计算方法。
正确答案如下:
select
name as 'warehouse_name',
sum(units*(width*length*height)) as volume
from warehouse
join products
using (product_id)
group by 1
Table: Countries
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| country_id | int |
| country_name | varchar |
+---------------+---------+
country_id
is the primary key for this table.
Each row of this table contains the ID and the name of one country.
Table: Weather
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| country_id | int |
| weather_state | varchar |
| day | date |
+---------------+---------+
(country_id, day)
is the primary key for this table.
Each row of this table indicates the weather state in a country for one day.
Write an SQL query to find the type of weather in each country for November 2019.
The type of weather is Cold if the average weather_state
is less than or equal 15, Hot if the average weather_state
is greater than or equal 25 and Warm otherwise.
Return result table in any order.
The query result format is in the following example:
# Countries table:
+------------+--------------+
| country_id | country_name |
+------------+--------------+
| 2 | USA |
| 3 | Australia |
| 7 | Peru |
| 5 | China |
| 8 | Morocco |
| 9 | Spain |
+------------+--------------+
# Weather table:
+------------+---------------+------------+
| country_id | weather_state | day |
+------------+---------------+------------+
| 2 | 15 | 2019-11-01 |
| 2 | 12 | 2019-10-28 |
| 2 | 12 | 2019-10-27 |
| 3 | -2 | 2019-11-10 |
| 3 | 0 | 2019-11-11 |
| 3 | 3 | 2019-11-12 |
| 5 | 16 | 2019-11-07 |
| 5 | 18 | 2019-11-09 |
| 5 | 21 | 2019-11-23 |
| 7 | 25 | 2019-11-28 |
| 7 | 22 | 2019-12-01 |
| 7 | 20 | 2019-12-02 |
| 8 | 25 | 2019-11-05 |
| 8 | 27 | 2019-11-15 |
| 8 | 31 | 2019-11-25 |
| 9 | 7 | 2019-10-23 |
| 9 | 3 | 2019-12-23 |
+------------+---------------+------------+
# Result table:
+--------------+--------------+
| country_name | weather_type |
+--------------+--------------+
| USA | Cold |
| Austraila | Cold |
| Peru | Hot |
| China | Warm |
| Morocco | Hot |
+--------------+--------------+
Explaination:
Average weather_state in USA in November is (15) / 1 = 15 so weather type is Cold.
Average weather_state in Austraila in November is (-2 + 0 + 3) / 3 = 0.333 so weather type is Cold.
Average weather_state in Peru in November is (25) / 1 = 25 so weather type is Hot.
Average weather_state in China in November is (16 + 18 + 21) / 3 = 18.333 so weather type is Warm.
Average weather_state in Morocco in November is (25 + 27 + 31) / 3 = 27.667 so weather type is Hot.
We know nothing about average weather_state in Spain in November so we don’t include it in the result table.
这道题虽长篇大论,但是核心就是以下几点:
cold
、warm
还是 hot
正确答案如下:
select
country_name,
(case
when avg(weather_state) <= 15 then 'Cold'
when avg(weather_state) >= 25 then 'Hot'
else 'Warm' end) as weather_type
from countries
join weather
using (country_id)
where day like '2019-11%'
group by country_id
https://gist.github.com/zg104/e5833f0c1002f35d0de602cf5ba8eae3
那么 SQL 简单部分的刷题就告一段落啦,先恭喜大家完成 SQL 简单类型题目的训练,相信大家对 SQL 的函数、语句以及解题关键点都有了基础的了解。SQL 刷题写对了一次不代表你会了,需要的是解题的熟练度与拿到题目快速想出解题思路的能力,任重而道远!
最后我要对你说: