5 SQL Questions Frequently Asked by Data Scientists for Job Interviews

Although compiling SQL queries is not the most interesting thing for Data Scientists, a good understanding of SQL is extremely important for anyone who wants to succeed in any activity related to data processing. The point here is that SQL is not only SELECT, FROMand WHERE. The more SQL constructions a specialist knows, the easier it will be for him to create requests for obtaining from the databases everything that he may need. The author of the article, the translation of which we are publishing today, says that it is aimed at solving two problems:





  1. Exploring mechanisms that go beyond basic SQL knowledge.
  2. Consideration of several practical tasks for working with SQL.

This article covers 5 SQL questions from Leetcode. They represent practical tasks that are often encountered in interviews.

Question No. 1: second place in terms of salary


Write an SQL query to obtain from the table with employee salary information ( Employee) an entry containing the second highest salary.

For example, such a query executed for the table below should return 200. If the table does not have a value lower than the highest salary, the request should return null.

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

▍ Decision A: use IFNULLandOFFSET


Here are the main mechanisms that will be used in this solution to the problem:

  • IFNULL(expression, alt): this function returns its argument expressionif it is not equal null. Otherwise, the argument is returned alt. We will use this function to return nullif the table does not contain the desired value.
  • OFFSET: This operator is used with an expression ORDER BYto discard the first nlines. This is useful to us for the reason that we are interested in the second row of the result (that is, the second largest salary, the data on which is in the table).

Here is a ready-made request:

SELECT
    IFNULL(
        (SELECT DISTINCT Salary
        FROM Employee
        ORDER BY Salary DESC
        LIMIT 1 OFFSET 1
        ), null) as SecondHighestSalary
FROM Employee
LIMIT 1

▍ Solution B: use MAX


The query below uses the function MAX. Here, the highest salary value is selected, not equal to the maximum salary received throughout the table. As a result, we get what we need - the second largest salary.

SELECT MAX(salary) AS SecondHighestSalary
FROM Employee
WHERE salary != (SELECT MAX(salary) FROM Employee)

Question # 2: duplicate email addresses


Write an SQL query that detects Personall duplicate email addresses in the table .

+----+---------+
| Id | Email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+

▍ Decision A: COUNTin the subquery


First, we create a subquery in which the frequency of occurrence of each address in the table is determined. Then the result returned by the subquery is filtered using the instruction WHERE count > 1. The query will return information about addresses found in the source table more than once.

SELECT Email
FROM (
    SELECT Email, count(Email) AS count
    FROM Person
    GROUP BY Email
) as email_count
WHERE count > 1

▍ Solution B: expression HAVING


  • HAVING: This is an expression that allows you to use an instruction WHEREwith an expression GROUP BY.

SELECT Email
FROM Person
GROUP BY Email
HAVING count(Email) > 1

Question No. 3: rising temperature


Write an SQL query that finds in the table Weatherall dates (date identifiers) when the temperature would be higher than the temperature on the dates preceding it. That is, we are interested in dates on which the “today's” temperature is higher than the “yesterday's”.

+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
|       1 |       2015-01-01 |               10 |
|       2 |       2015-01-02 |               25 |
|       3 |       2015-01-03 |               20 |
|       4 |       2015-01-04 |               30 |
+---------+------------------+------------------+

▍Solution: DATEDIFF


  • DATEDIFF: This function calculates the difference between two dates. It is used to provide a comparison of precisely “today's” and “yesterday's” temperatures.

If we formulate the following query in ordinary language, it turns out that it expresses the following idea: we need to choose identifiers such that the temperature corresponding to the dates they represent is greater than the temperature for the “yesterday's” dates with respect to them.

SELECT DISTINCT a.Id
FROM Weather a, Weather b
WHERE a.Temperature > b.Temperature
AND DATEDIFF(a.Recorddate, b.Recorddate) = 1

Question number 4: the highest salary in the unit


The table Employeestores information about company employees. Each record in this table contains information about the identifier ( Id) of the employee, his name ( Name), salary ( Salary), and the division of the company where he works ( Department).

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

The table Departmentcontains information about the divisions of the company.

+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+

Write an SQL query that finds in each of the departments of employees with the maximum salary. For example, for the above tables, a similar query should return the results represented by the following table (the order of the rows in the table does not matter):

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+

▍Solution: team IN


The command INallows you to set WHEREconditions in the instruction that correspond to the use of several commands OR. For example, the two following constructs are identical:

WHERE country = ‘Canada’ OR country = ‘USA’
WHERE country IN (‘Canada’, ’USA’).

Here we want to get a table containing the name of the department ( Department), the name of the employee ( Employee) and his salary ( Salary). To do this, we create a table that contains information about the unit identifier ( DepartmentID) and the maximum salary for this unit. Then we combine the two on a condition table according to which entry in the resulting table fall only if DepartmentIDand Salaryhave previously formed in the table.

SELECT
    Department.name AS 'Department',
    Employee.name AS 'Employee',
    Salary
FROM Employee
INNER JOIN Department ON Employee.DepartmentId = Department.Id
WHERE (DepartmentId , Salary) 
    IN
    (   SELECT
            DepartmentId, MAX(Salary)
        FROM
            Employee
        GROUP BY DepartmentId
 )

Question No. 5: transplanting students


Mary is a high school teacher. She has a table seatthat stores the names of students and information about their places in the classroom. The value idin this table is constantly increasing. Mary wants to swap neighboring students.

Here is a table of students' initial placement:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

Here's what should happen after transplanting neighboring students:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

Write a request that will allow the teacher to solve the above problem.

Please note that if the number of students is odd, you don’t need to transfer the last student anywhere.

▍Solution: using an operator WHEN


An SQL construct CASE WHEN THENcan be considered an operator ifin programming.

In our case, the first operator is WHENused to check whether the odd identifier is assigned to the last row in the table. If so, the line is not subject to change. The second operator WHENis responsible for adding 1 to each odd identifier (for example, 1, 3, 5 turns into 2, 4, 6) and for subtracting 1 from each even identifier (2, 4, 6 turn into 1, 3, 5).

SELECT 
    CASE 
        WHEN((SELECT MAX(id) FROM seat)%2 = 1) AND id = (SELECT MAX(id) FROM seat) THEN id
        WHEN id%2 = 1 THEN id + 1
        ELSE id - 1
    END AS id, student
FROM seat
ORDER BY id

Summary


We examined several SQL tasks, discussing along the way some advanced tools that can be used to compile SQL queries. We hope that what you learned today will come in handy for you during interviews in SQL and will prove useful in everyday work.

PS In our marketplace there is a Docker image with SQL Server Express, which is installed in one click. You can check the operation of containers on VPS. All new customers are given 3 days free of charge for testing.

Dear readers! What can you advise to those who want to master the art of creating SQL queries?

Source: https://habr.com/ru/post/undefined/


All Articles