How to test Python programming skills? Tasks from Yandex


Hackathon at the School of Backend Development

In 2019, we needed to automatically test the ability to write Python code with hundreds of developers. So we selected future students for the Backend Development School. This is not the same as offering to solve a problem on a piece of paper, as in an interview. On the other hand, we also could not reuse the task conditions already prepared for our programming competitions. The fact is that competition to determine the best of the best is one thing, and the selection of specialists with little experience in school is another. We needed tasks, by the solution of which it would be clear whether the developer had the basic skills of writing code and the ability to correctly use memory and time. These are the conditions we made up.

Frequent item

Solution author: Mikhail Shushpanov mishush. The problem was solved by 50% of the participants.

Given an array a of n integers. Write a program that finds the number that is most often found in the array. Time limit: 2 s, memory limit: 256 MB.

Input format
The first line of input contains the number n (1 ≀ n ≀ 300,000).
The second line contains n integers a i (0 ≀ a i ≀ 1 000 000 000).

Output Format
Print the only number x, the largest of the numbers that is most often found in the array a.

Decision
A simple task that can be solved both using a regular dictionary and using the Counter structure from the collections module.

First, you can calculate how many times each element occurs, then select the largest of those that occur the greatest number of times. The solution requires O (n) time and O (n) additional memory.

from collections import Counter

n = int(input()) 
a = [int(i) for i in input().split()] 

counter = Counter(a)

result = a[0]
max_count = counter[result]
for number, count in counter.items():
    if count > max_count or (count == max_count and number > result):
        result = number
        max_count = count

print(result) 

Sequence Element Values

Author: Mikhail Shushpanov. The problem was solved by 20% of the participants.

The sequence f 0 , f 1 , f 2 , ... is given by the recurrence relations f 0 = 0, f 1 = 1, f 2 = 2, f k = f k – 1 + f k – 3 .

Write a program that calculates n elements of this sequence with numbers k 1 , k 2 , ..., k n . Time limit: 1 s, memory limit: 10 MB.

Input format
The first line of input contains an integer n (1 ≀ n ≀ 1000).
The second line contains n non-negative integers (0 ≀ ki ≀ 16000), separated by spaces.

Output Format
Output space-separated values ​​f k 1 , f k 2 , ..., f k n .

Decision
, . .

, fk f0, f1, ..., fk–1. fk, k = max(k1, k2, ..., kn). , fk–1 fk–3, , , .

O(max(k, n)) O(n) .

n = int(input())
indices = [int(i) for i in input().split()]
results = {i: i if i < 3 else None for i in indices}
k = max(indices)
a, b, c = 0, 1, 2
for i in range(3, k + 1):
    a, b, c = b, c, a + c
    if i in results:
        results[i] = c
print(" ".join(str(results[i]) for i in indices)) 

Implement JOIN

Posted by: Dmitry Terrov terrmit. The task was solved by 6% of the participants.

Two tables T1 and T2 are given. Each table is represented by two numeric fields. Table T1 contains fields a and b. Table T2 contains fields a and c.

It is necessary to implement a simplified variation of the SQL JOIN of two tables in the field a and form table T3.

The values ​​of field a in the table can be duplicated, but the number of duplications of each field does not exceed 5. Time limit: 1 s, memory limit: 64 MB.

Input format
The first line of the input contains an integer n1 (1 ≀ n1 ≀ 15000) –– the number of rows in table T1. Next, in n1 lines, two integers are written separated by spaces - the values ​​of the fields ai and bi (1 ≀ ai, bi ≀ 1000) of table T1. The next line contains an integer n2 (1 ≀ n2 ≀ 10000) –– the number of rows in table T2. Next, in n2 lines, two integers are written with a space - the values ​​of the fields aj and cj (1 ≀ aj, cj ≀ 1000) of table T2. The last line is the name of the operation (INNER, LEFT, RIGHT, FULL).

Output Format
In the first line print the number n3 - the number of rows in table T3. In the next n3 lines print the space values ​​ak, bk and ck of the table T3 through a space. If as a result of joining tables the value of a field is missing, print NULL instead of this field. The output order of the rows in table T3 is not important.

Decision
, a . β€” :

for a1, b in t1:
    for a2, c in t2:
        if a1 == a2:
            t3.append((a1, b, c))

. a. -:

for row in t1:
    t1_map[row[0]] = row

LEFT. , , . β€” (a, b, c), β€” (a, b, NULL).

for a, b in t1_map.items():
     if a in t2_map:
         t3.append((a, b, t2[a]))
     else:
         t3.append((a, b, 'NULL'))

RIGHT .

FULL , , :

for a, c in t2_map.items():
    if a not in t1_map:
        t3.append((a, 'NULL', c))

a , hash-map , a. Python itertools.product.

, a , β€” N x N. , . , .

Deleting Null Values

Author: Mikhail Shushpanov. 7% of the participants solved the problem.

Write a program that removes values ​​from the JSON structure that are empty dictionaries ({}) or empty lists ([]), as long as there are such values. If the value in the dictionary is deleted, the corresponding key is also deleted. Time limit: 0.2 s, memory limit: 6 MB.

Input format
A single input line contains a string of length n (1 ≀ n ≀ 1000). It is guaranteed that this string is a valid JSON string.

Output format:
Output, through a space, the number of deleted empty dictionaries and the number of deleted empty lists.

Decision
. , -, , JSON. json, isinstance, , Python. JSON {}, [], : , ''{}, []'' .

β€” , .

import json
dict_counter, list_counter = 0, 0
def clean_struct(struct):
    global dict_counter, list_counter
    if isinstance(struct, dict):
        for key, value in struct.copy().items():
            cleaned_struct = clean_struct(value)
            if cleaned_struct is None:
                del struct[key]
            else:
                struct[key] = cleaned_struct
        if len(struct) == 0:
            dict_counter += 1
            return None
    if isinstance(struct, list):
        i = 0
        while i < len(struct):
            cleaned_struct = clean_struct(struct[i])
            if cleaned_struct is None:
                del struct[i]
            else:
                struct[i] = cleaned_struct
                i += 1
        if len(struct) == 0:
            list_counter += 1
            return None
    return struct
struct = json.loads(input())
clean_struct(struct)
print(dict_counter, list_counter) 

High load

Solution author: Dmitry Terrov. The problem was solved by 14% of participants.

You are given a history of sessions on some fictitious service. Each session is characterized by the start and end times si and fi, for convenience, all these values ​​are different.

Find the point in time t when the largest number of sessions was active. If there are several such moments, then print the earliest of them. Time limit: 1 s, memory limit: 256 MB.

Input format
The first line of input contains an integer n (1 ≀ n ≀ 1000). Next, n lines contain two space-separated integers si and fi (0 ≀ si <fi ≀ 1,000,000,000).

Output format
Print the desired time t.

Decision
, β€” , . β€” O(N x N).

. β€” . . , , , β€” , β€” . β€” O(N x log N).

n = int(input())
sessions = []
for _ in range(n):
    x, y = map(int, input().split())
    sessions.append((x, y))

events = []
for start, end in sessions:
    events.append((start, 1))
    events.append((end, -1))
events.sort()

max_number = min_time = cur_number = 0
for cur_time, operation in events:
    cur_number = cur_number + operation
    if cur_number > max_number:
        max_number = cur_number
        min_time = cur_time

print(min_time)

Series Length Coding

The author of the decision: Mikhail Shushpanov. The problem was solved by 33% of participants

Series Length Coding (RLE) - a data compression algorithm that replaces repeated characters with one character and the number of iterations. A series is a sequence consisting of several identical characters (more than one). When encoding, a string of identical characters that make up a series is replaced by a string containing the repeating character itself and the number of its repeats.

For example, string AAAABBB will be compressed to string A4B3, and string AAAAAAAAAAAAAAAABAAAAA to string A15BA5.

You are given a compressed string, find the length of the original string. The length of the source string does not exceed 1000 characters, all characters of the source string are capital letters of the Latin alphabet. Time limit: 2 s, memory limit: 264 MB.

Input format
A single line of input contains a non-empty line. It is guaranteed that this string is the result of the correct RLE compression of some string.

Output Format
Print the length of the source string.

Decision
.

, . , . β€” . β€” . , .

O(n) O(1) (O(n) ).

first_symbol = True
result = 0
number = ''

for c in input():
    if c.isdigit():
        number += c
    elif c.isalpha() and not number:
        if first_symbol:
            first_symbol = False
        else:
            result += 1
    elif c.isalpha() and number:
        result += int(number)
        number = ''
	
if number:
    result += int(number)
else:
    result += 1
print(result)

DDoS Detector

Author: Sergey Demurin kakty3. The problem was solved by 7% of the participants. A

list of N IP addresses was given. We call an IP address β€œbad” if there are M consecutive lines in which this IP address occurs at least K times. Time limit: 10 s, memory limit: 10 MB.

Input format
The first line contains the number N (1 ≀ N ≀ 10 6 ).
The second line contains the number M (1 ≀ M ≀ 10 3 ).
The third line contains the number K (1 ≀ K ≀ M).
The next N lines contain IP addresses, one per line.

Output Format
List the bad IP addresses in lexicographical order.

Decision
: , M , K . , - .

: β€” M . M. , β‰₯ K. , , .

, β€” . :
β€” ,
β€” «» , K.

, IP- . 106. . , , . 103.

O(N) IP- O(M) .

# coding: utf-8
from collections import Counter, deque
import sys

def main():
    #   N, M  K
    n_of_lines = int(sys.stdin.readline().strip())
    window_size = int(sys.stdin.readline().strip())
    threshold = int(sys.stdin.readline().strip())

    #    «» ,
    #     
    #   
    bad_ips = set()
    last_ips = deque(maxlen=window_size)
    counter = Counter()

    for _ in range(n_of_lines):
        #  IP-
        current_ip = sys.stdin.readline().rstrip()

        # ,   
        if len(last_ips) == window_size:
            #      
            #      
            oldest_ip = last_ips.pop()
            counter[oldest_ip] -= 1

            #      β€”  
            if not counter[oldest_ip]:
                del counter[oldest_ip]

        #     
        last_ips.appendleft(current_ip)

        #        «»,
        #      
        if current_ip in bad_ips:
            continue

        #     
        counter[current_ip] += 1

        #      K,
        #      «»
        if counter[current_ip] >= threshold:
            bad_ips.add(current_ip)
            #  «»   ,
            #     
            del counter[current_ip]

    #  «»    
    print('\n'.join(current_ip for current_ip in sorted(bad_ips)))

if __name__ == "__main__":
    main()

These tasks were the first part of the introductory assignment to the Backend Development School. The second part (cloud service) we will publish and analyze in the coming weeks.

If you are interested in competitions for backders, read the analysis of backend tracks of the first and second programming championships.

All Articles