Why, when and how to use multithreading and multiprocessing in Python

Greetings, Habr. OTUS has just opened enrollment for its Machine Learning course, and to mark the occasion we have translated a very entertaining "fairy tale" for you. Here goes.




Once upon a time, in a galaxy far, far away...

A wise and powerful wizard lived in a small village in the middle of the desert. His name was Dumbledalf. He was not only wise and powerful, but also glad to help the people who came from distant lands to ask the wizard for aid. Our story begins when a traveler brought the wizard a magic scroll. The traveler did not know what was in the scroll; he only knew that if anyone could reveal all of its secrets, it was Dumbledalf.

Chapter 1: Single-Threaded, Single-Process


If you have not guessed yet, this is an analogy for a processor and its work. Our wizard is the processor, and the scroll is a list of links that lead to the power of Python and the knowledge needed to master it.

Having deciphered the list without any difficulty, the wizard's first thought was to send his faithful friend (Garrigorn? I know, I know, that sounds awful) to each of the places described in the scroll, one after another, to bring back whatever he could find there.

In [1]:
import urllib.request
from concurrent.futures import ThreadPoolExecutor
In [2]:
urls = [
  'http://www.python.org',
  'https://docs.python.org/3/',
  'https://docs.python.org/3/whatsnew/3.7.html',
  'https://docs.python.org/3/tutorial/index.html',
  'https://docs.python.org/3/library/index.html',
  'https://docs.python.org/3/reference/index.html',
  'https://docs.python.org/3/using/index.html',
  'https://docs.python.org/3/howto/index.html',
  'https://docs.python.org/3/installing/index.html',
  'https://docs.python.org/3/distributing/index.html',
  'https://docs.python.org/3/extending/index.html',
  'https://docs.python.org/3/c-api/index.html',
  'https://docs.python.org/3/faq/index.html'
  ]
In [3]:
%%time

results = []
for url in urls:
    # Fetch each URL one at a time and collect the responses sequentially
    with urllib.request.urlopen(url) as src:
        results.append(src)

CPU times: user 135 ms, sys: 283 µs, total: 135 ms
Wall time: 12.3 s

As you can see, we simply iterate over the URLs one by one in a for loop and read each response. Thanks to %%time and the magic of IPython, we can see that this took about 12 seconds on my sad internet connection.

Chapter 2: Multithreading


It was not without reason that the wizard was famous for his wisdom; he quickly came up with a far more efficient approach. Instead of sending one person to each place in turn, why not assemble a squad of reliable companions and send them to different parts of the world at the same time? The wizard could then combine all the knowledge they brought back at once!

That's right: instead of walking through the list sequentially in a loop, we can use multithreading to access several URLs at once.

In [1]:
import urllib.request
from concurrent.futures import ThreadPoolExecutor
In [2]:
urls = [
  'http://www.python.org',
  'https://docs.python.org/3/',
  'https://docs.python.org/3/whatsnew/3.7.html',
  'https://docs.python.org/3/tutorial/index.html',
  'https://docs.python.org/3/library/index.html',
  'https://docs.python.org/3/reference/index.html',
  'https://docs.python.org/3/using/index.html',
  'https://docs.python.org/3/howto/index.html',
  'https://docs.python.org/3/installing/index.html',
  'https://docs.python.org/3/distributing/index.html',
  'https://docs.python.org/3/extending/index.html',
  'https://docs.python.org/3/c-api/index.html',
  'https://docs.python.org/3/faq/index.html'
  ]
In [4]:
%%time

# Fetch all the URLs concurrently with a pool of 4 worker threads
with ThreadPoolExecutor(4) as executor:
    results = executor.map(urllib.request.urlopen, urls)

CPU times: user 122 ms, sys: 8.27 ms, total: 130 ms
Wall time: 3.83 s
In [5]:
%%time

with ThreadPoolExecutor(8) as executor:
    results = executor.map(urllib.request.urlopen, urls)

CPU times: user 122 ms, sys: 14.7 ms, total: 137 ms
Wall time: 1.79 s
In [6]:
%%time

with ThreadPoolExecutor(16) as executor:
    results = executor.map(urllib.request.urlopen, urls)

CPU times: user 143 ms, sys: 3.88 ms, total: 147 ms
Wall time: 1.32 s

Much better! Almost like... magic. Using multiple threads can significantly speed up many I/O-bound tasks. In my case, most of the time spent fetching the URLs is due to network latency. I/O-bound programs spend most of their lives waiting for, you guessed it, input or output (just as the wizard waits for his friends to travel to the places from the scroll and return). This can be I/O from the network, a database, a file, or a user. Such I/O usually takes a long time, because the source may need to do its own processing before it can hand the data over. For example, a CPU computes far faster than a network connection can transmit data (roughly Flash versus your grandmother).

Note: multithreading can be very useful in tasks such as web scraping.
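
For illustration, here is a minimal sketch of threaded scraping, assuming the same urls list defined above; fetch_title is a hypothetical helper and its title extraction is deliberately naive.

import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_title(url):
    # Download the page and pull out the <title> tag, if there is one
    with urllib.request.urlopen(url) as src:
        html = src.read().decode('utf-8', errors='ignore')
    match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    return url, match.group(1).strip() if match else None

with ThreadPoolExecutor(8) as executor:
    for url, title in executor.map(fetch_title, urls):
        print(url, '->', title)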

Chapter 3: Multiprocessing


Years passed, the fame of the good wizard grew, and with it grew the envy of a certain unscrupulous dark wizard (Sarumort? Or maybe Volandeman?). Armed with boundless cunning and driven by envy, the dark wizard cast a terrible curse on Dumbledalf. When the curse struck, Dumbledalf realized he had only a few moments to ward it off. Desperate, he rummaged through his spell books and quickly found a counter-spell that should work. The only problem was that it required the wizard to calculate the sum of all primes less than 1,000,000. A strange spell, of course, but it is what it is.

The wizard knew that calculating the value would be trivial if he had enough time, but he did not have that luxury. Great wizard though he was, he was still limited by his humanity and could check only one number for primality at a time. If he simply added up the primes one after another, it would take far too long. With only seconds left before the counter-spell had to be cast, he suddenly remembered the multiprocessing spell he had learned from a magic scroll many years ago. This spell would allow him to copy himself, distribute the numbers among his copies, and check several at once. In the end, all he would need to do was add up the numbers that he and his copies discovered.

In [1]:
from multiprocessing import Pool
In [2]:
def if_prime(x):
    # Return x if x is prime, otherwise 0, so the results can simply be summed
    if x <= 1:
        return 0
    elif x <= 3:
        return x
    elif x % 2 == 0 or x % 3 == 0:
        return 0
    i = 5
    while i**2 <= x:
        if x % i == 0 or x % (i + 2) == 0:
            return 0
        i += 6
    return x
In [17]:
%%time

answer = 0

for i in range(1000000):
    answer += if_prime(i)

CPU times: user 3.48 s, sys: 0 ns, total: 3.48 s
Wall time: 3.48 s
In [18]:
%%time

if __name__ == '__main__':
    # Split the primality checks across 2 worker processes and sum the results
    with Pool(2) as p:
        answer = sum(p.map(if_prime, list(range(1000000))))

CPU times: user 114 ms, sys: 4.07 ms, total: 118 ms
Wall time: 1.91 s
In [19]:
%%time

if __name__ == '__main__':
    with Pool(4) as p:
        answer = sum(p.map(if_prime, list(range(1000000))))

CPU times: user 99.5 ms, sys: 30.5 ms, total: 130 ms
Wall time: 1.12 s
In [20]:
%%timeit

if __name__ == '__main__':
    with Pool(8) as p:
        answer = sum(p.map(if_prime, list(range(1000000))))

729 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [21]:
%%timeit

if __name__ == '__main__':
    with Pool(16) as p:
        answer = sum(p.map(if_prime, list(range(1000000))))

512 ms ± 39.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [22]:
%%timeit

if __name__ == '__main__':
    with Pool(32) as p:
        answer = sum(p.map(if_prime, list(range(1000000))))

518 ms ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [23]:
%%timeit

if __name__ == '__main__':
    with Pool(64) as p:
        answer = sum(p.map(if_prime, list(range(1000000))))

621 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Modern processors have more than one core, so we can speed up CPU-bound tasks with the multiprocessing module. CPU-bound tasks are programs that spend most of their running time doing computation on the processor (number crunching, image processing, and so on). If those computations can be performed independently of one another, we can split them across the available CPU cores and get a significant speedup.

All you have to do is:

  1. Define the function you want to apply;
  2. Prepare the list of elements the function will be applied to;
  3. Create a Pool object with multiprocessing.Pool(); wrapping it in a with statement ensures the pool is closed when you are done;
  4. Call the Pool's map method, passing it the function and the list of elements; map applies the function to every element of the list.

Note: the function can be defined to perform any task that can be done in parallel. For example, it might contain code that writes the result of a computation to a file. A minimal sketch of these four steps follows.
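
Here is one possible sketch of those four steps, using a hypothetical square worker that, as in the note, also writes its result to a file; the worker and the file names are illustrative only.

from multiprocessing import Pool

def square(x):                           # 1. the function to apply
    result = x * x
    # As in the note: a worker may also persist its own result to a file
    with open(f'result_{x}.txt', 'w') as f:
        f.write(str(result))
    return result

if __name__ == '__main__':
    items = list(range(10))              # 2. the list of elements
    with Pool(4) as p:                   # 3. the Pool, closed automatically by `with`
        results = p.map(square, items)   # 4. map applies the function to every element
    print(results)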

So why do we need to distinguish multiprocessing from multithreading at all? If you have ever tried to speed up a CPU-bound task with multithreading, you will have found that the result is exactly the opposite. That's just terrible! Let's see why that happens.

Just as the wizard is limited by his human nature and can only check one number at a time, Python comes with a thing called the Global Interpreter Lock (GIL). Python will happily let you spawn as many threads as you want, but the GIL ensures that only one of those threads executes at any given moment.

For an I/O-bound task, this is completely fine. One thread sends a request to a URL and, while it waits for the response, is swapped out for another thread, which sends a request to a different URL. Since a thread has nothing to do until it receives its response, it hardly matters that only one thread is running at any given time.

For CPU-bound tasks, however, having multiple threads is about as useful as nipples on armor. Since only one thread can execute at a time, even if you spawn several threads and give each of them a number to check for primality, the processor still works on only one thread at a time. In effect, the numbers are still checked one by one. On top of that, the overhead of managing multiple threads actually hurts performance, which is exactly what you can observe if you use multithreading for a CPU-bound task.
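
Here is a minimal sketch of what not to do, reusing the if_prime function from above: timing this against the plain loop typically shows little or no speedup, because the GIL lets only one thread execute at a time.

from concurrent.futures import ThreadPoolExecutor

# Threads add scheduling overhead but no parallelism for this CPU-bound work
with ThreadPoolExecutor(8) as executor:
    answer = sum(executor.map(if_prime, range(1000000)))
print(answer)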

To get around this "limitation", we use the multiprocessing module. Instead of threads, multiprocessing uses, as you might have guessed... multiple processes. Each process gets its own interpreter and its own memory space, so the GIL will not hold you back. Each process can run on its own CPU core and work on its own numbers at the same time as all the other processes. How sweet of them!
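
To see the "own memory space" point in action, here is a minimal sketch (run it as a script); the bump worker and its counter are purely illustrative.

from multiprocessing import Pool

counter = 0

def bump(_):
    global counter
    counter += 1     # increments the worker process's own private copy of counter
    return counter

if __name__ == '__main__':
    with Pool(4) as p:
        print(p.map(bump, range(8)))  # each value comes from a worker's private count
    print(counter)                    # still 0 in the parent process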

You may notice that CPU load is higher when you use multiprocessing than with a plain for loop or even with multithreading. That is because your program now uses several cores instead of one. And that is a good thing!

Remember, though, that multiprocessing has its own overhead for managing multiple processes, which is usually heavier than the cost of multithreading. (Multiprocessing spawns a separate interpreter and allocates a separate memory space for each process, so yes!) As a rule, it is better to reach for the lighter-weight multithreading when you can get away with it (remember: I/O-bound tasks). But when CPU computation becomes the bottleneck, the time of the multiprocessing module has come. Just remember that with great power comes great responsibility.

If you spawn more processes than your processor can handle at once, you will notice performance start to decline. That happens because the operating system has to do extra work shuffling the processes between the CPU cores, since there are more processes than cores. In reality things can be even more complicated than I have described today, but that is the main idea. On my system, for example, performance stops improving and then drops once the number of processes exceeds 16, because my processor has only 16 logical cores.
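
A simple way to avoid oversubscribing the machine is to size the pool by the logical core count; here is a minimal sketch, reusing if_prime from above.

import os
from multiprocessing import Pool

if __name__ == '__main__':
    n_workers = os.cpu_count()   # e.g. 16 logical cores on my machine
    with Pool(n_workers) as p:
        answer = sum(p.map(if_prime, range(1000000)))
    print(n_workers, answer)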

Chapter 4: Conclusion


  • In I/O-bound tasks, multithreading can improve performance.
  • In I/O-bound tasks, multiprocessing can also improve performance, but the overhead is usually higher than with multithreading.
  • The Python GIL means that at any given moment only one thread in a program can execute.
  • In CPU-bound tasks, using multithreading can reduce performance.
  • In CPU-bound tasks, using multiprocessing can improve performance.
  • Wizards are awesome!

That is where we will end our introduction to multithreading and multiprocessing in Python for today. Now go forth and conquer!



"Modeling COVID-19 using graph analysis and parsing open data." Free lesson.

All Articles