Parallel Computing in Python with concurrent.futures

Posted by mrsantoni on Mon 04 July 2016

Going Parallel with concurrent.futures

Parallel computation can be implemented with the concurrent.futures module, which has been part of the standard library since Python 3.2.

Let's look at a simple example.

In [3]:
from time import sleep

def slow_function(x):
    print('start: {}'.format(x))
    sleep(5)
    print('done: {}'.format(x))
    return x**2

We defined slow_function, a function that simulates a heavy computation by sleeping for 5 seconds. We now run it in the most straightforward way: sequentially, in a plain for loop.

In [8]:
numbers = range(4)

for n in numbers:
    slow_function(n)
start: 0
done: 0
start: 1
done: 1
start: 2
done: 2
start: 3
done: 3

So, the calls are executed one at a time, as expected. Let's measure how long this takes.

In [9]:
import time

numbers = range(4)

start_time = time.time()

for n in numbers:
    slow_function(n)

print(time.time() - start_time, 'seconds')
start: 0
done: 0
start: 1
done: 1
start: 2
done: 2
start: 3
done: 3
20.022506713867188 seconds

It takes 20 seconds to run the 4 calls: 5 seconds each, because they all run sequentially on a single core. Now, let's refactor the execution and make it parallel.

In [11]:
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    for n in numbers:
        f = executor.submit(slow_function, n)
start: 0
start: 1
done: 1
done: 0
start: 3
start: 2
done: 2
done: 3

I am running these lines on a dual-core laptop. You can see that, indeed, the two cores are used simultaneously. The first two iterations are launched at the same time. Now, how much time do you expect we saved?
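Before timing it, a quick aside: executor.submit returns immediately with a concurrent.futures.Future object, while the actual call runs in a worker process. As a minimal sketch (assuming slow_function defined as above), you can ask a future whether it has finished and block for its return value:

from concurrent.futures import ProcessPoolExecutor

# On platforms that spawn (rather than fork) worker processes,
# this should live under an `if __name__ == '__main__':` guard.
with ProcessPoolExecutor() as executor:
    f = executor.submit(slow_function, 3)  # returns a Future right away
    print(f.done())    # almost certainly False: the worker is still sleeping
    print(f.result())  # blocks for ~5 seconds, then prints 9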

In [12]:
start_time = time.time()

with ProcessPoolExecutor() as executor:
    for n in numbers:
        f = executor.submit(slow_function, n)

print(time.time() - start_time, 'seconds')
start: 0
start: 1
done: 0
start: 2
done: 1
start: 3
done: 2
done: 3
10.159396648406982 seconds

Nice, the run time was halved. That makes sense, since we split the computation across two cores.
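Note that in the cells above we never collected the return values: f only ever holds the future from the last iteration. As a minimal sketch (reusing slow_function and numbers from above), you can keep the futures in a list and call result() on each, or use executor.map, which does this for you and preserves the input order:

from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    # submit every call, keep the futures, then collect the results in order
    futures = [executor.submit(slow_function, n) for n in numbers]
    squares = [f.result() for f in futures]

print(squares)  # [0, 1, 4, 9]

# The same thing, more compactly, with map:
with ProcessPoolExecutor() as executor:
    squares = list(executor.map(slow_function, numbers))

Either way, the calls are still distributed over the available cores, so the timing behaves just like in the cells above.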