Multithreading vs. multiprocessing #171

@hagenw

The current default is multiprocessing=False, and I wonder what the reasoning behind it was (or still is).

Browsing the web, I find the following rules of thumb:

  • multithreading is good for IO-bound tasks like reading or downloading files
  • multiprocessing is good for computationally heavy tasks
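
To make these rules of thumb concrete, here is a minimal sketch that uses only the standard library's concurrent.futures. It is not how audinterface implements num_workers or multiprocessing internally; that is an assumption I have not checked. The names cpu_bound, io_bound and timed_map are hypothetical helpers made up for this illustration.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python loop: holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_bound(seconds):
    """Stand-in for reading or downloading a file: sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


def timed_map(executor_cls, func, args, num_workers):
    """Run func over args in a pool and return (results, elapsed seconds)."""
    t0 = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - t0


if __name__ == "__main__":
    # Compare threads and processes on a CPU-bound and an IO-bound task
    for name, func, args in [
        ("cpu_bound", cpu_bound, [200_000] * 10),
        ("io_bound", io_bound, [0.05] * 10),
    ]:
        for executor_cls in [ThreadPoolExecutor, ProcessPoolExecutor]:
            _, t = timed_map(executor_cls, func, args, num_workers=5)
            print(f"{name}, {executor_cls.__name__}: {t:.2f} s")
```

On a standard CPython build with the GIL, threads would be expected to give little or no speedup for cpu_bound, while both pool types should overlap the waits in io_bound. Process pools add startup and pickling overhead, so the actual numbers depend on the machine and the task size.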

When I run a simple test:

import audb
import audinterface
import audmath
import time

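def process_func(signal, sampling_rate):
    # RMS level of the signal in dB
    return audmath.db(audmath.rms(signal))

# Load emodb, then time each combination of multiprocessing and num_workers
db = audb.load("emodb", version="1.4.1")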
for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audinterface.Feature(
            ["rms"],
            process_func=process_func,
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")

it returns (on the second run):

multiprocessing=False, num_workers=1: 0.16 s
multiprocessing=False, num_workers=5: 0.26 s
multiprocessing=True, num_workers=1: 0.16 s
multiprocessing=True, num_workers=5: 0.11 s

Even though we don't do heavy processing here, multiprocessing with 5 workers is the fastest configuration in this case, while multithreading with 5 workers is slower than a single worker. Is this expected?
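
Since the timings above come from a single run and differ by only about 0.1 s, it might be worth repeating each measurement before drawing conclusions. A minimal sketch of how that could look, where benchmark is a hypothetical helper and the lambda is only a placeholder workload (in the test above it would wrap interface.process_index(db.files)):

```python
import statistics
import time


def benchmark(func, repeats=5):
    """Call func() repeats times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        durations.append(time.perf_counter() - t0)
    return durations


# Placeholder workload, standing in for the call that should be timed
durations = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"min={min(durations):.3f} s, median={statistics.median(durations):.3f} s")
```

Reporting the minimum and the median together gives a rough idea of how much of the difference is noise.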

/cc @ureichel, @ChristianGeng, @frankenjoe, @maxschmitt, @audeerington, @schruefer
