At the moment the default is `multiprocessing=False`, but I wonder what the reasoning behind it was (or still is).
Browsing the web, I find the following statements:
- multi-threading is good for IO-bound processes like reading or downloading files
- multi-processing is good for computationally heavy tasks
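
To make the two statements above concrete, here is a minimal sketch that is independent of `audinterface` and uses only the standard library. The helper names `cpu_bound`, `io_bound`, and `timed_map` are hypothetical and only serve as illustration. The assumption is CPython, where the global interpreter lock keeps threads from running pure-Python code in parallel, while sleeping or waiting on I/O releases it. Timings depend on the machine, so no expected output is given.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Hypothetical CPU-heavy task: sum of squares computed in pure Python."""
    return sum(i * i for i in range(n))


def io_bound(seconds):
    """Hypothetical I/O-like task: sleeping releases the GIL, like waiting on a file."""
    time.sleep(seconds)
    return seconds


def timed_map(executor_cls, func, args, num_workers):
    """Run func over args in a worker pool and return (results, elapsed seconds)."""
    t0 = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - t0


if __name__ == "__main__":
    # Compare threads and processes on both kinds of workload
    for name, func, args in [
        ("cpu_bound", cpu_bound, [200_000] * 8),
        ("io_bound", io_bound, [0.05] * 8),
    ]:
        for executor_cls in [ThreadPoolExecutor, ProcessPoolExecutor]:
            _, t = timed_map(executor_cls, func, args, num_workers=4)
            print(f"{name}, {executor_cls.__name__}: {t:.2f} s")
```

Starting worker processes and pickling arguments and results has a fixed cost, so for very short tasks the comparison can go either way.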
When doing a simple test:
```python
import audb
import audinterface
import audmath
import time


def process_func(signal, sampling_rate):
    return audmath.db(audmath.rms(signal))


db = audb.load("emodb", version="1.4.1")
for multiprocessing in [False, True]:
    for num_workers in [1, 5]:
        interface = audinterface.Feature(
            ["rms"],
            process_func=process_func,
            num_workers=num_workers,
            multiprocessing=multiprocessing,
        )
        t0 = time.time()
        df = interface.process_index(db.files)
        t = time.time() - t0
        print(f"{multiprocessing=}, {num_workers=}: {t:.2f} s")
```
it returns (on the second run):
```
multiprocessing=False, num_workers=1: 0.16 s
multiprocessing=False, num_workers=5: 0.26 s
multiprocessing=True, num_workers=1: 0.16 s
multiprocessing=True, num_workers=5: 0.11 s
```
Even though we don't do heavy processing here, multi-processing seems to be faster in this case. Is this expected?
/cc @ureichel, @ChristianGeng, @frankenjoe, @maxschmitt, @audeerington, @schruefer