Currently, `AsyncWorker` uses the global pthread pool for scheduling its own tasks, taking threads as they become available. This works fine if `AsyncWorker` is the exclusive user of multithreading, but not when the addon is linked with other libraries that also rely on threads.
Here's the general problematic scenario:
- addon A has `async_method` that spawns tasks onto an `AsyncWorker`
- it links with library B and invokes `B.some_method()`, which uses up to N threads itself
- logically, the user builds the addon with `PTHREAD_POOL_SIZE=N+1` (N threads for the library + 1 for the async op in the addon itself)
Now, if the user invokes `A.async_method()`, addon A takes one thread from the pool for the worker, leaving N threads; the library takes those N threads, finishes its work, and everything's fine.
But now another user has code that invokes `async_method()` on various inputs in parallel, e.g. via `Promise.all`:

```js
Promise.all([input1, input2].map(input => A.async_method(input)))
```
Now addon A takes 2 threads (one per `async_method()` invocation) to create the `AsyncWorker`s; each of those workers invokes `B.some_method()`, and each of those parallel `B.some_method()` invocations tries to allocate N threads, `2*N` in total, but the pool only has `(N+1)-2 == N-1` threads left at this point! So each of them deadlocks.
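The arithmetic above can be checked with a toy model (plain numbers, no real threads; N is an arbitrary example value):

```javascript
// Toy model of the pool arithmetic: each A.async_method() call first takes
// 1 pool thread for its AsyncWorker, then its B.some_method() needs N more
// threads before it can release anything.
const N = 4;                // threads library B needs per call (example value)
let free = N + 1;           // PTHREAD_POOL_SIZE = N + 1
const parallelCalls = 2;

free -= parallelCalls;      // one AsyncWorker thread per parallel call
console.log(free);          // (N+1) - 2 == N - 1, i.e. 3
console.log(free >= N);     // false: neither B.some_method() can start → deadlock
```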
The user could increase `PTHREAD_POOL_SIZE` to `2*(N+1)`, but then the same failure occurs as soon as 3 invocations of the library run in parallel, and so on.
I actually observed this behaviour, with deadlocks, in a real-world addon I'm cross-compiling to Wasm with emnapi right now, and it took some time to figure out what was happening.
In my case I fixed this by monkey-patching the async methods to queue invocations one at a time when compiled to Wasm, but, ideally, this should be handled at the emnapi level.
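A minimal sketch of that monkey-patch workaround (names are hypothetical, this is not emnapi's API): chain every call onto a single promise queue so at most one `AsyncWorker`-backed task runs at a time.

```javascript
// Serialize calls to an async function: each call waits for the previous
// one to settle, so at most one AsyncWorker-backed task runs at a time.
let queue = Promise.resolve();

function serialized(fn) {
  return (...args) => {
    const result = queue.then(() => fn(...args));
    // Keep the chain alive even if a call rejects.
    queue = result.catch(() => {});
    return result;
  };
}

// Usage (hypothetical addon object):
// A.async_method = serialized(A.async_method.bind(A));
```

This trades all parallelism for safety, which is why it's only a stopgap compared to a real bounded pool.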
I think the only reliable solution here is for emnapi to have a separate, exclusive thread pool, instead of taking as many threads as it wants from the global Emscripten pool. Then it could use either the same JS logic or even some C++ implementation of work-stealing to execute at most `EMNAPI_WORKER_POOL_SIZE` tasks in parallel, and no more. This would let users configure `EMNAPI_WORKER_POOL_SIZE` to the desired number of concurrent tasks, and `PTHREAD_POOL_SIZE` to `EMNAPI_WORKER_POOL_SIZE * (N + 1)` in advance, avoiding the risk of deadlocks altogether.
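As a rough illustration of the bounded-pool idea (in JS, with `EMNAPI_WORKER_POOL_SIZE` as a hypothetical knob; the real implementation would live in emnapi's runtime): a counting semaphore gates how many workers run concurrently, while extra submissions wait in a queue instead of grabbing pthreads.

```javascript
// Counting semaphore: at most `limit` tasks hold a permit at once.
class Semaphore {
  constructor(limit) { this.free = limit; this.waiters = []; }
  async acquire() {
    if (this.free > 0) { this.free--; return; }
    await new Promise(resolve => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next();   // hand the permit straight to a waiter
    else this.free++;
  }
}

// Hypothetical knob; in the proposal this would be a build-time setting.
const EMNAPI_WORKER_POOL_SIZE = 2;
const pool = new Semaphore(EMNAPI_WORKER_POOL_SIZE);

// Queue a task so that at most EMNAPI_WORKER_POOL_SIZE run in parallel.
async function queueWorker(task) {
  await pool.acquire();
  try { return await task(); } finally { pool.release(); }
}
```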
And then, users who know they're not linking against any multithreaded libraries could set both `PTHREAD_POOL_SIZE` and `EMNAPI_WORKER_POOL_SIZE` to the number of CPU cores and get the same maximum parallelisation they can today.