Hi,
I profiled a small program and the hot path was NdArray::dot(), which uses std::inner_product() internally.
Reading https://en.cppreference.com/w/cpp/algorithm/inner_product.html I noticed that the parallelizable counterpart is std::transform_reduce() (https://en.cppreference.com/w/cpp/algorithm/transform_reduce.html).
Have you considered using std::transform_reduce()?
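To make the suggestion concrete, here is a minimal sketch of the swap I have in mind. I have not looked closely at how NdArray::dot() is written, so the two free functions below are hypothetical stand-ins operating on std::vector<double>, not the library's actual code. It needs C++17.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for the current approach: a strictly
// left-to-right accumulation of a[i] * b[i].
double dot_inner_product(const std::vector<double>& a,
                         const std::vector<double>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Hypothetical replacement: same defaults (std::plus and std::multiplies),
// but the implementation is allowed to regroup and reorder the additions.
double dot_transform_reduce(const std::vector<double>& a,
                            const std::vector<double>& b) {
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0.0);
}
```

Two caveats as far as I understand them: because std::transform_reduce() may reorder the additions, floating-point results can differ slightly from std::inner_product(), and actually running in parallel means passing an execution policy such as std::execution::par (from <execution>) as the first argument, which on some toolchains needs an extra backend library.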
Thanks,
Bruno