
Add MultiGPU support for DPR Training via DDP #619

Merged

tholor merged 7 commits into master from dpr_multigpu_ddp on Nov 12, 2020

Conversation

@tholor (Member) commented on Nov 9, 2020

In order to enable larger batch sizes for DPR training, we need multi-GPU support.
Let's use DistributedDataParallel, as it's the more performant and scalable option ...
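For context, a minimal sketch of the standard DDP setup this builds on: one process per GPU, with the dataset sharded via a DistributedSampler. This is an illustrative pattern, not FARM's actual training code; `model`, `train_dataset`, and `setup_ddp` are placeholders:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def setup_ddp(model, train_dataset, batch_size, local_rank):
    # Assumes one process per GPU, launched e.g. via torch.distributed.launch
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # DDP averages gradients across ranks automatically during backward()
    model = DistributedDataParallel(model.to(local_rank), device_ids=[local_rank])

    # Each rank trains on its own shard of the data -> larger effective batch size
    sampler = DistributedSampler(train_dataset)
    data_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
    return model, data_loader
```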

  • gather tensors for loss with in-batch negatives (see the sketch after this list)
  • verify that eval only runs on rank 0
  • adjust vocab size check for DDP
  • verify distribution of the dataset into batches
  • infer/pass distributed_world_size in prediction head
  • fix nonzero() deprecation warning (see the example after this list)
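On the first item: with DDP, each rank only embeds its local sub-batch, so computing the loss per rank would shrink the pool of in-batch negatives by a factor of the world size. The idea is to gather the query and passage embeddings from all ranks before building the similarity matrix. A hedged sketch, not the PR's actual code; it relies on a `gather_tensor` helper like the one sketched under "Future work" below:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def in_batch_negative_loss(query_emb, passage_emb, gather_tensor):
    """query_emb / passage_emb: (local_batch, dim) on this rank, where passage i
    is the positive for query i. gather_tensor concatenates a tensor across ranks."""
    if dist.is_initialized() and dist.get_world_size() > 1:
        # Every rank now scores its queries against ALL passages in the global batch
        query_emb = gather_tensor(query_emb)
        passage_emb = gather_tensor(passage_emb)

    # Dot-product similarities, shape (global_batch, global_batch)
    scores = torch.matmul(query_emb, passage_emb.transpose(0, 1))

    # After gathering in rank order, the positive passage for query i is still passage i
    labels = torch.arange(scores.shape[0], device=scores.device)
    return F.cross_entropy(scores, labels)
```

On the last item, the fix is presumably of this shape: since PyTorch 1.5, calling nonzero() without as_tuple emits a deprecation warning, and passing it explicitly silences the warning (the exact call sites in FARM may differ):

```python
import torch

mask = torch.tensor([0, 1, 0, 1])

# Deprecated overload, warns since PyTorch 1.5:
# indices = mask.nonzero()

# Explicit replacement with identical output, a (num_nonzero, ndim) index tensor:
indices = torch.nonzero(mask, as_tuple=False)
```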

Future work

  • refactor all_gather_list to torch's standard all_gather() (see the sketch below)
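A rough sketch of what that refactor could look like: a gather helper built on torch's native all_gather, which exchanges same-shaped tensors directly instead of pickling arbitrary Python objects the way all_gather_list does. One subtlety: all_gather does not backpropagate into the received copies, so the local slice is put back in place to keep its gradient. Names are illustrative, not a definitive implementation:

```python
import torch
import torch.distributed as dist

def gather_tensor(local_tensor: torch.Tensor) -> torch.Tensor:
    """Concatenate a same-shaped tensor from all ranks along dim 0."""
    world_size = dist.get_world_size()
    if world_size == 1:
        return local_tensor

    buffers = [torch.empty_like(local_tensor) for _ in range(world_size)]
    dist.all_gather(buffers, local_tensor)

    # all_gather returns detached copies; re-insert this rank's own tensor
    # so gradients still flow through the local slice of the batch
    buffers[dist.get_rank()] = local_tensor
    return torch.cat(buffers, dim=0)
```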

@tholor requested a review from Timoeller on November 10, 2020 13:03
@tholor changed the title from "WIP Add MultiGPU support for DPR Training via DDP" to "Add MultiGPU support for DPR Training via DDP" on Nov 10, 2020
@Timoeller (Contributor) left a comment


Ok, let's merge this PR now and improve it later on.

