
Add MultiGPU support for DPR Training via DDP #619

Merged

tholor merged 7 commits into master from dpr_multigpu_ddp on Nov 12, 2020

Conversation

@tholor (Member) commented on Nov 9, 2020

In order to enable larger batch sizes for DPR training, we need multi-GPU support.
Let's use DistributedDataParallel, as it's the more performant and scalable option ...
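For context, a minimal sketch of the standard DDP setup this builds on: one process per GPU, with the dataset sharded via a DistributedSampler. This is an illustrative pattern, not FARM's actual training code; `model`, `train_dataset`, and `setup_ddp` are placeholders:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def setup_ddp(model, train_dataset, batch_size, local_rank):
    # Assumes one process per GPU, launched e.g. via torch.distributed.launch
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # DDP averages gradients across ranks automatically during backward()
    model = DistributedDataParallel(model.to(local_rank), device_ids=[local_rank])

    # Each rank trains on its own shard of the data -> larger effective batch size
    sampler = DistributedSampler(train_dataset)
    data_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
    return model, data_loader
```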

  • gather tensors for loss with in-batch negatives (see the sketch after this list)
  • verify that eval only runs on rank 0
  • adjust vocab size check for DDP
  • verify distribution of the dataset into batches
  • infer/pass distributed_world_size in prediction head
  • fix nonzero() deprecation warning (see the example after this list)
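On the first item: with DDP, each rank only embeds its local sub-batch, so computing the loss per rank would shrink the pool of in-batch negatives by a factor of the world size. The idea is to gather the query and passage embeddings from all ranks before building the similarity matrix. A hedged sketch, not the PR's actual code; it relies on a `gather_tensor` helper like the one sketched under "Future work" below:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def in_batch_negative_loss(query_emb, passage_emb, gather_tensor):
    """query_emb / passage_emb: (local_batch, dim) on this rank, where passage i
    is the positive for query i. gather_tensor concatenates a tensor across ranks."""
    if dist.is_initialized() and dist.get_world_size() > 1:
        # Every rank now scores its queries against ALL passages in the global batch
        query_emb = gather_tensor(query_emb)
        passage_emb = gather_tensor(passage_emb)

    # Dot-product similarities, shape (global_batch, global_batch)
    scores = torch.matmul(query_emb, passage_emb.transpose(0, 1))

    # After gathering in rank order, the positive passage for query i is still passage i
    labels = torch.arange(scores.shape[0], device=scores.device)
    return F.cross_entropy(scores, labels)
```

On the last item, the fix is presumably of this shape: since PyTorch 1.5, calling nonzero() without as_tuple emits a deprecation warning, and passing it explicitly silences the warning (the exact call sites in FARM may differ):

```python
import torch

mask = torch.tensor([0, 1, 0, 1])

# Deprecated overload, warns since PyTorch 1.5:
# indices = mask.nonzero()

# Explicit replacement with identical output, a (num_nonzero, ndim) index tensor:
indices = torch.nonzero(mask, as_tuple=False)
```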

Future work

  • refactor all_gather_list to torch's standard all_gather() (see the sketch below)
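A rough sketch of what that refactor could look like: a gather helper built on torch's native all_gather, which exchanges same-shaped tensors directly instead of pickling arbitrary Python objects the way all_gather_list does. One subtlety: all_gather does not backpropagate into the received copies, so the local slice is put back in place to keep its gradient. Names are illustrative, not a definitive implementation:

```python
import torch
import torch.distributed as dist

def gather_tensor(local_tensor: torch.Tensor) -> torch.Tensor:
    """Concatenate a same-shaped tensor from all ranks along dim 0."""
    world_size = dist.get_world_size()
    if world_size == 1:
        return local_tensor

    buffers = [torch.empty_like(local_tensor) for _ in range(world_size)]
    dist.all_gather(buffers, local_tensor)

    # all_gather returns detached copies; re-insert this rank's own tensor
    # so gradients still flow through the local slice of the batch
    buffers[dist.get_rank()] = local_tensor
    return torch.cat(buffers, dim=0)
```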

@tholor requested a review from Timoeller on November 10, 2020 13:03
@tholor changed the title from "WIP Add MultiGPU support for DPR Training via DDP" to "Add MultiGPU support for DPR Training via DDP" on Nov 10, 2020
@Timoeller (Contributor) left a comment


Ok, let's merge this PR now and improve it later on.

