This repository was archived by the owner on May 1, 2025. It is now read-only.

ValueError: Default process group has not been initialized, please make sure to call init_process_group. #139

@RichardMLuu

Description


Hello, I ran into a problem: if I don't use distributed training, I have to modify the source code. Setting distributed to False on the command line does not make the error go away. How should I deal with the problems the distributed-training code causes? Is commenting out the relevant lines the only option?
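A likely reason the command-line flag has no effect: assuming the script declares the flag with `type=bool` (as ALBEF's training scripts appear to, e.g. `parser.add_argument('--distributed', default=True, type=bool)`), argparse converts the string "False" via `bool("False")`, and any non-empty string is truthy, so the flag silently stays True. A minimal sketch of the behavior:

```python
import argparse

# With type=bool, argparse calls bool("False") on the string value.
# Any non-empty string is truthy, so the flag can never be turned off
# this way from the command line.
parser = argparse.ArgumentParser()
parser.add_argument('--distributed', default=True, type=bool)

args = parser.parse_args(['--distributed', 'False'])
print(args.distributed)  # prints True -- the flag silently stays on
```

The usual workarounds are an explicit string-to-bool converter or an `action='store_false'` style flag; either way, the distributed code paths in Pretrain.py still run unless they are guarded.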

----log-----
Traceback (most recent call last):
  File "F:\Projects\Multi Modal\ALBEF\Pretrain.py", line 203, in <module>
    main(args, config)
  File "F:\Projects\Multi Modal\ALBEF\Pretrain.py", line 175, in main
    dist.barrier()
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\c10d_logger.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\distributed_c10d.py", line 3428, in barrier
    opts.device = _get_pg_default_device(group)
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\distributed_c10d.py", line 644, in _get_pg_default_device
    group = group or _get_default_group()
  File "F:\anaconda3\envs\albef\lib\site-packages\torch\distributed\distributed_c10d.py", line 977, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
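One way to run single-process without deleting code is to guard each collective call so it only fires when a process group actually exists. This is a minimal sketch under that assumption, not the repository's official fix; `is_dist_avail_and_initialized` is a common helper name (ALBEF's utils.py appears to ship one like it), and both `dist.is_available()` and `dist.is_initialized()` are real torch.distributed APIs:

```python
import torch.distributed as dist

def is_dist_avail_and_initialized():
    # True only when torch.distributed is compiled in AND
    # init_process_group() has actually been called.
    return dist.is_available() and dist.is_initialized()

# In Pretrain.py, replace the bare collective call
#     dist.barrier()
# with a guarded one that is a no-op in single-process runs:
if is_dist_avail_and_initialized():
    dist.barrier()
```

The same guard applies to any other `dist.*` collective the training loop makes (all_reduce, all_gather, etc.), so commenting code out should not be necessary.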
