In the paper, it is mentioned that "we also reset the optimizer state when training stages switch."
However, in the code:
```python
if configs.reset_optimizer:
    del optimizer
    optimizer = optim.AdamW(
        parallel_model.parameters(),
        lr=configs.lr,
        weight_decay=configs.weight_decay,
    )
```
Here the optimizer is reset at every epoch (whenever `configs.reset_optimizer` is set), not only when training stages switch.
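A minimal sketch of what gating the reset on an actual stage change could look like, assuming a hypothetical stage schedule where the stage advances every `configs.epochs_per_stage` epochs; `make_optimizer`, `num_epochs`, and the loop structure are illustrative and not taken from the repository:

```python
import torch.optim as optim

def make_optimizer(model, configs):
    # same optimizer construction as in the snippet above
    return optim.AdamW(
        model.parameters(),
        lr=configs.lr,
        weight_decay=configs.weight_decay,
    )

optimizer = make_optimizer(parallel_model, configs)
prev_stage = None  # stage the previous epoch belonged to

for epoch in range(configs.num_epochs):
    # hypothetical schedule: advance one stage every `epochs_per_stage` epochs
    scheduled_stage = epoch // configs.epochs_per_stage

    # reset the optimizer state only when the training stage changes,
    # matching the behavior described in the paper
    if configs.reset_optimizer and scheduled_stage != prev_stage:
        optimizer = make_optimizer(parallel_model, configs)
    prev_stage = scheduled_stage

    # ... rest of the epoch's training loop ...
```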