Hi, thanks for open-sourcing VectorWorld! I am currently trying to reproduce the Waymo evaluation results, but I noticed that the repository does not seem to contain the official eval pipeline used in the paper.
Right now, my eval code is my own implementation, based on the Scenario Dreamer evaluation code and adapted to VectorWorld, so I am not fully confident that my protocol matches the one used in the paper. As a result, I cannot tell whether the gap between my metrics and the reported ones comes from the checkpoint/configuration or from differences in the evaluation.
Would it be possible to also release the official evaluation code used for the paper?
I believe releasing the eval code would greatly improve reproducibility for the community. It would also let users rule out differences in the evaluation implementation when their numbers do not match the paper.
Thanks again for the great work!