@supertetelman
Contributor

We now have the following configurations to test.

This change covers all of them in the nightlies, which will probably take around 4 hours end to end.

I tried to mix and match the tests run across them so that we have some confidence without filling out a dense test matrix. Kubeflow is tested once on GPU Operator and once on device plugin.

I skipped the local-registry tests on containerd installs, but kept them in for docker installs. This may or may not work given the recent changes.

I made sure that the monitoring stack is tested with at least one each of the device plugin, docker, containerd, and driver-container configurations.
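
The "at least once per component" idea above can be checked mechanically. The following is a minimal sketch, not the actual nightly job definitions: the row names, the feature labels, and the `missing_features` helper are all hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical sparse test matrix: one row per nightly configuration, in the
# form "row-name: features exercised". These rows are illustrative only and
# do not reflect the real job layout.
matrix='
gpu-operator-containerd: gpu-operator containerd kubeflow
device-plugin-docker: device-plugin docker kubeflow local-registry monitoring
driver-container: gpu-operator driver-container monitoring
'

# Components that should each appear in at least one row (assumed list).
required="gpu-operator device-plugin docker containerd driver-container kubeflow monitoring"

# Print the required features that no row exercises, space-separated.
missing_features() {
    rows="$1"
    needed="$2"
    missing=""
    for feature in $needed; do
        # Drop the row names, put one token per line, then match exactly.
        if ! printf '%s\n' "$rows" | cut -d: -f2- | tr -s ' ' '\n' \
                | grep -qxF -- "$feature"; then
            missing="$missing $feature"
        fi
    done
    echo "${missing# }"
}

uncovered="$(missing_features "$matrix" "$required")"
if [ -n "$uncovered" ]; then
    echo "uncovered: $uncovered"
else
    echo "all required components covered"
fi
```

A check like this keeps the matrix sparse while flagging any component that drops out of coverage when a configuration is removed.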

timeout 180 bash -x ./workloads/jenkins/scripts/test-dashboard.sh
'''

echo "Start new virtual environment pre-Slurm checks"
Contributor

Do we need to explicitly tear down the VMs before we do this?

Contributor Author

Tearing them down is part of this script.

Contributor

@ajdecon left a comment

LGTM. Confirmed that tests passed in nightly builds.

@ajdecon merged commit 704a097 into NVIDIA:master on Mar 30, 2022
@ajdecon mentioned this pull request on Apr 26, 2022

3 participants