fix(trainer): simplify local-exec command-building process #249
khushiiagrawal wants to merge 2 commits into kubeflow:main from
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
🎉 Welcome to the Kubeflow SDK! 🎉
Thanks for opening your first PR! We're happy to have you as part of our community 🚀
Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification!
Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
1b4bb7c to 2ed9eef
…and remove unused test imports
Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
I noticed that the `uv.lock` file on the main branch seems to be corrupted. When I run the tests locally to verify my changes, `uv lock --check` fails with a syntax error (missing field `version` at line 921). I haven't included a regenerated lockfile in this commit, so the tests currently fail locally (and will likely fail in CI if it checks the lockfile). Could you please confirm whether I should regenerate and commit the fixed `uv.lock` in this PR? This seems necessary to get the tests passing.
@andreyvelich @akshaychitneni Please take a look.
What this PR does / why we need it:
Instead of the previous approach of complex Bash script templates, this PR simplifies the `local-exec` command-building process by introducing a generated Python runner script (`runner.py`). This improves maintainability, makes debugging easier, and improves cross-platform compatibility.

Specific changes include:
- Replaced `LOCAL_EXEC_JOB_TEMPLATE` (Bash) with `RUNNER_TEMPLATE` (Python) in `constants.py`.
- Updated `utils.py` to generate the `runner.py` script, which handles virtual environment creation, dependency installation, and training command execution using Python's `subprocess`.
- Updated `LocalProcessBackend.__get_job_status` in `backend.py` to properly return `TRAINJOB_COMPLETE` instead of defaulting to `TRAINJOB_CREATED`, so that `wait_for_job_status` works correctly.
- Updated `utils_test.py` to verify the command generation logic.

Fixes #93
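The runner-script flow and the status fix described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes a `string.Template`-based template, and the template body, `build_runner_script`, `job_status`, and the status string values are all hypothetical. The generated script uses the standard-library `venv` module to create the environment, installs packages with `pip`, then runs the training command with the venv's interpreter via `subprocess` and exits with the command's return code.

```python
import string

# Hypothetical template body; the real RUNNER_TEMPLATE in constants.py may differ.
RUNNER_TEMPLATE = string.Template('''\
import os
import subprocess
import sys
import venv

VENV_DIR = $venv_dir
PACKAGES = $packages
COMMAND = $command


def main():
    # Create an isolated virtual environment with pip available.
    venv.EnvBuilder(with_pip=True).create(VENV_DIR)
    if os.name == "nt":
        python = os.path.join(VENV_DIR, "Scripts", "python.exe")
    else:
        python = os.path.join(VENV_DIR, "bin", "python")
    # Install dependencies only if any were requested.
    if PACKAGES:
        subprocess.run([python, "-m", "pip", "install", *PACKAGES], check=True)
    # Run the training command with the venv interpreter and propagate its exit code.
    result = subprocess.run([python, *COMMAND])
    sys.exit(result.returncode)


if __name__ == "__main__":
    main()
''')


def build_runner_script(venv_dir, packages, command):
    """Hypothetical helper: render runner.py source.

    repr() turns each value into a valid Python literal, so no shell quoting is needed.
    """
    return RUNNER_TEMPLATE.substitute(
        venv_dir=repr(venv_dir),
        packages=repr(list(packages)),
        command=repr(list(command)),
    )


# Hypothetical status values; only TRAINJOB_CREATED and TRAINJOB_COMPLETE are named in the PR.
TRAINJOB_CREATED = "Created"
TRAINJOB_RUNNING = "Running"
TRAINJOB_COMPLETE = "Complete"
TRAINJOB_FAILED = "Failed"


def job_status(returncode):
    """Map the result of subprocess.Popen.poll() to a job status.

    poll() returns None while the process is still running, otherwise its exit code.
    """
    if returncode is None:
        return TRAINJOB_RUNNING
    return TRAINJOB_COMPLETE if returncode == 0 else TRAINJOB_FAILED
```

For example, `build_runner_script("/tmp/venv", ["torch"], ["train.py", "--epochs", "1"])` returns Python source that can be written to `runner.py` and launched with `sys.executable`. Embedding arguments as Python literals, rather than interpolating them into a Bash script, is one plausible reason the Python runner is easier to debug and more portable across platforms.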
Checklist: