Add Repository information to Runner Status #2093
Add repository information to the runner status. With this information we can clearly understand which repositories are using the runner.
Force-pushed from 71d9098 to eaaf315.
@Moser-ss Thanks for your PR! Let me rebase and try testing this locally to move this forward.
This is indeed important! We currently need to correlate runner logs with workflow job logs to see which runner failed due to which job, or which job failed on which runner. Correlating two unstructured log streams only by the fact that both contain the same job ID isn't that easy. With this feature, we can just watch runner statuses instead, which would greatly ease troubleshooting. Did I get it right?
- token
type: object
workflow:
  description: WorkflowStatus contains various information that is propagated from GitHub Actions workflow run environment variables to ease monitoring workflow run/job/steps that are triggerred on the runner.
FYI, make manifests (or, more precisely, controller-gen from controller-tools, which the make target runs internally) automatically copies the Go field comment into this description field.
#!/usr/bin/env bash
set -u

exec update-status Running "Run $GITHUB_RUN_ID from $GITHUB_REPOSITORY" "$GITHUB_REPOSITORY" "$GITHUB_REPOSITORY_OWNER" "$GITHUB_WORKFLOW"
I modified your work so that these values are no longer passed as function arguments. The function now reads the environment variables directly, presuming they are all optional, so that we won't bloat the function's parameters and arguments as we add more.
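To illustrate the approach described in this comment, here is a minimal sketch and not the code merged in this PR. The function name build_status_patch, the JSON layout, and the field names under workflow (repository, repositoryOwner, name) are assumptions made for illustration only. Only the GITHUB_* variable names come from the hook script above.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: prints a JSON patch body for the Runner status.
# Only the phase and message are positional arguments. The workflow
# context is read from GitHub Actions environment variables, and each
# one is treated as optional.
build_status_patch() {
  local phase="$1"
  local message="${2:-}"
  # ${VAR:-} expands to an empty string when VAR is unset, so the
  # script does not abort under `set -u` outside of a workflow run.
  local repository="${GITHUB_REPOSITORY:-}"
  local owner="${GITHUB_REPOSITORY_OWNER:-}"
  local workflow="${GITHUB_WORKFLOW:-}"

  # Note: values are not JSON-escaped here. A real script would need to
  # escape quotes and backslashes (workflow names may contain them).
  printf '{"status":{"phase":"%s","message":"%s","workflow":{"repository":"%s","repositoryOwner":"%s","name":"%s"}}}\n' \
    "$phase" "$message" "$repository" "$owner" "$workflow"
}
```

With this shape, a hook would call something like build_status_patch Running "Run $GITHUB_RUN_ID from $GITHUB_REPOSITORY", and adding another field later means reading one more environment variable inside the function rather than changing every call site.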
@Moser-ss I've updated your awesome work with improvements and suggestions. I'd appreciate it if you could confirm them all! Thanks in advance for your help.
@mumoshu Thanks a lot for the rework; after seeing the fine-tuning you did, this PR looks even better. I like all the changes.
It is not only about the logs but also about resources, especially when I have OOMs or pod evictions. Now when I get an OOM, I can go to GitHub, find the right workflow run and repository, and understand why we had that OOM.
@Moser-ss That makes a lot of sense! Thank you so much for your explanation.
@mumoshu Any idea when this PR can be merged? Do we have any blockers?
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
This is an extension of the feature introduced in PR #1268 to add .status.repository, .status.owner, .status.repo, and .status.workflow to enrich the data related to the Runner.
The idea is to add more context information to the Runner status. In this way, we can correlate the metrics of the runners with the usage by the workflows. For example, if a runner is evicted or gets OOM-killed, we can know which repository and workflow caused that issue.
Another use case is to understand whether some workflows, such as those that run tests or build assets, suffer from CPU resource constraints. With the new status properties, we can correlate high CPU usage with specific repositories and workflows.
An additional use case is to fine-tune runners for specific workflows. By knowing which workflows are executed on the Runner over time, we can understand resource usage patterns and create new runners for specific workflows.