Fix some ordering in k8s-cluster.yml to install Helm properly and run all commands from kube-master[0]- fixing CentOS install #1128
Conversation
- include: ../bootstrap/bootstrap-openshift.yml

# GPU operator
- hosts: kube-master[0]
The expectation is that Helm commands are run from the provisioning node. No need to install Helm and run it on the management systems.
ajdecon left a comment
Two issues to address:
- ansible-lint failed with a minor spacing issue:
Linting ./nfs-client-provisioner
WARNING Listing 1 violation(s) that are fatal
tasks/main.yml:4: [var-spacing] [LOW] Variables should have spaces before and after: "{{k8s_nfs_client_repo_name}}"
Warning: var-spacing Variables should have spaces before and after: "{{k8s_nfs_client_repo_name}}"
You can skip specific rules or tags by adding them to your configuration file:
# .ansible-lint
warn_list: # or 'skip_list' to silence them completely
- var-spacing # Variables should have spaces before and after: {{ var_name }}
Finished with 1 failure(s), 0 warning(s) on 2 files.
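The var-spacing fix is a one-line change: add spaces inside the Jinja2 braces. A minimal sketch of the offending task in nfs-client-provisioner/tasks/main.yml (the surrounding task and the repo URL variable are assumptions; only k8s_nfs_client_repo_name appears in the lint output):

```yaml
# Sketch only: the real task in tasks/main.yml may differ.
# Before: "{{k8s_nfs_client_repo_name}}"  -> var-spacing violation
# After:  "{{ k8s_nfs_client_repo_name }}" -> passes ansible-lint
- name: add nfs-client-provisioner helm repo
  command: >-
    helm repo add
    {{ k8s_nfs_client_repo_name }}
    {{ k8s_nfs_client_repo_url }}
```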
- The Jenkins end-to-end test failed. This might be a transient failure, so it's worth re-running, then debugging if it repeats.
TASK [install nfs-client-provisioner] ******************************************
fatal: [localhost]: FAILED! => changed=false
cmd:
- /usr/local/bin/helm
- upgrade
- --install
- nfs-subdir-external-provisioner
- nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
- --create-namespace
- --namespace
- deepops-nfs-client-provisioner
- --version
- 4.0.13
- --set
- nfs.server=127.0.0.1
- --set
- nfs.path=/export/deepops_nfs
- --set
- storageClass.defaultClass=true
- --wait
delta: '0:00:00.060010'
end: '2022-03-23 03:29:00.175032'
msg: non-zero return code
rc: 1
start: '2022-03-23 03:29:00.115022'
stderr: |-
Error: Kubernetes cluster unreachable: <html><head><meta http-equiv='refresh' content='1;url=/login?from=%2Fversion%3Ftimeout%3D32s'/><script>window.location.replace('/login?from=%2Fversion%3Ftimeout%3D32s');</script></head><body style='background-color:white; color:white;'>
Authentication required
<!--
You are authenticated as: anonymous
Groups that you are in:
Permission you need to have (but didn't): hudson.model.Hudson.Read
... which is implied by: hudson.security.Permission.GenericRead
... which is implied by: hudson.model.Hudson.Administer
-->
</body></html>
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
Our Helm installs ran from a mix of localhost and kube-master[0]. This caused failures in nfs-client-provisioner because the CentOS kubespray installer was not properly installing kubectl on the kube-master nodes.
For now I am aligning everything with what we did for the GPU Operator. In the future, it would make sense to use the now-functional helm Ansible module and run everything from localhost (the provisioning node) instead of kube-master[0]. The main benefit would be installing fewer binaries on the management nodes; beyond that, it is not a necessary change.
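A future localhost-based version of the nfs-client-provisioner install might look roughly like this. This is a sketch using the kubernetes.core.helm module, not the current playbook; the namespace, chart, and values are taken from the CI log above, and running from localhost assumes helm and a kubeconfig are available on the provisioning node:

```yaml
# Sketch: drive the Helm release from the provisioning node instead of
# kube-master[0]. Requires the kubernetes.core collection and the helm
# binary installed locally.
- hosts: localhost
  tasks:
    - name: install nfs-client-provisioner via the helm module
      kubernetes.core.helm:
        name: nfs-subdir-external-provisioner
        chart_ref: nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
        chart_version: "4.0.13"
        release_namespace: deepops-nfs-client-provisioner
        create_namespace: true
        wait: true
        values:
          nfs:
            server: 127.0.0.1            # placeholder value from the CI log
            path: /export/deepops_nfs
          storageClass:
            defaultClass: true
```

This replaces the raw `helm upgrade --install ...` command shown in the failed task with a declarative module call, so Ansible can report changed/unchanged state instead of parsing command output.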
Also added the standard proxy environment settings to a few Helm install tasks where they were missing.
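Applying the proxy settings to a Helm task might look like the sketch below. The `proxy_env` dictionary name is an assumption standing in for whatever shared proxy variable the repo defines, and the repo URL is hypothetical:

```yaml
# Sketch: pass shared proxy settings to a task that reaches the network.
# Assumes a proxy_env dict like {http_proxy: ..., https_proxy: ..., no_proxy: ...}.
- name: add a chart repo from behind a proxy
  command: helm repo add example-repo https://charts.example.com  # hypothetical repo
  environment: "{{ proxy_env | default({}) }}"
```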
Additionally, I moved the block of plays that runs helm/kubectl commands so it comes after the block that installs the kubectl/helm binaries. The old ordering caused failures in edge cases on CentOS, because software is installed differently on Ubuntu and CentOS.
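The reordering in k8s-cluster.yml amounts to something like this simplified sketch (the play structure and role names are illustrative, not the real ones):

```yaml
# Simplified sketch of the corrected ordering in k8s-cluster.yml.
# 1. Install the client binaries first...
- hosts: kube-master[0]
  roles:
    - install-k8s-tools        # hypothetical role that installs kubectl/helm

# 2. ...then run the plays that invoke them.
- hosts: kube-master[0]
  roles:
    - nfs-client-provisioner   # runs helm/kubectl, so it must come second
```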
The automated testing already tests all the paths that this touches.