
ci: Add support for linux-riscv64 #7414

Open
luhenry wants to merge 2 commits into k0sproject:main from luhenry:main

Conversation

@luhenry
Contributor

@luhenry luhenry commented Apr 9, 2026

Description

This is currently an experiment. Happy to get reviews as always.

Add GitHub Actions CI on linux-riscv64 using RISE RISC-V Runners.

Relates to #1919
Depends on k0sproject/image-builder#253 #7459

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@luhenry
Contributor Author

luhenry commented Apr 9, 2026

Currently blocked on missing https://github.com/anchore/syft support for linux-riscv64. I'll submit a PR upstream and update here accordingly. EDIT: work in progress at anchore/syft#4757, I disabled it here for now for riscv64 specifically.

@twz123
Member

twz123 commented Apr 10, 2026

Nice! IIUC we need to add the RISE app to our GitHub org, right? There's also https://www.riscvrunners.com/. Do you know the differences?

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Nice! IIUC we need to add the RISE app to our GitHub org, right? There's also https://www.riscvrunners.com/. Do you know the differences?

Yes, you’d need to add https://github.com/apps/rise-risc-v-runners (it only requires the “self-hosted runners read+write” permission).

Regarding https://www.riscvrunners.com/, the main difference is that the RISE runners are free (and will stay free, RISE is not a business) and you have access to the whole machine. I’m not intimately familiar with the base image they are using (which packages, which OS, etc.), but I’ll look into it.

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Trying to add the proper image dependencies at k0sproject/image-builder@main...luhenry:k0s-image-builder:main

@twz123
Member

twz123 commented Apr 10, 2026

Trying to add the proper image dependencies at k0sproject/image-builder@main...luhenry:k0s-image-builder:main

I need to follow up on k0sproject/image-builder#256. I don't think we can generally enable RISC-V builds for everything.

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Trying to add the proper image dependencies at k0sproject/image-builder@main...luhenry:k0s-image-builder:main

I need to follow up on k0sproject/image-builder#256. I don't think we can generally enable RISC-V builds for everything.

Sounds good. Do you know if there will be other images that will be a required dependency?

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

I'm brute-forcing my way through figuring out which images/versions are working: https://github.com/luhenry/k0s-image-builder/actions/runs/24236376271

The good news is that quite a few are already working, so adding them is just a matter of "adding a new version". Once that workflow finishes, I'll look through all the failures and categorize them. That should give us a good todo list, as well as a list of things to exclude/ignore.

@twz123
Member

twz123 commented Apr 10, 2026

Sounds good. Do you know if there will be other images that will be a required dependency?

No. In fact, Calico is not a requirement. By default, k0s will use kube-router, which is already available for RISC-V. You won't be able to use NLLB (another optional feature), as this would require Envoy, and we didn't have the time to try to make it compile for RISC-V. But I'm currently looking into adding support for Traefik (#7405), mainly for Windows. Traefik supports RISC-V and ARMv7 out of the box, so this might be a viable alternative to make NLLB usable on RISC-V as well.

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Recent development: BIRD has just added (like 3 days ago) builds for riscv64: https://gitlab.nic.cz/labs/bird/-/commit/0aca5b325fb6606d620ef349d70892105e1242c3. I ran into that issue in the build for calico-node at least.

In fact, Calico is not a requirement

Good, then at least it's not a blocker. It should get resolved soon though. I never like diverging from how it's done on linux-amd64 and linux-arm64 because I like things to "just work". But gotta be pragmatic here 😅

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Also, another finding: etcd is broken, since bitnami/etcd seems to have no more tags (?). That affects not just linux-riscv64 but all other platforms as well.

@twz123
Member

twz123 commented Apr 10, 2026

Also, another finding: etcd is broken on all platforms, since bitnami/etcd seems to have no more tags (?). That seems broken not just for linux-riscv64 but for all other platforms as well.

Not all the images in the image-builder repo are used by k0s itself. There's also some utility things, like the GH Actions runners which we use to build k0s for ARMv7. Not sure if the etcd stuff is used anywhere, still.

Pretty sure that envoy is the last image that's directly used by k0s which is not available for RISC-V.

@twz123
Member

twz123 commented Apr 10, 2026

Recent development: BIRD has just added (like 3 days ago) builds for riscv64: https://gitlab.nic.cz/labs/bird/-/commit/0aca5b325fb6606d620ef349d70892105e1242c3. I ran into that issue in the build for calico-node at least.

We started to build BIRD from source in the Calico Dockerfiles recently, so this should be fine.

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Triggering with v3.31.4-2 from k0sproject/image-builder#256, running at https://github.com/luhenry/k0s/actions/runs/24244825789

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

And for the missing quay.io/k0sproject/pushgateway-ttl:1.4.0-k0s.0 image: k0sproject/pushgateway-ttl-builder#5

Comment thread pkg/constant/constant.go Outdated
@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Following k0sproject/pushgateway-ttl-builder#5 (comment), I've cherry-picked your commit to mark Envoy as not-supported.

luhenry added a commit to luhenry/troubleshoot that referenced this pull request Apr 10, 2026
This project is a dependency of `k0s` which is working on being built, tested and released on `linux-riscv64` [1].

RISC-V is gaining momentum especially in the embedded and edge world. 

[1] k0sproject/k0s#7414
@luhenry
Contributor Author

luhenry commented Apr 10, 2026

https://github.com/replicatedhq/troubleshoot is missing on riscv64 now for:

The simplest solution is to not run smoketests-linux-riscv64 for now. I'll submit a PR upstream to try to get them to enable and release linux-riscv64 as well.

I've also verified that k0sproject/image-builder#253 is working: I rebased it and used the resulting image at https://github.com/luhenry/k0s/actions/runs/24254357499/job/70824468191. The Build :: Airgap image bundle (riscv64) / linux-riscv64 is passing.

@luhenry
Contributor Author

luhenry commented Apr 10, 2026

Opened a PR on replicatedhq/troubleshoot#2010 for linux-riscv64

@luhenry luhenry changed the title from "[Experiment] Add CI on RISC-V" to "ci: Add support for linux-riscv64" Apr 10, 2026
@luhenry luhenry marked this pull request as ready for review April 10, 2026 21:14
@luhenry luhenry requested review from a team as code owners April 10, 2026 21:14
@twz123
Member

twz123 commented Apr 17, 2026

A full CI workflow has passed for the latest commit: https://github.com/luhenry/k0s/actions/runs/24552328790

... hot stuff 🔥

@luhenry
Contributor Author

luhenry commented Apr 21, 2026

@twz123 from talking with @caniszczyk yesterday, we should be good to enable the RISE RISC-V Runners on k0sproject/k0s. For next steps, we can either 1. enable them right away if you want and iterate on the different needs that you have, or 2. figure out your requirements ahead of time, which I can then implement in the next days/weeks. Let me know what you would prefer.

@luhenry luhenry force-pushed the main branch 2 times, most recently from ec0c73f to 9e9947b Compare April 21, 2026 13:31
@luhenry
Contributor Author

luhenry commented Apr 28, 2026

anchore/syft has integrated support for linux/riscv64, so it should show up on their next release. I'll update this PR when available.

@twz123 anything else that's blocking this PR from being merged?

@luhenry
Contributor Author

luhenry commented Apr 28, 2026

I'll explore enabling the tests at luhenry#19

@jnummelin
Member

anything else that's blocking this PR from being merged?

I think the main blocker currently is the official agreement from CNCF side which they are looking into

@luhenry
Contributor Author

luhenry commented Apr 29, 2026

From debugging the smoketests on the machines directly (added a || sleep 30d after make -C inttest), the issue has to do with the current setup:

  • each workflow runs inside a k8s pod, so inside a container with its own overlayfs
  • the dockerd is launched as a background process inside that pod (it's not shared with the host as that would lead to leaking information and data across pods and workflows)
  • the Docker containers for worker0, worker1, and controller0 are then docker-in-container, and so overlayfs on top of overlayfs.
  • (that's where it gets a bit confusing to me, as I'm not entirely sure how k0s and overlayfs work)
  • something somewhere is trying to create an overlayfs filesystem on top of overlayfs, using the "host" (aka overlayfs) filesystem as an "upper" <- that's not possible

The error I get in dmesg is [Tue Apr 28 23:22:42 2026] overlayfs: filesystem on '/var/lib/k0s/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/13/fs' not supported as upperdir.
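
As a minimal sketch of how that kernel message can be picked out of a larger dmesg capture: the function name below is hypothetical, and the pattern only assumes the message wording quoted above.

```python
import re

# Matches the overlayfs message quoted above; the rejected path is group 1.
UPPERDIR_ERROR = re.compile(
    r"overlayfs: filesystem on '([^']+)' not supported as upperdir"
)


def rejected_upperdirs(dmesg_text):
    """Return every path the kernel refused to use as an overlayfs upperdir."""
    return [match.group(1) for match in UPPERDIR_ERROR.finditer(dmesg_text)]


# The sample line is the one from the comment above.
sample = (
    "[Tue Apr 28 23:22:42 2026] overlayfs: filesystem on "
    "'/var/lib/k0s/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/13/fs' "
    "not supported as upperdir"
)
print(rejected_upperdirs(sample))
```

Every reported path sitting under /var/lib/k0s would point at the containerd snapshot directory as the place where the nested overlayfs mount is attempted.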

Some workarounds that Claude proposed to me are:

  • Maybe a newer kernel version supports something that would let us circumvent the issue. Currently the underlying machines are stuck on kernel 5.9; RISE is working with a kernel contributor to upstream the necessary patches for the TH1520 so that we can move to a mainline kernel. The timeline is multiple weeks at minimum.
  • Allocating a block device backed by a file and formatted as ext4, then mounting it at /var/lib/k0s as part of the pod startup process. This can be done either in the RISE infra or as part of the k0s workflow.
  • Modifying something somewhere in the k0s CI infra to not mount it as "upperdir", but I don't understand what that means.

The longer-term fix will be to switch from pods to VMs. That's going to happen in a couple of months, when we have RVA23 hardware available and hooked up in a datacenter that we can use. I would make sure to prioritize k0s for these machines.

@luhenry
Contributor Author

luhenry commented May 4, 2026

I made some progress on enabling the test suite, but hit the next blocker.

I fixed the overlay-on-overlay issue by mounting an emptyDir volume into the pod at /var/lib/k0s (see [1]). Because a k8s pod's volume is a direct mount from the host's filesystem into the container, and not an overlayfs, the /var/lib/k0s folder inside the pod is an ext4 mount [2].

The next issue I ran into was that DNS from kube-proxy inside the k0s cluster was getting confused, because the CIDR was the same for the k8s pod and for the k0s cluster running inside the pod. The solution was simply to use hostNetwork for the pods [3]. This actually makes more sense, since the pod should not have access to anything internal to the k8s cluster.
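
A minimal sketch of how such a CIDR clash can be checked with Python's standard `ipaddress` module. The address ranges below are illustrative assumptions, not values taken from the RISE cluster or from this PR.

```python
import ipaddress


def cidrs_overlap(a, b):
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


# Hypothetical values: the outer k8s cluster and the nested k0s cluster
# both using the same pod network range.
outer_pod_cidr = "10.244.0.0/16"
inner_pod_cidr = "10.244.0.0/16"

print(cidrs_overlap(outer_pod_cidr, inner_pod_cidr))
```

When the ranges overlap, an address inside the nested cluster can also be a valid address in the outer one, which is the kind of ambiguity described above.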

Finally, the issue I'm now running into has to do with kube-router trying to set up a firewall, which depends on the ip_set_hash_net kernel module. However, that module is not available on the Scaleway EM-RV1 out of the box (not even as a loadable module). So the solution would be to build the kernel module so that it can be loaded. I'm currently working on that.
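
A minimal sketch of checking whether a module is currently loaded, by parsing text in the /proc/modules format (module name in the first column). The helper names and the sample content are hypothetical; sizes and addresses are made up for illustration.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_loaded(name, proc_modules_text):
    """Check for a module; the kernel lists names with underscores."""
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


# Illustrative content only, not captured from an EM-RV1.
sample = (
    "ip_set 57344 1 ip_set_hash_net, Live 0x0000000000000000\n"
    "ip_set_hash_net 49152 0 - Live 0x0000000000000000\n"
)
print(is_loaded("ip_set_hash_net", sample))
```

On a real machine the text would come from reading /proc/modules. Modules compiled into the kernel do not appear there, so a negative result only means the module is not loaded as a loadable module.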

As a side note, we are also working with a kernel contributor to upstream any patches necessary for mainline Linux to support the TH1520 (the SoC used by the EM-RV1), so we can move to a mainline kernel instead of an older kernel version.

Overall, it's making good progress and I'm hoping to get things working in the coming weeks. The good news is that nearly nothing has to change in k0s itself; the changes are all in the infrastructure.

[2]:

$> kubectl exec -it rise-riscv-runner-9572x75pv -- findmnt -R
TARGET SOURCE FSTYPE  OPTIONS
/      overlay
              overlay rw,relatime,lowerdir=/var/lib/containerd/io.containerd.sna
|-/var/lib/k0s
|      /dev/mmcblk0p3[/var/lib/kubelet/pods/a8bffb66-a287-4d9f-a4bf-7cb8c30db806/volumes/kubernetes.io~empty-dir/k0s]
|             ext4    rw,relatime,errors=remount-ro
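
A minimal sketch of the same check done programmatically: given text in the /proc/mounts format (device, mount point, filesystem type, options), find the filesystem type backing a path. The function name and sample table are hypothetical; the sample mirrors the findmnt output above, and the sketch ignores escaped spaces in mount points.

```python
def fstype_of(path, mounts_text):
    """Return the filesystem type of the deepest mount containing `path`."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; later entries win ties.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best[1] if best else None


# Illustrative mount table modelled on the findmnt output above.
sample = (
    "overlay / overlay rw,relatime 0 0\n"
    "/dev/mmcblk0p3 /var/lib/k0s ext4 rw,relatime,errors=remount-ro 0 0\n"
)
print(fstype_of("/var/lib/k0s/containerd", sample))
print(fstype_of("/etc", sample))
```

A result other than overlay for the containerd snapshot directory is what makes the nested overlayfs mount possible.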

@twz123
Member

twz123 commented May 5, 2026

Good news! CNCF approved to add the RISE RISC-V runners to the k0sproject org 🚀

@luhenry could you rework this PR in a way that it doesn't trigger for every PR? Instead, I think it's best to start with a nightly job that also has a workflow_dispatch, similar to what we have for the ostests.

If the integration tests aren't working yet, we could add this as an optional parameter for workflow_dispatch (again, in a way similar to what we're doing for the ostests, this time in ostests-matrix.yaml). This way, we can execute and iterate on the integration tests without needing to change the workflows.

@luhenry luhenry force-pushed the main branch 2 times, most recently from fa4d755 to cc9320d Compare May 6, 2026 14:04
@luhenry
Contributor Author

luhenry commented May 6, 2026

I split the unittests in a separate workflow_call file, similar to build-k0s and smoketests. I added a go-nightly.yml which only runs the build-k0s, unittests, build-airgap-image-bundle and smoketests-linux-riscv64 (disabled for now [1][2]) for linux-riscv64.

I'm testing it at:

@luhenry luhenry force-pushed the main branch 2 times, most recently from d476e15 to 822738f Compare May 6, 2026 16:43
@luhenry
Contributor Author

luhenry commented May 6, 2026

For the latest on the issue with kube-router, I met with my contact at Scaleway today, and they are going to share the sources for the Linux Kernel they have on the Scaleway EM-RV1. I'll build the necessary modules from it and add them to all the machines of the pool. I'll keep you posted as soon as I have more things to show.

@luhenry could you rework this PR in a way that it doesn't trigger for every PR? Instead, I think it's best to start with a nightly job that also has a workflow_dispatch, similar to what we have for the ostests.

If the integration tests aren't working yet, we could add this as an optional parameter for workflow_dispatch (again, in a way similar to what we're doing for the ostests, this time in ostests-matrix.yaml). This way, we can execute and iterate on the integration tests without needing to change the workflows.

@twz123 that should be all done.

Comment thread .github/workflows/build-k0s.yml
Comment thread .github/workflows/go-nightly.yml Outdated
Comment thread .github/workflows/go-nightly.yml Outdated
Comment thread .github/workflows/unittests-k0s.yml Outdated
Comment thread .github/workflows/unittests-k0s.yml Outdated
Comment thread .github/workflows/go-nightly.yml Outdated
Comment thread .github/workflows/go-nightly.yml Outdated
Comment thread .github/workflows/go-nightly.yml Outdated
@twz123
Member

twz123 commented May 7, 2026

If you want, you can split out 6fd5939 into a separate PR, so we can merge that already.

@luhenry
Contributor Author

luhenry commented May 7, 2026

If you want, you can split out 6fd5939 into a separate PR, so we can merge that already.

#7591

I'll work on the rest of the feedback tomorrow

@twz123
Member

twz123 commented May 7, 2026

If you want, you can split out 6fd5939 into a separate PR, so we can merge that already.

#7591

I'll work on the rest of the feedback tomorrow

I meant the split-off of the unit tests into their own callable workflow 😅

@luhenry
Contributor Author

luhenry commented May 7, 2026

If you want, you can split out 6fd5939 into a separate PR, so we can merge that already.

#7591
I'll work on the rest of the feedback tomorrow

I meant the split-off of the unit tests into their own callable workflow 😅

Ah sorry, my bad, will do that first thing tomorrow!

luhenry added 2 commits May 9, 2026 22:40
Signed-off-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Ludovic Henry <git@ludovic.dev>
@luhenry
Contributor Author

luhenry commented May 10, 2026

@twz123 I split it out as you requested.

Also, I got check-basic to work 🎉 I updated the underlying workers with the right kernel modules and it works. I'll check other test suites now.

UPDATE: check-basic [1] and check-airgap [2] are working. check-network-conformance-calico [3] and check-network-conformance-kuberouter [4] are not, but only because https://hub.docker.com/r/sonobuoy/sonobuoy is not available for riscv64. I'll try submitting a PR this week.

UPDATE 2: The PR for sonobuoy: vmware-tanzu/sonobuoy#2049
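
A minimal sketch of how a missing architecture can be spotted in a multi-arch image: it inspects an OCI image index (or Docker manifest list) given as JSON, for example the output of `docker manifest inspect`. The function name and the sample index are hypothetical and do not reflect the actual sonobuoy image.

```python
import json


def supports_platform(index_json, os_name, arch):
    """Return True if the image index lists a manifest for os_name/arch."""
    index = json.loads(index_json)
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return True
    return False


# Illustrative index with two platforms and no riscv64 entry.
sample = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "arm64"}},
    ],
})
print(supports_platform(sample, "linux", "riscv64"))
```

Running a check like this over every image a test suite pulls would give a list of the upstream projects that still need a riscv64 build.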


3 participants