Whisker UI new features #11310
Merged
ronanc-tigera merged 30 commits into projectcalico:master from Mar 19, 2026
… feature/TSLA-9971-policy-filter-ui-only
…ter-ui-only [TSLA-9971] policy filter UI only
… whisker-ui-new-features
…-with-master Whisker UI new features sync with master
…igera/calico into feature/TSLA-9973-reporter-column
…olumn [TSLA-9973] reporter column and filter
* Whisker - Start time filter
* update start time logic
* removed old policy filter
* fix filter transform issue
* add test coverage
* start time filter improvements
* improve accessibility for start time select
This PR is stale because it has been open for 60 days with no activity.
* Fix rendering of NatPortRange in nftables mode
* Stop consuming redundant HostMetadata message in no-encap manager (#11737)
* Add UTs with fully random
* Add FV
* Fix CI against OpenStack Yoga, by removing it
Yoga has been "unmaintained" - which is OpenStack terminology for a state similar to EoL - since
October 2024, and is no longer of interest to our OpenStack customers. The CI against Yoga recently
broke when we updated our Semaphore platform from Ubuntu 20.04 to 22.04. This was briefly addressed
by https://github.com/projectcalico/calico/commit/29d69fa9324e85f0270e0dc02358e9d451785ec4, but
since then there has been further breakage, which does not look easy to fix - fundamentally because
upstream Yoga-level code was never developed and tested against Ubuntu 22.04.
* For VM-based tests on Jammy pin docker-buildx-plugin (#11743)
* For VM-based tests on Jammy pin docker-buildx-plugin
We need to pin because download.docker.com now has a newer buildx that tries to use an API version
that is too new for the Docker daemon, causing this error:
```
docker buildx build --load --platform=linux/amd64 --pull --build-arg UBI_IMAGE=registry.access.redhat.com/ubi9/ubi-minimal:latest --build-arg GIT_VERSION=v3.32.0-0.dev-643-g38568836d2ac --build-arg CALICO_BASE=calico/base:ubi9-1769122535 --build-arg BPFTOOL_IMAGE=calico/bpftool:v7.5.0 --network=host --build-arg BIN_DIR=dist/bin --build-arg BIRD_IMAGE=calico/bird:v0.3.3-211-g9111ec3c-amd64 --build-arg GIT_VERSION=v3.32.0-0.dev-643-g38568836d2ac -t node:latest-amd64 -f ./Dockerfile.amd64 .
ERROR: failed to build: Error response from daemon: client version 1.52 is too new. Maximum supported API version is 1.41: driver not connecting
make[1]: Leaving directory '/home/ubuntu/calico/node'
make[1]: *** [Makefile:268: .calico_node.created-amd64] Error 1
make: Leaving directory '/home/ubuntu/calico/node'
make: *** [Makefile:440: k8s-test] Error 2
```
* Spurious change to trigger node CI
* Revert "Spurious change to trigger node CI"
This reverts commit 46fdd376d925289648e63c57c03904c74c9a52fa.
Seems we didn't need this to trigger the CI.
* Remove CRDs from tigera-operator helm chart (#11727)
* Add traffic distribution support and enable topology-aware routing for Services
* Fix golangci-lint QF1001
* Use gcloud credential helper to login to GCR (#11752)
* Ability to use projectcalico.org/v3 custom resource definitions (#10447)
* [windows] ASO: add support for nftables and BPF dataplanes (on linux nodes)
Add support for the nftables and BPF dataplanes on linux nodes
to the ASO test infra.
Remove docker installation as only containerd
is necessary.
Use a config yaml for kubeadm init instead of CLI flags.
* fix kubeadm config yaml
* replace docker commands with ctr in windows cni-plugin FVs
* Fix chart target (#11761)
* [BPF] Maglev Prometheus Metrics: Connection counts (#11660)
* Export maglev conntracks as prometheus metrics
Co-authored-by: Shaun Crampton <shaun@tigera.io>
* Update tests to use Ubuntu 25.10 instead of 25.04 (#11763)
Co-authored-by: Casey Davenport <davenport.cas@gmail.com>
* Initial plan
* Convert Python 2 code to Python 3 in node/tests/k8st
Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>
* Update test container to Docker 25 and Python 3
Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>
* Fix generated files. (#11766)
* Add --break-system-packages
* Unpin
* Repin to current versions
* Migrate from nose to pytest test runner
Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>
* Fix typo in CNP CRD. (#11768)
* Fix typo in CNP CRD.
* Pin upstream CNP CRD to explicit commit, was floating.
* Python test code fixes
- Correct import path for `utils.utils`
- Output from subprocesses needs `.decode()`
- Avoid pytest running _TestLocalBGPPeer in its own right.
* Update node/tests/k8st/tests/test_bgp_filter.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update node/tests/k8st/tests/test_bgp_filter.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update node/tests/k8st/tests/test_bgp_filter.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove unused cluster_route_regex_v4 variable
Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>
* Update to Go 1.25.7
* fix(windows): rename ASO env vars
* Bump CALICO_BASE_VER to ubi9-1770247388
* Run some tests against projectcalico.org/v3 API group (#11758)
* Add dependabot config to update golang.org/x/* libraries (#11776)
* Add dependabot config to update golang.org/x/* libraries
* Add WaitForCloseWithDeadline utility to wait for a channel to close
* Remove profile CRD, as it is unused (#11792)
* Run ci target instead of fv directly (#11793)
* Turn off dependabot
* fix: return images marked as release if not the same as BUILD_IMAGES (#11760)
* Fix app-policy UTs not running (#11795)
* Fix app-policy UTs not running
* Update app-policy/Makefile
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [BPF] Fix propagation of ctx->fwd
Since ctx->fwd is not in state, it is only valid within a single
program. We do set it to true early on, but that value does not survive into
later programs, so we may fail to redirect when we could.
* [BPF] fix unhandled return value from bpf_fib_lookup
* Allow link-local even if HEP rpf check returns BPF_FIB_LKUP_RET_NOT_FWDED (#11781)
* Restarting systemd-resolved on a node with the BPF dataplane takes host networking down.
When we restart systemd-resolved, it resets the interface config and routes, and the interface loses its IP address.
systemd-resolved then uses DHCP to get an IP address. The DHCP request is broadcast and the DHCP offer
comes from 169.254.169.254, which is a link-local address. The offer is dropped at the ingress of the
host interface because rpf_check for 169.254.169.254 fails. As a result, the system never comes up.
The DHCP offer is sometimes allowed, if there was a DHCP renewal just a few seconds before the systemd restart:
the renewal leaves a conntrack entry and the DHCP offer gets allowed. Without a conntrack entry, it gets dropped.
However, if we try to get the IP using dhclient, it works even in the broken state. systemd uses a UDP socket to
receive the DHCP offer and never gets it, because we drop it due to RPF. dhclient uses an AF_PACKET socket to
read the DHCP packets, so it receives them even if the tc program drops them. If I attach an XDP program that drops the DHCP
packets, even dhclient cannot read them. So even if a packet is dropped by a tc program, an application using an
AF_PACKET socket can still snoop the packet.
Fix - At hep_rpf_check, we use fib_lookup to check whether the route exists. In this case fib_lookup returns
BPF_FIB_LKUP_RET_NOT_FWDED and we drop the packet. The fix is to allow the packet when fib_lookup returns
BPF_FIB_LKUP_RET_NOT_FWDED but the source IP is a link-local address.
* Address review comments
* Automatic Pin Updates
* Use same calico/test image for calicoctl ST as for node
This also means converting calicoctl ST from Python 2 to Python 3, and using pytest instead of
nosetests
This work was cherry-picked from the Copilot PR at
https://github.com/projectcalico/calico/pull/11782:
- Update Makefiles to invoke pytest instead of nosetests
- Convert Python 2 syntax to Python 3:
- print statements to print() functions
- dict.iteritems() to dict.items()
- xrange() to range()
- Remove cmp() usage (use equality checks)
- Fix metaclass syntax
- Fix bytes/string handling for hashlib
- Replace nose imports with pytest equivalents
- Replace @attr decorator with pytest.mark
Then I added the following tweaks and fixes:
- Update Makefile note about how to avoid running slow tests.
- Fix relative imports.
- Remove termios stuff.
This dates back to commit de8356294fa295e1fcc963c5e88d7fce4d14749f, 2016, and looks bogus to me.
None of the commands we currently run look like they should be "messing with terminal settings",
so let's remove this and see if anything else breaks.
The reason for removing it is this failure which appeared with the pytest move:
```
tests/st/test_base.py:27: in <module>
from tests.st.utils.utils import (get_ip, ETCD_SCHEME, ETCD_CA, ETCD_CERT,
tests/st/utils/utils.py:89: in <module>
_term_settings = termios.tcgetattr(sys.stdin.fileno())
^^^^^^^^^^^^^^^^^^
/usr/lib/python3.12/site-packages/_pytest/capture.py:247: in fileno
raise UnsupportedOperation("redirected stdin is pseudofile, has no fileno()")
E io.UnsupportedOperation: redirected stdin is pseudofile, has no fileno()
```
This is because pytest redirects stdin, so stdin does not have a terminal.
- Convert uses of `parameterized` to `self.subTest` pattern.
(Sadly, pytest does not support parameterization at the same time as features we get by inheriting
from unittest.TestCase, namely `self.assert...` and `setUp` and `tearDown` methods.)
- Decode subprocess output.
* Don't allocate IPs from IP pools with Disabled status (#11775)
* Bump Envoy Gateway to v1.5.7
* Hack CI
* Add unit tests to cover edge cases in the topology_test.go file and integration-style tests in the syncer.go file.
* Unhack CI
* ClusterNetworkPolicy: support generic protocols (#11804)
* Address PR comments.
* E2E: Splits maglev test into two tests: IPv4 & IPv6 (#11801)
* splits maglev test into v4 and v6 runs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix CNI delete timer to start after acquiring IPAM lock (#11824)
Start the 90-second timeout after acquiring the lock in cmdDel, matching the pattern used in ADD operations. Previously, the timer started before lock acquisition, causing "context deadline exceeded" errors when DELETE operations waited in queue for the lock.
* Replace ippool filters in BIRD template with golang funcs (#11759)
* CNP: pick conformance improvements and enable it (#11833)
* Fix ipamconfigs -> ipamconfigurations (#11839)
* Rename Undefined encap mode to Never to align with v3 (#11831)
* Migrate to Ginkgo v2
* Read coverprofile.out file
* Fix more ginkgo v2 errors and warnings
* Implement manual sharding for felix FVs
* Cleanup felix FV report to filter skipped tests
* Add preflight checks to allow ginkgo v2 only
* Pin calico/go-build with ginkgo v2 only installed
* Collect Multus network-attachment-definitions in cluster diags (#11816)
* Initial plan
* Add Multus network-attachment-definitions collection to calicoctl cluster diags
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
* Generate CRD API manifests (#11836)
* Felix UT: add tier to calc graph benchmarks (#11855)
Test data was missing tier, resulting in warnings when it was inserted.
* Fix BPF UTs to run on latest kernel (#11837)
* Fix BPF Tests in 25.10 kernel.
* Revert changes to fv
* Fix bpf cleanup FV
* Add debug
* Fix difference between tc and tcx
* [ebpf] - Send tcp rst when a backend is deleted (#11762)
* Add tc programs to send TCP RST for both IPv4 and IPv6.
* Mark CT entries with the removed workload IP to send RST.
* Send an RST if there is a CT hit and the send_rst flag is set.
* Stale NAT entries can occur for two reasons: either the service is deleted or the backend is deleted.
When the service is deleted, we mark the NAT FWD TCP service entry (when CTLB is disabled) to send an RST.
When the backend is deleted, no change is made to the NAT CT entries. When the next packet hits the CT entry
to a pod, it returns an RST, as a result of which all the entries are flagged as RSTSeen. The connection
dies and the CT entries are deleted after 2 minutes.
* Fix build error
* Disable stale NAT conntrack scanner for TCP
* Address copilot review comments
* Address first set of review comments
* Address review comments batch 2
* Address review comments
* Check if the connection is actually reset
* Split TCP spoof test to avoid RST scanner race condition
The "should not be able to spoof TCP" test had two phases in a single
It block. Phase 1 called RemoveFromInfra which sends the workload IP
to the WorkloadRemoveScanner. If the conntrack scanner ran after the
persistent connection was established in Phase 2, it would mark the
CT entries with FlagSendRST, causing the connection to be killed and
the test to fail at expectPongs.
Split into two separate It blocks so Phase 2 starts clean without any
prior WEP removals poisoning the scanner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Tomas Hruby <tomas@tigera.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add tiered-rbac webhook (#11803)
* Skip nftables cache reload for cleaned disabled tables (#11848)
* Skip cache reload for cleaned disabled tables
Avoid calling InvalidateDataplaneCache("post-write") for tables that are disabled and have no chains left in the dataplane. These tables exist only to remove leftover nftables state when switching to iptables mode, so once cleanup is complete there is no need to reload state on every apply cycle (which would spawn nft processes unnecessarily). Cache invalidation still occurs for active tables or disabled tables that still have chains to clean up.
add test
* Update table_test.go
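The condition for skipping the post-write reload reduces to one predicate. A sketch follows, assuming a hypothetical `tableState` summary; the real nftables table code tracks this state differently.

```go
package main

import "fmt"

// tableState is a hypothetical summary of an nftables table after a write.
type tableState struct {
	disabled       bool // table kept only to clean up leftovers after switching to iptables mode
	chainsInDPlane int  // chains still present in the dataplane
}

// needsPostWriteReload models the rule described above: the "post-write"
// cache invalidation is skipped only for a disabled table that has no chains
// left to clean up.
func needsPostWriteReload(t tableState) bool {
	return !(t.disabled && t.chainsInDPlane == 0)
}

func main() {
	fmt.Println(needsPostWriteReload(tableState{disabled: false, chainsInDPlane: 3})) // true: active table
	fmt.Println(needsPostWriteReload(tableState{disabled: true, chainsInDPlane: 2}))  // true: still cleaning up
	fmt.Println(needsPostWriteReload(tableState{disabled: true, chainsInDPlane: 0}))  // false: cleanup done
}
```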
* Add print columns for CRDs in kubectl output (#11805)
* Modernise Go code with go fix (#11864)
* Modernise Go code with go fix
Run Go v1.26's go fix command to apply automatic code modernisations.
* Go mod tidy.
* Update generated files.
go fix removed omitempty from some fields where it could not affect the
output (because a struct cannot be empty by the definition used by the
json encoder). This makes some fields "required" in the openapi schema
that were previously optional but, in practice, they should always
have been there.
* Fix gosimple lint.
* Define LiveMigration resource
The LiveMigration resource enables seamless live migration for KubeVirt and OpenStack, by
associating the source and target pods or VMs. In KubeVirt it's backed by the KubeVirt
VirtualMachineInstanceMigration resource. In OpenStack our Neutron driver creates it and writes it
to the etcd datastore.
Key components:
- LiveMigration type definition (libapiv3) with spec fields for source and
destination workload endpoints.
- K8s backend client using the dynamic client to read KubeVirt VirtualMachineInstanceMigration
resources (read-only: List, Get, Watch). Populate LiveMigrationSpec from VMIM fields (vmiName, uid,
sourcePod), filter VMIMs by phase (TargetReady/Running/Failed) in Get/List/Watch, replace the simple
watch adapter with a state-tracking vmimWatchAdapter that synthesises Added/Deleted events on phase
transitions and suppresses redundant updates.
- Watch adapter (unstructuredWatchAdapter) to bridge the dynamic client's
unstructured objects to the Resource interface expected by the existing
k8sWatcherConverter.
- clientv3 LiveMigration client with standard List/Get/Watch operations.
- Model, namespace, and scheme registrations.
- Felix syncer for LiveMigration.
The K8s backend is deliberately read-only since LiveMigration state is
owned by KubeVirt. The etcdv3 backend stores LiveMigration directly as a
custom resource (unchanged default behavior).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
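The phase-tracking behaviour of `vmimWatchAdapter` (filtering on TargetReady/Running/Failed, synthesising Added/Deleted on transitions, suppressing redundant updates) can be sketched as a small state machine. `phaseTracker` and its methods are hypothetical, and this sketch tracks only the phase; changes to other VMIM fields are out of scope.

```go
package main

import "fmt"

// visiblePhases follows the commit text: only VMIMs in these phases are
// surfaced as LiveMigration resources.
var visiblePhases = map[string]bool{"TargetReady": true, "Running": true, "Failed": true}

// phaseTracker is a hypothetical stand-in for vmimWatchAdapter's state: it
// remembers the last phase seen for each VMIM so raw VMIM updates can be
// translated into synthesised LiveMigration events.
type phaseTracker struct {
	lastPhase map[string]string
}

func newPhaseTracker() *phaseTracker {
	return &phaseTracker{lastPhase: map[string]string{}}
}

// onUpdate returns "Added", "Deleted", or "" (update suppressed).
func (p *phaseTracker) onUpdate(name, phase string) string {
	prev, seen := p.lastPhase[name]
	p.lastPhase[name] = phase
	wasVisible := seen && visiblePhases[prev]
	isVisible := visiblePhases[phase]
	switch {
	case !wasVisible && isVisible:
		return "Added"
	case wasVisible && !isVisible:
		return "Deleted"
	default:
		return "" // visibility unchanged: suppress the redundant update
	}
}

// onDelete handles the VMIM object itself being deleted.
func (p *phaseTracker) onDelete(name string) string {
	prev, seen := p.lastPhase[name]
	delete(p.lastPhase, name)
	if seen && visiblePhases[prev] {
		return "Deleted"
	}
	return ""
}

func main() {
	t := newPhaseTracker()
	for _, ph := range []string{"Pending", "TargetReady", "Running", "Succeeded"} {
		fmt.Printf("%-12s -> %q\n", ph, t.onUpdate("mig-1", ph))
	}
}
```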
* embed filter for disabled ippools (#11851)
* Update libcalico-go/lib/apis/v3/livemigration.go
Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>
* Update libcalico-go/lib/apis/v3/livemigration.go
Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>
* Get node name from BGPConfig (#11850)
* Remove last vestiges of tier prefixing (#11867)
* Remove last vestiges of tier prefixing
* Fix generation
* Use common TierOrDefault impl
* Add CLAUDE.md with BPF test commands and code structure guide
Documents how to run BPF unit and functional tests, BPF C source
layout in bpf-gpl/, Go user-space packages in bpf/, and dataplane
management in dataplane/linux/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix FV_BATCH and FV_NUM_BATCHES validation in Felix FV configureManualSharding
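A sketch of the kind of validation and round-robin assignment manual sharding needs follows. It assumes batches are numbered from 1 to `FV_NUM_BATCHES`; that convention, `parseShard` and `inShard` are hypothetical and may differ from `configureManualSharding`.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseShard validates FV_BATCH / FV_NUM_BATCHES style values, assuming
// batches are numbered 1..FV_NUM_BATCHES (an assumption, not Calico's rule).
func parseShard(batchStr, numStr string) (batch, num int, err error) {
	num, err = strconv.Atoi(numStr)
	if err != nil || num < 1 {
		return 0, 0, fmt.Errorf("FV_NUM_BATCHES must be a positive integer, got %q", numStr)
	}
	batch, err = strconv.Atoi(batchStr)
	if err != nil || batch < 1 || batch > num {
		return 0, 0, fmt.Errorf("FV_BATCH must be between 1 and %d, got %q", num, batchStr)
	}
	return batch, num, nil
}

// inShard assigns the i-th test (0-based) to a batch round-robin.
func inShard(i, batch, num int) bool {
	return i%num == batch-1
}

func main() {
	// In CI these strings would come from os.Getenv("FV_BATCH") etc.
	batch, num, err := parseShard("2", "3")
	if err != nil {
		fmt.Println(err)
		return
	}
	for i := 0; i < 7; i++ {
		if inShard(i, batch, num) {
			fmt.Println("run test", i) // tests 1 and 4
		}
	}
}
```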
* Centralize BPF sub-program management to single source of truth (#11822)
* Initial plan
* Remove hardcoded program counts from attach_test.go
Replace hardcoded BPF program counts with dynamic assertions that verify:
- Programs are loaded (count > 0)
- Program counts increase when adding endpoints
- Specific AttachTypes exist in the Programs map
This eliminates the need to update test expectations every time a new
BPF program is added to the system.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Rename variable for clarity: programsCountBeforeReset -> programsCountBeforeRestart
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Validate against expected program counts using Programs() map
Instead of checking if count > 0, now validate that:
- Count() matches len(Programs()) to ensure consistency
- Programs() map contains the expected AttachTypes (existing checks)
- For delta checks, verify count increased from previous value
This ensures we're validating the actual expected number of programs
loaded, not just that some programs were loaded.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Compute expected program counts based on hook package logic
Added helper functions that calculate the expected number of sub-programs
for each AttachType by mirroring the logic from hook.ProgramsMap.allocateLayout():
- countSubPrograms(): Counts sub-programs for an AttachType based on which
programs should be loaded (considering hasHostConflictProg, hasIPDefrag, hasMaglev)
- expectedProgramCount(): Sums counts across all loaded AttachTypes
- Helper functions: hasHostConflictProg(), hasIPDefrag(), hasMaglev()
Tests now validate against computed expected counts instead of hardcoded values,
eliminating the need to update tests when new BPF programs are added.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Export helper methods from hook package to eliminate duplication
- Export HasHostConflictProg(), HasIPDefrag(), HasMaglev() in hook/load.go
- Update all uses in hook/map.go to use exported methods
- Remove duplicate helper functions from attach_test.go
- Update attach_test.go to use exported hook.AttachType methods
This eliminates code duplication and ensures tests use the same logic
as the production code when determining which programs should be loaded.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Add GetSubProgNames API to eliminate hardcoded program names in tests
- Added GetSubProgNames(hookType) function in hook/map.go
- Updated countSubPrograms() to use GetSubProgNames() instead of hardcoded arrays
- Removed 27 lines of duplicate program name definitions from test file
Now when a new BPF program is added, it only needs to be defined in one place
(tcSubProgNames or xdpSubProgNames in hook/map.go).
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Return copy of slice in GetSubProgNames to prevent mutations
Updated GetSubProgNames to return a copy of the internal arrays instead
of the original slices, preventing callers from accidentally modifying
the internal program name arrays.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
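Why returning a copy matters can be shown in a few lines: a Go slice shares its backing array, so handing out the internal table lets callers edit it. The table contents below are placeholders, not the real program names in `hook/map.go`, and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// tcSubProgNames stands in for the internal name table; entries are placeholders.
var tcSubProgNames = []string{"main", "policy", "allowed", "icmp", "drop"}

// subProgNamesShared returns the internal slice itself: a caller that writes
// to the result silently edits the package's table.
func subProgNamesShared() []string { return tcSubProgNames }

// subProgNamesCopy mirrors the fix: callers get their own copy.
func subProgNamesCopy() []string { return slices.Clone(tcSubProgNames) }

func main() {
	c := subProgNamesCopy()
	c[0] = "changed"
	fmt.Println(tcSubProgNames[0]) // main: table untouched

	s := subProgNamesShared()
	s[0] = "changed"
	fmt.Println(tcSubProgNames[0]) // changed: table corrupted
	s[0] = "main"                  // restore the table
}
```

`slices.Clone` requires Go 1.21 or later.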
* Add comment clarifying GetSubProgNames behavior for TC hooks
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Extract sub-program filtering logic into GetApplicableSubProgs API
- Created GetApplicableSubProgs(at, skipIPDefrag) in hook/map.go
- Added SubProgInfo struct to hold sub-program metadata
- Updated allocateLayout to use GetApplicableSubProgs
- Updated countSubPrograms in test to use GetApplicableSubProgs
- Removed 28 lines of duplicate filtering logic from test
The filtering logic for determining which sub-programs should be loaded
is now centralized in a single API that both production code and tests use.
This eliminates duplication and ensures consistency.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
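The "single source of truth" idea behind `GetApplicableSubProgs` can be sketched as one table of sub-programs, each with a load condition, plus one filter function used by both the loader and the tests. The `attachType` fields, `subProgInfo`, the program names and the conditions here are all hypothetical simplifications of the `hasHostConflictProg`/`hasIPDefrag`/`hasMaglev` checks the commit mentions.

```go
package main

import "fmt"

// attachType captures the per-hook properties the commit mentions; the
// field names are hypothetical, not the real hook.AttachType.
type attachType struct {
	hostConflict bool // ~ hasHostConflictProg()
	ipDefrag     bool // ~ hasIPDefrag()
	maglev       bool // ~ hasMaglev()
}

// subProgInfo is a hypothetical version of SubProgInfo.
type subProgInfo struct {
	Name  string
	Index int // position in the name table
}

// subProg pairs a placeholder name with the condition under which it loads.
type subProg struct {
	name string
	want func(at attachType, skipIPDefrag bool) bool
}

var always = func(attachType, bool) bool { return true }

var table = []subProg{
	{"main", always},
	{"policy", always},
	{"host_ct_conflict", func(at attachType, _ bool) bool { return at.hostConflict }},
	{"ip_frag", func(at attachType, skip bool) bool { return at.ipDefrag && !skip }},
	{"maglev", func(at attachType, _ bool) bool { return at.maglev }},
}

// getApplicableSubProgs is the one place that decides what gets loaded;
// expected counts in tests are just len() of its result.
func getApplicableSubProgs(at attachType, skipIPDefrag bool) []subProgInfo {
	var out []subProgInfo
	for i, sp := range table {
		if sp.want(at, skipIPDefrag) {
			out = append(out, subProgInfo{Name: sp.name, Index: i})
		}
	}
	return out
}

func main() {
	at := attachType{ipDefrag: true, maglev: true}
	fmt.Println(len(getApplicableSubProgs(at, false))) // 4: main, policy, ip_frag, maglev
	fmt.Println(len(getApplicableSubProgs(at, true)))  // 3: IP defrag skipped
}
```

Adding a new sub-program then means adding one row to the table; tests that compute expected counts pick it up automatically.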
* Add clarifying comment about debug offset for TC hooks
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Fix BPF UT
* Revert helper methods to private (unexported)
Changed HasHostConflictProg → hasHostConflictProg
Changed HasIPDefrag → hasIPDefrag
Changed HasMaglev → hasMaglev
These methods are only used internally within the hook package (in
GetApplicableSubProgs and loadObj). Since GetApplicableSubProgs now
provides the public API that encapsulates the filtering logic, these
helper methods no longer need to be exported.
This provides better encapsulation and makes it clear that
GetApplicableSubProgs is the intended public interface for determining
which sub-programs should be loaded.
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
* Add a couple of aws-talos runs
* Address review comment
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
Co-authored-by: sridhar <sridhar@tigera.io>
Co-authored-by: Lucas Sampaio <lucas@tigera.io>
Co-authored-by: Mazdak Nasab <mazdak.nasab@gmail.com>
Co-authored-by: Alex O Regan <alex.oregan@tigera.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: sudheernv <nvsudheerjain@gmail.com>
Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>
Co-authored-by: Lancelot Robson <lancelot.robson@gmail.com>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
Co-authored-by: Jiawei Huang <jiawei@tigera.io>
Co-authored-by: Shaun Crampton <shaun@tigera.io>
Co-authored-by: Tomas Hruby <tomas@tigera.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Collect more diags when the kind k8st cluster setup fails (#11874)
* Setting TLS 1.3-only ciphers causes API server to fail on startup #11706 (#11812)
* Add configurable TLS minimum version to resolve TLS 1.3 cipher startup failure
This change introduces a TLS_MIN_VERSION environment variable that allows
users to configure the minimum TLS version for the API server and other
components. This resolves the issue where setting TLS 1.3-only ciphers
would cause startup failures due to Go's HTTP/2 cipher validation.
Changes:
- Add ParseTLSVersion() function to crypto/pkg/tls package
- Update NewTLSConfig() to use TLS_MIN_VERSION environment variable
- Update API server to parse and apply TLS_MIN_VERSION
- Add comprehensive tests for TLS version parsing
- Add TLS_CONFIGURATION.md documentation
Supported values for TLS_MIN_VERSION:
- "" or "1.2" (default): TLS 1.2 minimum
- "1.3": TLS 1.3 minimum (allows TLS 1.3-only cipher configurations)
Fixes #11706
* Improve TLS configuration docs with visual diagrams and comparison tables
- Add ASCII diagram showing TLS version and cipher relationships
- Add configuration validation matrix with visual indicators
- Add cipher compatibility diagram for quick reference
- Consolidate sections for better readability
- Reduce from 237 to 159 lines while keeping all essential info
- Add quick reference table at the top
- Improve troubleshooting section with clear comparison table
* Address PR review feedback
- Refactor NewTLSConfig to use local variable for minVersion
- Remove duplicate test from apiserver (already tested in crypto package)
- Remove TLS_CONFIGURATION.md documentation file
Changes per reviewer feedback from caseydavenport
* Fix struct field alignment in tls_test.go
Align struct fields consistently to pass CI formatting checks
* Update felix/CLAUDE.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update felix/CLAUDE.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update felix/CLAUDE.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Improve performance of IP autodetection when there are many IP addrs (#11834)
* Migrate Ginkgo v1 Measure to v2 gmeasure
Reference https://onsi.github.io/ginkgo/MIGRATING_TO_V2#migration-strategy-1.
* Fix go-vet issues
* make fix-all
* make -C libcalico-go gen-files
* Add RBAC for live migration to the tigera-operator chart
* Fix calicoctl UT not to expect the LiveMigration resource
* Switch LiveMigration K8s backend to KubeVirt typed client
Replace the dynamic.Interface / unstructured.Unstructured approach with
the official KubeVirt Go client (kubevirt.io/client-go) to follow the
codebase convention of using structured/typed clients.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* D'oh, make generate
* Replace subTest loops with individual test methods in calicoctl STs
Eliminates the hack of manually calling self.setUp() between subTest
iterations by splitting each parameterized loop into individual test
methods that call _test_* helpers. The test framework now handles
setUp/tearDown naturally for each case.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Rename lib/v3 -> lib/internalapi (#11870)
* Automatic Pin Updates
* Add service index for EndpointSlice lookups in confd route generator
* Decouple resources package from kubevirt.io/client-go to fix e2e flag conflict
The kubevirt.io/client-go/log package registers a -v flag in its init()
that conflicts with klog's -v in binaries that transitively import the
resources package (e.g. the e2e test binary via bgp tests).
Introduce a VMIMClient interface in livemigration.go so the resources
package only imports kubevirt.io/api (types), not kubevirt.io/client-go.
The concrete client-go dependency stays in client.go (k8s package),
which the e2e binary does not reach.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
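The decoupling follows the usual Go pattern of depending on an interface instead of a concrete client. In the sketch below, `vmim` is a stand-in for the KubeVirt type and the `VMIMClient` method set is hypothetical. The point is only the dependency direction: code written against the interface needs API types, not `kubevirt.io/client-go`.

```go
package main

import "fmt"

// vmim stands in for the KubeVirt VirtualMachineInstanceMigration type;
// only fields mentioned in the commit are modelled.
type vmim struct {
	Namespace, Name, VMIName, SourcePod, Phase string
}

// VMIMClient is a hypothetical method set. Code in the "resources" layer
// depends only on this, so it never links the concrete client package.
type VMIMClient interface {
	List(namespace string) ([]vmim, error)
}

// fakeVMIMClient is an in-memory implementation, e.g. for unit tests; a
// client-go-backed adapter would live in a separate package.
type fakeVMIMClient struct{ items []vmim }

func (f fakeVMIMClient) List(namespace string) ([]vmim, error) {
	var out []vmim
	for _, m := range f.items {
		if m.Namespace == namespace {
			out = append(out, m)
		}
	}
	return out, nil
}

// countInPhase is "resources" logic written only against VMIMClient.
func countInPhase(c VMIMClient, namespace, phase string) (int, error) {
	items, err := c.List(namespace)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, m := range items {
		if m.Phase == phase {
			n++
		}
	}
	return n, nil
}

func main() {
	c := fakeVMIMClient{items: []vmim{
		{Namespace: "vms", Name: "m1", Phase: "Running"},
		{Namespace: "vms", Name: "m2", Phase: "Pending"},
		{Namespace: "other", Name: "m3", Phase: "Running"},
	}}
	n, _ := countInPhase(c, "vms", "Running")
	fmt.Println(n) // 1
}
```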
* make generate
* Update LiveMigrationSpec structure per review feedback
Adapt the VMIM-to-LiveMigration conversion and tests to the restructured
LiveMigrationSpec which uses Source/Destination WorkloadEndpointIdentifier
pointers instead of flat fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Start adding Claude instructions. (#11893)
* Add CLAUDE.md
* Add Kubernetes API design skill.
* Add selectors note.
* Address PR review comments: fix typos and update Ubuntu version
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Safely remove finalizers (#11882)
* Update pool sorting logic in pool controller (#11886)
* Update pool sorting logic in pool controller
* Clean up test code
* Add mutating admission policy for tier label, remove from OCP for now (#11890)
* Simplify LiveMigrationSpec.Source to what we really need
There is no anticipated case where we need this to be a selector.
* Include ippools filters in BirdBGPConfig (#11875)
* Fix owner reference test flake caused by Kubernetes GC (#11877)
The "should properly read / write owner references" test was creating a
NetworkPolicy with ownerReferences pointing to a Pod and NetworkSet that
didn't exist in the cluster. With v3 CRDs, owner references are stored
as real Kubernetes ownerReferences (not in an annotation like with v1
CRDs), so the garbage collector would detect the non-existent owners and
delete the NetworkPolicy before the test could verify it.
Fix by creating the actual owner objects (Pod and NetworkSet) before
creating the NetworkPolicy, using their real API server-assigned UIDs in
the owner references. This prevents the GC from deleting the dependent.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add webhooks version to config for node tests (#11900)
* Remove FV tests for crypto package tied to FIPS (#11507)
* Expand CLAUDE.md with architecture, dataplane, networking, and test sections
Add top-level sections covering Felix's major subsystems: architecture
overview, calc graph engine, dataplane manager pattern, iptables/nftables
dataplane, Windows dataplane, networking/routing, and configuration.
Extract a general "Running Tests" section (ut, fv, fv-nft) and keep
BPF-specific test details alongside the BPF dataplane section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix FV container startup race with apiserver.crt bind mount (#11891)
Remove redundant --mount for /tmp/apiserver.crt in FV test containers.
The cert is already accessible via -v /tmp:/tmp, and TLS verification
is skipped. The bind mount caused a race when multiple containers
started concurrently: runc created mount points on the shared host
/tmp, and the second container failed with "file exists".
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* make generate
* Add implement-calico-api-resource Claude Code skill (#11897)
* Add implement-calico-api-resource skill.
Step-by-step guide for plumbing a new API resource through all layers
of the Calico codebase: API types, code generation, backend model,
K8s client, clientv3, apiserver registry/storage, syncers, RBAC,
and manifests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update implement-calico-api-resource skill: split codegen steps, add calicoctl
- Step 3 now runs `cd api && make gen-files` for quick API codegen
so downstream layers can compile during development
- Added Step 18 for calicoctl resource manager registration
- Moved full `make generate` to Step 20 (final step before commit)
- Renumbered RBAC to Step 19
- Updated checklist to include calicoctl
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Address PR review comments
- Remove knownV3Kinds sub-step from Step 8 (list doesn't exist)
- Remove +kubebuilder:resource annotations from List type example
- Emphasize kubebuilder annotations for validation in Step 10,
noting that Go struct validators are not executed for v3 CRDs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Refactor IPAM AllocationAttribute for VM-based handle support (#11894)
* Refactor IPAM AllocationAttribute for VM-based handle support
* Backport Felix calc graph computed data enhancement from Enterprise
@radixo developed this as part of his Istio work, and that is all intended for OSS as well, but we
especially need the calc graph enhancement as soon as possible for live migration.
With Claude's help I've then added unit tests for the new computed data feature:
- ActiveRulesCalculator: tests for AddExtraComputedSelector/
RemoveExtraComputedSelector dispatch to OnComputedSelectorMatch/
OnComputedSelectorMatchStopped callbacks, verifying match, no-match,
removal, and isolation from policy callbacks.
- PolicyResolver: tests for OnEndpointComputedDataUpdate covering
inclusion in flush, nil removal, nil-to-nil no-op, multiple kinds,
and cleanup on endpoint deletion.
- EventSequencer: test for ModelWorkloadEndpointToProto verifying
EndpointComputedData.ApplyTo modifies the proto output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Review markup
* Fix Docker-based make targets in git worktrees (#11898)
* Fix Docker-based make targets when running from a git worktree
In a git worktree, .git is a file pointing to the main repo's
.git/worktrees/<name> directory. When the worktree is mounted into
a Docker container, git commands inside the container fail with
"fatal: not a git repository" because the main .git directory
isn't available.
Detect worktrees by comparing git-dir to git-common-dir, and when
running in a worktree, mount the main .git directory and set
GIT_DIR / GIT_WORK_TREE so git works inside containers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
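The worktree detection above can be sketched in Go as follows. This assumes `gitDir` and `commonDir` are absolute paths, as produced by `git rev-parse --absolute-git-dir` and a `realpath` of `git rev-parse --git-common-dir`. The function `dockerGitArgs` and the `/work` container path are hypothetical; the real logic lives in the Makefiles.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dockerGitArgs returns extra `docker run` arguments that let git work
// inside a container when the source tree is a git worktree.
// containerWorkTree is where the tree is mounted in the container
// (overridable in the Makefiles via DOCKER_GIT_WORK_TREE).
func dockerGitArgs(gitDir, commonDir, containerWorkTree string) []string {
	gitDir = filepath.Clean(gitDir)
	commonDir = filepath.Clean(commonDir)
	if gitDir == commonDir {
		// Ordinary checkout: .git is a real directory inside the mounted tree.
		return nil
	}
	// Worktree: .git is a file pointing at <commonDir>/worktrees/<name>, so
	// mount the main .git directory at the same absolute path and tell git
	// where the repository and work tree are.
	return []string{
		"-v", commonDir + ":" + commonDir,
		"-e", "GIT_DIR=" + gitDir,
		"-e", "GIT_WORK_TREE=" + containerWorkTree,
	}
}

func main() {
	fmt.Println(dockerGitArgs("/src/calico/.git", "/src/calico/.git", "/work"))
	fmt.Println(dockerGitArgs("/src/calico/.git/worktrees/feat", "/src/calico/.git", "/work"))
}
```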
* Address review feedback: use absolute git paths and configurable work tree
- Use --absolute-git-dir and $(realpath ...) to guarantee absolute paths
for Docker volume mounts (avoids potential issues with relative paths).
- Make the container work tree path overridable via DOCKER_GIT_WORK_TREE
so Makefiles that mount at a different path (e.g. api/Makefile) can
set it appropriately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Operator CRD update (make generate)
* Enhance Felix route table for elevated priority programming
JIRA ticket: CORE-12272
On its own, this PR does not yet change how Felix programs local routes. `SetRoutes` callers do not
currently set the new `Priority` field, so programmed `Priority` will be 0 for IPv4 routes and 1024
for IPv6 routes, as was the case before this PR.
Upcoming, but separate, live migration work will:
- add Felix configuration fields for "normal" and "elevated" priority values
- change route programming outside of live migration to use the "normal" priority value
- program routes with the "elevated" priority value during a live migration, as part of ensuring the best possible handover
* Add status subresource to KubeControllersConfiguration CRD (#11889)
* Improve tigera-operator helm chart values.yaml and README (#11907)
Co-authored-by: Ludwig <tommludwig@icloud.com>
Co-authored-by: Tom Ludwig <83090745+tom-ludwig@users.noreply.github.com>
* Trim PR template and simplify cherry-pick headings (#11905)
* Trim PR template; update cherry-pick script headings
Simplify the PR template to just a description comment and the
release-note block. Remove the verbose sections (Description,
Related issues/PRs, Todos, Reminder for the reviewer) that were
rarely filled in.
Update cherry-pick-pull to use bold text instead of markdown headings
for cherry-pick history and multi-PR separators. Also improve re-pick
handling to absorb old bullet points (both old ## and new ** formats)
into the header so picks stack correctly with newest at top.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add bold Release note label above the code block
Without a heading, the release-note block looked like an anonymous
code block in the PR UI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Factor out build-pr-description() for testability
Extract the PR title/body/label generation from make-a-pr() into a
pure text-processing function build-pr-description() that can be
tested with canned input (no gh/git calls). Add
cherry-pick-pull_test.sh with 9 test cases covering fresh picks,
re-picks (old/new header formats, stacked bullets), multi-PR picks,
cross-repo ref prefixing, label filtering, section stripping, and
META_BLOCK inclusion.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Handle indented sub-bullets in cherry-pick history parsing
The old awk patterns only captured top-level "- Pick onto ..." bullets.
When re-picking a multi-PR cherry-pick (which uses indented sub-bullets
like " - org/repo#123"), those indented lines were left in the body
instead of being absorbed into the history header.
Rewrite both awk scripts to track an in_hist state and capture/skip all
bullet lines (top-level or indented) within the history block. Add a
test case for re-picking a multi-PR pick with indented sub-bullets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Simplify cherry-pick history handling; use exact test assertions
Replace the awk-based history parsing with a simple first-line check:
if the body starts with a cherry-pick history header, strip it (old
bullets stay in place beneath ours); otherwise add a blank line
separator.
Rename output variables from BUILD_ to NEW_PR_ for clarity. Rewrite
all test assertions to use assert_equals on the full output strings
(with readable multi-line expected values) instead of
assert_contains, catching any whitespace formatting issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add shell-test target to hack/Makefile for CI
Add a shell-test make target that runs cherry-pick-pull_test.sh, and
include it in the ci target so it runs in the "Tools (hack directory)"
Semaphore CI block.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Clarify PULL_LABELS and PULLLINK doc comments
Address review nits: clarify that PULL_LABELS elements contain
newline-separated labels (one element per PR), and rename "markdown
links" to "PR links" for PULLLINK.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Revert k8s.io/client-go version change in go.mod
It appears this wasn't actually necessary - perhaps a mistake introduced by Claude using
non-containerized go calls.
* Fix deps generation to take account of replacements
Addresses the wrong-looking deps file changes like
[here](https://github.com/projectcalico/calico/pull/11868/changes#diff-e80a705fd1859f7941b49c10e40de0b6c44d20f50776a20622dc7593c791d766R46)
and @fasaxc's comment about that
[here](https://github.com/projectcalico/calico/pull/11868#discussion_r2841071924)
* make generate
* Review markups
* Add isolated customer environments calc graph benchmark (#11866)
* Add isolated customer environments calc graph benchmark
Simulates a multi-tenant SaaS cluster with 1k/10k/100k identical
namespaces, each containing frontend, backend, and database pods
with per-deployment namespaced Calico NetworkPolicies, plus a
system namespace with monitoring pods that can reach all tenants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scatter local pods across namespaces in isolated customer benchmark
Use 100 pods/node as baseline across all variants. Each pod is
assigned to exactly one host with no overlap. Local pods are
scattered uniformly across namespaces via a stride, simulating
realistic K8s scheduling rather than clumping all local pods into
a few namespaces.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
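One way to scatter a host's local pods uniformly across namespaces with a stride is sketched below. `localNamespaceIndexes` is a hypothetical helper, not the benchmark's actual code; it assumes one local pod per chosen namespace.

```go
package main

import "fmt"

// localNamespaceIndexes picks which namespaces hold the local host's pods,
// spreading them evenly with a fixed stride instead of clumping them into
// the first few namespaces.
func localNamespaceIndexes(numNamespaces, localPods int) []int {
	if localPods <= 0 || numNamespaces <= 0 {
		return nil
	}
	stride := numNamespaces / localPods
	if stride < 1 {
		stride = 1
	}
	out := make([]int, 0, localPods)
	for i := 0; i < localPods; i++ {
		// Wrap around if there are more local pods than namespaces.
		out = append(out, (i*stride)%numNamespaces)
	}
	return out
}

func main() {
	idx := localNamespaceIndexes(1000, 100)
	fmt.Println(idx[:3], idx[len(idx)-1]) // [0 10 20] 990
}
```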
* Update isolated customer benchmark with realistic SaaS label cardinalities
Replace frontend/backend/database pod templates with uniform pods per
namespace and a configurable podsPerNamespace parameter. Model pod labels
(15 per pod) and namespace labels (3 per namespace) on a real SaaS
deployment with realistic value cardinalities.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use 1 pod per namespace in isolated customer benchmark
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* More `make generate` changes that I somehow missed just now
* Don't store label restrictions for every selector. (#11846)
* Don't store label restrictions for every selector.
Instead, cache only the most recently used label restrictions
in a package-level variable. The label restrictions are used
several times when adding a selector to the index (which happens
on a single thread) so caching one instance avoids almost all
recalculation.
Storing a map per selector really adds up if there are tens of
thousands of selectors active. In addition, some uses of selectors
don't need the restrictions, so we save calculation there.
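The single-entry cache described above can be sketched as follows. The `restrictions` type and the toy selector parser are hypothetical simplifications; the real selector parser and restriction type are much richer. Because the commit says indexing happens on a single thread, the sketch uses no locking.

```go
package main

import (
	"fmt"
	"strings"
)

// restrictions is a hypothetical stand-in: label key -> required values.
type restrictions map[string][]string

// Only the most recently computed restrictions are kept, in package-level
// variables, rather than one map stored per selector.
var (
	lastSel    string
	lastRes    restrictions
	computeCnt int // counts real computations, to show cache hits
)

// computeRestrictions is a toy parser for selectors of the form
// "k == 'v' && k2 == 'v2'".
func computeRestrictions(sel string) restrictions {
	computeCnt++
	res := restrictions{}
	for _, term := range strings.Split(sel, "&&") {
		parts := strings.SplitN(term, "==", 2)
		if len(parts) != 2 {
			continue
		}
		k := strings.TrimSpace(parts[0])
		v := strings.Trim(strings.TrimSpace(parts[1]), "'")
		res[k] = append(res[k], v)
	}
	return res
}

// labelRestrictions returns the cached result if sel was the last selector seen.
func labelRestrictions(sel string) restrictions {
	if lastRes != nil && sel == lastSel {
		return lastRes
	}
	lastSel, lastRes = sel, computeRestrictions(sel)
	return lastRes
}

func main() {
	sel := "app == 'web' && tier == 'frontend'"
	for i := 0; i < 3; i++ {
		_ = labelRestrictions(sel) // repeated lookups while one selector is indexed
	}
	res := labelRestrictions(sel)
	fmt.Println(computeCnt, res["app"], res["tier"]) // 1 [web] [frontend]
}
```

Note that handing the same map to every caller lets any caller mutate the shared cached value.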
* Wrap LabelRestrictions in a struct to protect the cached map.
Change LabelRestrictions from a map type alias to a struct with a
private map field, exposing All() and Get() methods. This prevents
callers from accidentally modifying the cached label restrictions map.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
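A sketch of wrapping the map in a struct with read-only accessors, assuming a `[]string` value type. The commit names `All()` and `Get()`, but their real signatures are not given here: this version of `All` takes a callback (the real one may return an iterator), and `NewLabelRestrictions` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// LabelRestrictions hides its map in an unexported field so callers cannot
// modify a shared, cached instance.
type LabelRestrictions struct {
	m map[string][]string
}

// NewLabelRestrictions takes ownership of m; the caller must not modify it afterwards.
func NewLabelRestrictions(m map[string][]string) LabelRestrictions {
	return LabelRestrictions{m: m}
}

// Get returns a copy of the allowed values for key and whether key is restricted.
func (r LabelRestrictions) Get(key string) ([]string, bool) {
	v, ok := r.m[key]
	if !ok {
		return nil, false
	}
	return append([]string(nil), v...), true
}

// All calls fn for each restricted key, in sorted key order so that output
// is deterministic in this sketch.
func (r LabelRestrictions) All(fn func(key string, values []string)) {
	keys := make([]string, 0, len(r.m))
	for k := range r.m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		v, _ := r.Get(k)
		fn(k, v)
	}
}

func main() {
	r := NewLabelRestrictions(map[string][]string{"app": {"web"}, "tier": {"fe", "be"}})
	vals, _ := r.Get("app")
	vals[0] = "mutated" // changes only the caller's copy
	again, _ := r.Get("app")
	fmt.Println(again[0]) // web
	r.All(func(k string, v []string) { fmt.Println(k, v) })
}
```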
* Add tests for LabelRestrictions cache hit/miss behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add some new e2e tests (#11892)
* fix: add nil check in AddressesAsCIDRs to prevent SIGSEGV (#11602)
When ExternalIP is empty in Kubernetes, a nil ip.Addr can end up in
the Addresses slice of l3rrNodeInfo. The existing cleanup logic only
removed emptyV4Addr and emptyV6Addr (zero-valued structs), but not nil
interface values. This caused a panic (SIGSEGV) when AsCIDR() was
called on a nil address at l3_route_resolver.go:166.
This fix adds a nil check to the cleanup loop to filter out nil
addresses before iterating and calling AsCIDR().
Fixes #11384
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix(windows): also print 'bin_dirs' value on uninstall-calico-hpc.ps1
Print new 'bin_dirs' key used in containerd v2.1+
* [UI-256] update calico icons (#11685)
* update calico icons
* fix linting
* prevent nil pointer dereference in handleBlockUpdate (#11913)
* prevent nil pointer dereference in handleBlockUpdate for blocks with nil affinity
* Add apt publishing framework to release tool
* Add Suite field to repository Releases file
* Run gofmt
* Run gofumpt
* Handle sourcesFile and oddly named packages better
* Fix error strings; handle empty outputDir parameter
* Fix-all
* assert correct error value in generating filters for ippools (#11914)
* fix: advertise /32 LB IPs assigned from IPPool via BGP (CI-1944) (#11917)
* Update to new operator CRDs location (#11918)
* Update CLAUDE.md with BPF build and test instructions
Add guidance on make build-bpf, FOCUS-filtered BPF unit tests,
and TestPrecompiledBinariesAreLoadable for kernel verifier checks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix/simplify k8s backend and add VMIM progression tests
The use of vmimWatchAdapter was wrong because watches can break and need to be restarted, which
means that Watch() is called from scratch again, and then we wouldn't have any state describing
whether we've previously emitted a LiveMigration.
Instead, for watching, we can just:
1. always pass through the updateType that we see for the VMIM event
2. when the VMIM is in a state that should not appear as a LiveMigration, represent that by
emitting a nil Value.
When Typha is the immediate downstream - as it usually is - it understands those, deduplicates no-op
"updates", converts nil values to deletions, and recalculates the updateType accordingly. When
Felix is the immediate downstream, Felix's `dedupebuffer` does the same thing.
Update existing UTs accordingly - e.g. expect Modified with nil value instead of Deleted - and add
new UTs for the typical VMIM progressions that we expect to see.
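A sketch of the downstream deduplication the backend now relies on: the watcher passes events through unchanged, using a nil value for "not a LiveMigration", and a stateful consumer derives the real update type. This mirrors the behavior attributed above to Typha and Felix's `dedupebuffer`, but `deduper`, `Update` and `UpdateType` here are hypothetical, not their code.

```go
package main

import "fmt"

type UpdateType int

const (
	UpdateNew UpdateType = iota
	UpdateUpdated
	UpdateDeleted
)

// Update is a hypothetical KV update; a nil Value means "not present".
type Update struct {
	Key   string
	Value *string
}

// deduper remembers the last value per key. The incoming update type is
// ignored and recalculated from that state, so a restarted watch with no
// memory of earlier events is still handled correctly.
type deduper struct{ last map[string]string }

func (d *deduper) Process(u Update) (UpdateType, bool) {
	old, existed := d.last[u.Key]
	if u.Value == nil {
		if !existed {
			return 0, false // nil for an unknown key: nothing to emit
		}
		delete(d.last, u.Key)
		return UpdateDeleted, true // nil value becomes a deletion
	}
	if existed && old == *u.Value {
		return 0, false // no-op "update"
	}
	d.last[u.Key] = *u.Value
	if existed {
		return UpdateUpdated, true
	}
	return UpdateNew, true
}

func main() {
	d := &deduper{last: map[string]string{}}
	s := func(v string) *string { return &v }
	events := []Update{
		{"vmim-1", nil},          // VMIM exists but is not yet a LiveMigration
		{"vmim-1", s("running")}, // becomes a LiveMigration
		{"vmim-1", s("running")}, // Modified event with no relevant change
		{"vmim-1", nil},          // no longer a LiveMigration
	}
	for _, e := range events {
		t, emit := d.Process(e)
		fmt.Println(emit, t)
	}
}
```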
* Regenerate e2e/deps.txt
* Avoid our client package pulling in kubevirt.io/client-go
This is like the problem already described in a53eb0b5c3, namely:
Decouple resources package from kubevirt.io/client-go to fix e2e flag conflict
The kubevirt.io/client-go/log package registers a -v flag in its init()
that conflicts with klog's -v in binaries that transitively import the
resources package (e.g. the e2e test binary via bgp tests).
Introduce a VMIMClient interface in livemigration.go so the resources
package only imports kubevirt.io/api (types), not kubevirt.io/client-go.
The concrete client-go dependency stays in client.go (k8s package),
which the e2e binary does not reach.
But now the problem is worse, because since cf44126 our e2e test code directly pulls in
"github.com/projectcalico/calico/libcalico-go/lib/clientv3" (in e2e/pkg/tests/ipam/ipam_gc.go).
I think the only sustainable solution (assuming we can't rely on kubevirt fixing their client-go
code) is to make our k8s backend client not pull in kubevirt by default, and instead add a dedicated
"github.com/projectcalico/calico/libcalico-go/lib/backend/k8s/kubevirt" package to enable that at
runtime. Interested main programs such as Felix will need to call `kubevirt.Enable(...)` to add
VirtualMachineInstanceMigration / LiveMigration handling to the k8s backend client.
* Revert "Avoid our client package pulling in kubevirt.io/client-go"
This reverts commit fa7c4a3520857b872799673bf044ff2bc9731ff0.
* Use Tigera fork of kubevirt/client-go
* Regenerate deps files
* Add .claude to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix persistent connection teardown race in BPF spoof test
When Stop() is called on a PersistentConnection, it creates a loop file
to signal test-connection to exit, then waits for the process. However,
the tryLoopFile loop checks the loop file only in ls.Next(), which comes
AFTER the Receive() call. If the TCP connection gets reset (e.g. due to
Felix reprogramming BPF state after a WEP reconfiguration), Receive()
returns a non-timeout error and hits log.Fatal before ever checking the
loop file. This causes test-connection to exit with code 1, which makes
Stop() fail the test even though all assertions passed.
Fix: in the Receive() error handler, check if the loop file exists
(meaning we were asked to stop). If so, exit cleanly instead of fatally.
* Update felix/fv/test-connection/test-connection.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Guard loop-file shutdown check on sentInitial
Only treat a receive error as a clean shutdown when the initial
exchange has already completed. Before this fix, an early connection
failure could be misinterpreted as a requested stop since the loop
file is expected to exist at startup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update DNS nearer the start, and add retries to apt-get installs
* Add direct-mapped cache to uniquelabels.Make for repeated inputs (#11854)
* Add direct-mapped cache to uniquelabels.Make for repeated inputs
It is common for Make to be called in quick succession with the same
input map from different call sites. This adds a small (128-entry)
direct-mapped cache that avoids redundant handleMap allocations when
the input matches a recently-computed Map.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Improve cache test coverage and fix EquivalentTo nil handling
EquivalentTo now distinguishes Nil from Empty, matching the Map type's
design. Add tests for cache misses, eviction, hash collisions,
concurrent access, and EquivalentTo edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Rename makeCache to recentMapCache for clarity
The old name read like a function call rather than a type name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Send HostMetadata to BPF KubeProxy (#11817)
Sends `*proto.HostMetadataV4V6Update` and `*proto.HostMetadataV4V6Remove` to the BPF KP, to facilitate host-aware functionality.
* Index routetable structs on kernelRouteKey instead of just CIDR
Relatedly:
- Change `RouteRemove` to take a `Target` (like `RouteUpdate` and `SetRoutes`) instead of just
CIDR. (Only the key fields are significant in `RouteRemove` calls.)
Todo:
- Test adding/removing routes with same CIDR and different priorities at the same time.
- Do we need to handle different priorities in `conntrackTracker` calls?
* Firewall attempt to Watch LiveMigrations on v3 API
* Add comment for why v3 API Watch is disallowed
* Update v3 client e2e tests for LiveMigration
* Add webhooks version command and hook into hashreleases (#11941)
* [CORE-12132] Move windows e2es to azr-aso provisioner
change semaphore machine to f1
add dummy needrestart script when needed
add installer: operator configuration
add new RUN_LOCAL_TESTS option to e2e tests and use them on the windows pipeline
use azr-aso provisioner for iptables run
add env var to fix porter pods
fix conncheck TCP command for Windows
increase timeouts on tests for Windows
clean up timeouts and comment about long one for windows image pulling
* [CORE-12378] fix(QoS): Use QdiscReplace() instead of QdiscAdd()
Use QdiscReplace() instead of QdiscAdd() so that adding the TBF
qdiscs needed for QoS controls with tc does not error out when
there is an existing non-default (handle != 0) qdisc on the
interface for any reason.
Add a test case to the felix FVs to cover this.
Also, enable felix debug logging on the QoS felix FVs, and
remove overzealous Skip() that was resulting in no test cases
running on iptables/nftables modes.
* Fix BPF FV flake: sync NAT maps on all felixes before connecting (#11939)
The "backend replaced" test checked only felix-1's NAT maps before
starting a persistent connection, but felix-0's CTLB cgroup program
(shared across all containers in the FV environment) reads from
felix-0's maps. If felix-0 hadn't synced yet, the CTLB program
returned NAT_NO_BACKEND → EPERM, causing test-connection to exit
before deleting the loop file.
Check all felixes' NAT maps so both the CTLB and TC paths are ready.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Bump base image to UBI 9 in calico-node component (#11860)
* Bump base image to UBI 9 in calico-node component
* Refactor iptables-legacy installation
* Fixing symbolic links and xtables-legacy binary
* - Install ipset and nftables from ubi9 repo
- Merge multiple microdnf installs
* Remove outdated comments and obsolete grant permissions; improve code organization.
* Fix iptables-legacy libs
* Move templates to templates dir
* Use RouteKey in RouteTable API
- Rename kernelRouteKey to RouteKey and make it part of the API.
- In Target use an embedded RouteKey instead of equivalent individual fields.
- Update RouteRemove to take RouteKey instead of Target.
* CI: Tell Semaphore to upload logs as artifacts when truncating (#11946)
* Felix FV CI: mark failed per-VM logs and upload them to transient GCS (#11940)
* Initial plan
* feat(ci): rename failed felix FV VM logs and upload to GCS
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
* fix(ci): harden failed felix FV log summary and GCS upload guard
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
* fix(ci): scope failed FV log uploads by job tag
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
* fix(ci): make monitor tail retry-safe and summarize actual failed log path
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
* Fix IPv6 RouteRemove not normalizing Priority 0 to 1024
RouteUpdate and SetRoutes normalize IPv6 Priority 0 to 1024 (since the
kernel treats 0 as "use default" which is 1024 for IPv6), but RouteRemove
did not. This meant RouteRemove with Priority 0 would fail to find and
remove IPv6 routes that were stored with the normalized key Priority 1024.
Extract normalizeRouteKey from routeKeyForTarget and call it in
RouteRemove as well.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add unit tests for routes with different priorities sharing the same CIDR
Include Priority in mock netlink KeyForRoute so routes with the same CIDR
but different priorities get distinct keys in the mock dataplane. Update
all existing route key assertions accordingly.
Add 11 new tests covering multi-priority route scenarios: adding and
removing routes at different priorities, resync behavior, and stale route
cleanup. Add a test for IPv6 RouteRemove Priority normalization.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Throw away result of .Close()
* Fix-all
* Add `ssh-key add` to cni-plugin push-images pipeline
* Add `ssh-key add` to all push-images pipelines
* Remove doubles
* Use net.JoinHostPort() for IPv6-safe host:port formatting
Go 1.26 tightened net/url parsing to reject bare IPv6 addresses in
URLs (issue #75223). URLs like http://2001:db8::1:9099/path were
silently accepted in Go 1.25, but parsing them now correctly returns an error.
This broke all IPv6 BPF FV tests which construct health check URLs
with unbracketed IPv6 addresses.
Replace string concatenation with net.JoinHostPort() which correctly
brackets IPv6 addresses ([::1]:9099) and leaves IPv4 unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix IPv6 DNAT/SNAT rule formatting in iptables and nftables backends
Both ip6tables and nftables require bracketed IPv6 addresses when a
port is present (e.g., [2001:db8::1]:80). The DNAT and SNAT actions
were using plain string formatting which produces invalid rules for
IPv6. Use net.JoinHostPort() which correctly brackets IPv6 addresses.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
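The same idea applied to NAT targets, sketched for the iptables-style rendering only: plain `host:port` formatting makes the port look like another IPv6 hextet, while `net.JoinHostPort` produces the bracketed form. `dnatArgs` and `snatArgs` are hypothetical helpers, and the exact flag spelling in Calico's rule renderer may differ.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// dnatArgs renders a DNAT target; brackets are added only for IPv6.
func dnatArgs(ip string, port int) string {
	return "--jump DNAT --to-destination " + net.JoinHostPort(ip, strconv.Itoa(port))
}

// snatArgs renders an SNAT target the same way.
func snatArgs(ip string, port int) string {
	return "--jump SNAT --to-source " + net.JoinHostPort(ip, strconv.Itoa(port))
}

func main() {
	// The old pattern: ambiguous for IPv6.
	fmt.Printf("%s:%d\n", "2001:db8::1", 80) // 2001:db8::1:80
	fmt.Println(dnatArgs("2001:db8::1", 80))  // --jump DNAT --to-destination [2001:db8::1]:80
	fmt.Println(snatArgs("10.0.0.1", 80))     // --jump SNAT --to-source 10.0.0.1:80
}
```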
* Add live migration routing sequence unit tests
Test the two key live migration subcases where Felix-managed routes
coexist with external BIRD routes at different priorities for the same
VM IP:
(a) Source host: Felix local route at normal priority, BIRD remote route
appears at elevated priority, Felix removes its route, BIRD reverts
to normal priority.
(b) Destination host: BIRD remote route at normal priority, Felix
programs local route at elevated priority, BIRD route removed, Felix
reverts to normal priority.
Verifies that resync never disturbs the external BIRD routes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Document conntrack tracker limitation with multiple route priorities
Add a detailed comment explaining the interaction between conntrack
cleanup and multiple route priorities. The ConntrackCleanupManager is
keyed on CIDR (one owner per CIDR) while routes are keyed on RouteKey
(CIDR + Priority). For the live migration use case this is safe: Felix
only manages one route per CIDR at a time, with the coexisting BIRD
route being external to the tracker.
On the source host, the conntrack flush when Felix removes its local
route is correct: the VM is leaving, so stale conntrack entries should
be flushed to force policy re-evaluation (the return path may now
traverse different HostEndpoints on a different host).
On the destination host, no flush is triggered because BIRD's
pre-existing route was never tracked.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add VirtualMachineInstanceMigration RBAC to non-operator manifests
Commit c9b6f2be14 added RBAC for watching kubevirt.io
VirtualMachineInstanceMigration resources to the operator manifests,
but missed the non-operator (Helm chart) manifests. Add the same
get/list/watch permissions to the calico-node ClusterRole in the
calico chart template and regenerate manifests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Move VMIM RBAC rule outside network==calico gate
Felix's KDD syncer always syncs VirtualMachineInstanceMigration
resources regardless of network/IPAM mode, so the RBAC rule must
be gated only on datastore==kubernetes, not additionally on
network==calico. This adds the rule to canal and policy-only
manifests that were previously missing it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add a few more tiered RBAC e2e tests (#11895)
* Improve comment about live migration and conntrack state cleanup
* IPAM support for KubeVirt IP persistence (#11865)
* Add KubeVirt VM live migration IPAM support
Implement VM-aware IPAM allocation that preserves IP addresses across
KubeVirt virtual machine live migrations. Key changes:
- Add KubeVirt VMI client library for tracking VM identity and migrations
- Extend CNI IPAM plugin with VM-aware allocation and release logic
- Add IPAMConfig resource with MaxAllocPerIPVersion for VM IP limits
- Support handle-based IP reuse to maintain stable IPs during migration
- Add comprehensive unit and integration tests for kubevirt IPAM flows
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix comment
* Added filter for non VM pod
* Review Markups
* Update cni-plugin/pkg/ipamplugin/ipam_plugin.go
Co-authored-by: Shaun Crampton <shaun@tigera.io>
* Review Markups II
* Fix static checks
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Shaun Crampton <shaun@tigera.io>
* fix(qos tests): remove unnecessary waits for routes on QoS felix FVs
Remove unnecessary `Eventually()` calls that were waiting for the
workload interface to be present in the output of `ip r` on every
workload update (mostly changing QoS configs).
* Added cali bot trigger (#11849)
* Added cali bot trigger
* code review fixes
* Rename cali-bot-trigger.yml to calico-github-issues-bot-trigger.yml
* Add Claude Code skill for reproducing CI failures on GCP VMs (#11968)
* Add Claude Code skill for reproducing CI failures on GCP VMs
Documents the process of creating a GCP VM that matches the CI
environment (image family, Docker version, sysctl settings) to
reproduce kernel-dependent test failures locally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix GCP skill to match CI: machine type, disk size, Docker repo format
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Pedro Coutinho <coutinhop@users.noreply.github.com>
Co-authored-by: Nell Jerram <nell@tigera.io>
Co-authored-by: Mazdak Nasab <mazdak.nasab@gmail.com>
Co-authored-by: marvin-tigera <marvin-tigera@users.noreply.github.com>
Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>
Co-authored-by: Lucas Sampaio <lucas@tigera.io>
Co-authored-by: Pedro Coutinho <pedro@tigera.io>
Co-authored-by: Alex O Regan <alex.oregan@tigera.io>
Co-authored-by: Shaun Crampton <shaun@tigera.io>
Co-authored-by: Casey Davenport <davenport.cas@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Alex Harford <alex.harford@tigera.io>
Co-authored-by: MichalFupso <michal@tigera.io>
Co-authored-by: Brian McMahon <brianmcmahon135@gmail.com>
Co-authored-by: Steve Gao <steve@tigera.io>
Co-authored-by: tuti <tuti@tigera.io>
Co-authored-by: Tomas Hruby <tomas@tigera.io>
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
Co-authored-by: marvin-tigera <marvin@projectcalico.io>
Co-authored-by: Seth Malaki <seth@tigera.io>
Co-authored-by: Lancelot Robson <lancelot.robson@gmail.com>
Co-authored-by: Seth Malaki <seth@projectcalico.org>
Co-authored-by: Tomas Hruby <49207409+tomastigera@users.noreply.github.com>
Co-authored-by: sudheernv <nvsudheerjain@gmail.com>
Co-authored-by: Lance Robson <lance@tigera.io>
Co-authored-by: Jiawei Huang <jiawei@tigera.io>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: haojiwu <haojiwu@gmail.com>
Co-authored-by: sridhar <sridhar@tigera.io>
Co-authored-by: KameHameHa21110 <abhinavgovind23@gmail.com>
Co-authored-by: Brian Stack <brian@render.com>
Co-authored-by: Song Jiang <song@tigera.io>
Co-authored-by: Walter Neto <walter@tigera.io>
Co-authored-by: Ludwig <tommludwig@icloud.com>
Co-authored-by: Tom Ludwig <83090745+tom-ludwig@users.noreply.github.com>
Co-authored-by: lif <1835304752@qq.com>
Co-authored-by: Daniel Fox <dan.fox@tigera.io>
Co-authored-by: Oleksandr Skoryk <skorichok88@gmail.com>
* Whisker - policy filter version 3
* update filters to latest backend changes
* Whisker - update table styles
* Whisker - update table styles
skoryk-oleksandr approved these changes on Mar 19, 2026.
Description
add new filters
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label:
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label:
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.

Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.