
Whisker UI new features #11310

Merged
ronanc-tigera merged 30 commits into projectcalico:master from ronanc-tigera:whisker-ui-new-features on Mar 19, 2026
@ronanc-tigera ronanc-tigera commented Nov 4, 2025

Description

Add new filters:

  • policy
  • reporter
  • action
  • start time

Also includes theme changes.

[Screenshots: whisker-no-policy-filter, whisker-policy-filter-18-3-26]

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

TBD

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

ronanc-tigera and others added 19 commits September 9, 2025 12:54
…ter-ui-only

[TSLA-9971] policy filter UI only
…-with-master

Whisker UI new features sync with master
…olumn

[TSLA-9973] reporter column and filter
* Whisker - Start time filter

* update start time logic
@ronanc-tigera ronanc-tigera requested a review from a team as a code owner November 4, 2025 10:18
@marvin-tigera marvin-tigera added this to the Calico v3.32.0 milestone Nov 4, 2025
@marvin-tigera marvin-tigera added the release-note-required and docs-pr-required labels Nov 4, 2025
* removed old policy filter

* fix filter transform issue

* add test coverage
* start time filter improvements

* improve accessibility for start time select
@github-actions

This PR is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Jan 31, 2026
* Fix rendering of NatPortRange in nftables mode

* Stop consuming redundant HostMetadata message in no-encap manager (#11737)

* Add UTs with fully random

* Add FV

* Fix CI against OpenStack Yoga, by removing it

Yoga has been "unmaintained" - which is OpenStack terminology for a state similar to EoL - since
October 2024, and is no longer of interest to our OpenStack customers.  The CI against Yoga recently
broke when we updated our Semaphore platform from Ubuntu 20.04 to 22.04.  This was briefly addressed
by https://github.com/projectcalico/calico/commit/29d69fa9324e85f0270e0dc02358e9d451785ec4, but
since then there has been further breakage, which does not look easy to fix - fundamentally because
upstream Yoga-level code was never developed and tested against Ubuntu 22.04.

* For VM-based tests on Jammy pin docker-buildx-plugin (#11743)

* For VM-based tests on Jammy pin docker-buildx-plugin

We need to pin because download.docker.com now has a newer buildx that tries to use an API version
that is too new for the Docker daemon, causing this error:
```
docker buildx build --load --platform=linux/amd64 --pull --build-arg UBI_IMAGE=registry.access.redhat.com/ubi9/ubi-minimal:latest --build-arg GIT_VERSION=v3.32.0-0.dev-643-g38568836d2ac --build-arg CALICO_BASE=calico/base:ubi9-1769122535 --build-arg BPFTOOL_IMAGE=calico/bpftool:v7.5.0 --network=host --build-arg BIN_DIR=dist/bin --build-arg BIRD_IMAGE=calico/bird:v0.3.3-211-g9111ec3c-amd64 --build-arg GIT_VERSION=v3.32.0-0.dev-643-g38568836d2ac -t node:latest-amd64 -f ./Dockerfile.amd64 .
ERROR: failed to build: Error response from daemon: client version 1.52 is too new. Maximum supported API version is 1.41: driver not connecting
make[1]: Leaving directory '/home/ubuntu/calico/node'
make[1]: *** [Makefile:268: .calico_node.created-amd64] Error 1
make: Leaving directory '/home/ubuntu/calico/node'
make: *** [Makefile:440: k8s-test] Error 2
```

* Spurious change to trigger node CI

* Revert "Spurious change to trigger node CI"

This reverts commit 46fdd376d925289648e63c57c03904c74c9a52fa.

Seems we didn't need this to trigger the CI.

* Remove CRDs from tigera-operator helm chart (#11727)

* Add traffic distribution support and enable topology-aware routing for Services

* Fix golangci-lint QF1001

* Use gcloud credential helper to login to GCR (#11752)

* Ability to use projectcalico.org/v3 custom resource definitions  (#10447)

* [windows] ASO: add support for nftables and BPF dataplanes (on linux nodes)

Add support for the nftables and BPF dataplanes on linux nodes
to the ASO test infra.

Remove docker installation as only containerd
is necessary.

Use a config yaml for kubeadm init instead of CLI flags.

* fix kubeadm config yaml

* replace docker commands with ctr in windows cni-plugin FVs

* Fix chart target (#11761)

* [BPF]  Maglev Prometheus Metrics:  Connection counts (#11660)

* Export maglev conntracks as prometheus metrics


Co-authored-by: Shaun Crampton <shaun@tigera.io>

* Update tests to use ubuntu 25.10 instead of 25.04 (#11763)

Co-authored-by: Casey Davenport <davenport.cas@gmail.com>

* Initial plan

* Convert Python 2 code to Python 3 in node/tests/k8st

Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>

* Update test container to Docker 25 and Python 3

Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>

* Fix generated files. (#11766)

* Add --break-system-packages

* Unpin

* Repin to current versions

* Migrate from nose to pytest test runner

Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>

* Fix typo in CNP CRD. (#11768)

* Fix typo in CNP CRD.

* Pin upstream CNP CRD to explicit commit, was floating.

* Python test code fixes

- Correct import path for `utils.utils`
- Output from subprocesses needs `.decode()`
- Avoid pytest running _TestLocalBGPPeer in its own right.

* Update node/tests/k8st/tests/test_bgp_filter.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update node/tests/k8st/tests/test_bgp_filter.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update node/tests/k8st/tests/test_bgp_filter.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove unused cluster_route_regex_v4 variable

Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>

* Update to Go 1.25.7

* fix(windows): rename ASO env vars

* Bump CALICO_BASE_VER to ubi9-1770247388

* Run some tests against projectcalico.org/v3 API group (#11758)

* Add dependabot config to update golang.org/x/* libraries (#11776)

* Add dependabot config to update golang.org/x/* libraries

* Add WaitForCloseWithDeadline utility to wait for a channel to close

* Remove profile CRD, as it is unused (#11792)

* Run ci target instead of fv directly (#11793)

* Turn off dependabot

* fix: return images marked as release if not the same as BUILD_IMAGES (#11760)

* Fix app-policy UTs not running (#11795)

* Fix app-policy UTs not running

* Update app-policy/Makefile

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [BPF] Fix propagation of ctx->fwd

Since ctx->fwd is not in state, it is only valid within a single
program. We do set it to true early on, but that does not survive and we
thus may not redirect when we could redirect.

* [BPF] fix unhandled return value from bpf_fib_lookup

* Allow link-local even if HEP rpf check returns BPF_FIB_LKUP_RET_NOT_FWDED (#11781)

* Restarting systemd-resolved on a node with the BPF dataplane results in host networking going down.
When we restart systemd-resolved, it resets the interface config and routes, and the interface loses its IP.
systemd-resolved then uses DHCP to get an IP address. The DHCP request is broadcast, and the DHCP offer
comes from 169.254.169.254, which is a link-local address. The offer is dropped at the ingress of the
host interface because rpf_check for 169.254.169.254 fails. As a result, the system never comes up.

The DHCP offer is sometimes allowed, if there was a DHCP renewal just a few seconds before the systemd restart.
The renewal creates a conntrack entry, so the DHCP offer is allowed. Without the conntrack entry, it is dropped.

However, if we try to get the IP using dhclient, it works even in the broken state. systemd uses a UDP socket to
get the DHCP offer and never receives it because we drop it in the rpf check. dhclient uses an AF_PACKET socket to
read the DHCP packets and receives them even if the tc program drops them. If I attach an XDP program to drop the DHCP
packets, even dhclient cannot read them. So even if a packet is dropped by the tc program, an application using
AF_PACKET can snoop the packet.

Fix - At hep_rpf_check, we use fib_lookup to check whether the route exists. In this case fib_lookup returns
BPF_FIB_LKUP_RET_NOT_FWDED and we drop the packet. The fix is to allow the packet even if fib_lookup returns
BPF_FIB_LKUP_RET_NOT_FWDED, provided the source IP is a link-local IP.

* Address review comments

* Automatic Pin Updates

* Use same calico/test image for calicoctl ST as for node

This also means converting calicoctl ST from Python 2 to Python 3, and using pytest instead of
nosetests

This work was cherry-picked from the Copilot PR at
https://github.com/projectcalico/calico/pull/11782:

- Update Makefiles to invoke pytest instead of nosetests
- Convert Python 2 syntax to Python 3:
  - print statements to print() functions
  - dict.iteritems() to dict.items()
  - xrange() to range()
  - Remove cmp() usage (use equality checks)
  - Fix metaclass syntax
  - Fix bytes/string handling for hashlib
- Replace nose imports with pytest equivalents
- Replace @attr decorator with pytest.mark

Then I added the following tweaks and fixes:

- Update Makefile note about how to avoid running slow tests.

- Fix relative imports.

- Remove termios stuff.

  This dates back to commit de8356294fa295e1fcc963c5e88d7fce4d14749f, 2016, and looks bogus to me.
  None of the commands we currently run look like they should be "messing with terminal settings",
  so let's remove this and see if anything else breaks.

  The reason for removing it is this failure which appeared with the pytest move:

  ```
  tests/st/test_base.py:27: in <module>
      from tests.st.utils.utils import (get_ip, ETCD_SCHEME, ETCD_CA, ETCD_CERT,
  tests/st/utils/utils.py:89: in <module>
      _term_settings = termios.tcgetattr(sys.stdin.fileno())
                                         ^^^^^^^^^^^^^^^^^^
  /usr/lib/python3.12/site-packages/_pytest/capture.py:247: in fileno
      raise UnsupportedOperation("redirected stdin is pseudofile, has no fileno()")
  E   io.UnsupportedOperation: redirected stdin is pseudofile, has no fileno()
  ```

  This is because pytest redirects stdin, so stdin does not have a terminal.

- Convert uses of `parameterized` to `self.subTest` pattern.

  (Sadly, pytest does not support parameterization at the same time as features we get by inheriting
  from unittest.TestCase, namely `self.assert...` and `setUp` and `tearDown` methods.)

- Decode subprocess output.

* Don't allocate IPs from IP pools with Disabled status (#11775)

* Bump Envoy Gateway to v1.5.7

* Hack CI

* Add unit tests to cover edge cases in the topology_test.go file and integration-style tests in the syncer.go file.

* Unhack CI

* ClusterNetworkPolicy: support generic protocols (#11804)

* Address PR comments.

* E2E: Splits maglev test into two tests: IPv4 & IPv6 (#11801)

* splits maglev test into v4 and v6 runs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix CNI delete timer to start after acquiring IPAM lock (#11824)

Start the 90-second timeout after acquiring the lock in cmdDel, matching the pattern used in ADD operations. Previously, the timer started before lock acquisition, causing "context deadline exceeded" errors when DELETE operations waited in queue for the lock.

* Replace ippool filters in BIRD template with golang funcs (#11759)

* CNP: pick conformance improvements and enable it (#11833)

* Fix ipamconfigs -> ipamconfigurations (#11839)

* Rename Undefined encap mode to Never to align with v3 (#11831)

* Migrate to Ginkgo v2

* Read coverprofile.out file

* Fix more ginkgo v2 errors and warnings

* Implement manual sharding for felix FVs

* Cleanup felix FV report to filter skipped tests

* Add preflight checks to allow ginkgo v2 only

* Pin calico/go-build with ginkgo v2 only installed

* Collect Multus network-attachment-definitions in cluster diags (#11816)

* Initial plan

* Add Multus network-attachment-definitions collection to calicoctl cluster diags

Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

* Generate CRD API manifests (#11836)

* Felix UT: add tier to calc graph benchmarks (#11855)

Test data was missing tier, resulting in warnings when it was inserted.

* Fix BPF UTs to run on latest kernel (#11837)

* Fix BPF Tests in 25.10 kernel.

* Revert changes to fv

* Fix bpf cleanup FV

* Add debug

* Fix difference between tc and tcx

* [ebpf] - Send tcp rst when a backend is deleted (#11762)

* Add tc programs to send tcp rst for both ipv4 and ipv6.

* Mark ct entries with the removed workload ip to send rst

* Send a rst if there is a CT hit and send_rst flag set.

* Stale NAT entries can occur for two reasons: either the service is deleted or the backend is deleted.
When the service is deleted, we mark the NAT FWD TCP service entry (when CTLB is disabled) to send a RST.
When the backend is deleted, no change is made to the NAT CT entries. When the next packet hits the CT entry
to a pod, it returns a RST, as a result of which all the entries are flagged as RSTSeen. The connection
dies and the CT entries are deleted after 2 mins.

* Fix build error

* Disable stale NAT conntrack scanner for TCP

* Address copilot review comments

* Address first set of review comments

* Address review comments batch 2

* Address review comments

* Check if the connection is actually reset

* Split TCP spoof test to avoid RST scanner race condition

The "should not be able to spoof TCP" test had two phases in a single
It block. Phase 1 called RemoveFromInfra which sends the workload IP
to the WorkloadRemoveScanner. If the conntrack scanner ran after the
persistent connection was established in Phase 2, it would mark the
CT entries with FlagSendRST, causing the connection to be killed and
the test to fail at expectPongs.

Split into two separate It blocks so Phase 2 starts clean without any
prior WEP removals poisoning the scanner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Tomas Hruby <tomas@tigera.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add tiered-rbac webhook (#11803)

* Skip nftables cache reload for cleaned disabled tables (#11848)

* Skip cache reload for cleaned disabled tables

Avoid calling InvalidateDataplaneCache("post-write") for tables that are disabled and have no chains left in the dataplane. These tables exist only to remove leftover nftables state when switching to iptables mode, so once cleanup is complete there is no need to reload state on every apply cycle (which would spawn nft processes unnecessarily). Cache invalidation still occurs for active tables or disabled tables that still have chains to clean up.
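
The condition described above reduces to a small predicate. The sketch below is an assumption about its shape, not the code from the commit; `needsCacheInvalidation` and its parameters are hypothetical names.

```go
package main

import "fmt"

// needsCacheInvalidation reports whether the post-write cache reload is still
// worthwhile. A disabled table with no chains left in the dataplane has
// nothing more to clean up, so reloading it would only spawn nft needlessly.
func needsCacheInvalidation(tableDisabled bool, chainsInDataplane int) bool {
	if tableDisabled && chainsInDataplane == 0 {
		return false
	}
	return true
}

func main() {
	fmt.Println("active table:", needsCacheInvalidation(false, 3))
	fmt.Println("disabled, chains remain:", needsCacheInvalidation(true, 2))
	fmt.Println("disabled, fully cleaned:", needsCacheInvalidation(true, 0))
}
```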

add test

* Update table_test.go

* Add print columns for CRDs in kubectl output (#11805)

* Modernise Go code with go fix (#11864)

* Modernise Go code with go fix

Run Go v1.26's go fix command to apply automatic code modernisations.

* Go mod tidy.

* Update generated files.

go fix removed omitempty from some fields where it could not affect the
output (because a struct cannot be empty by the definition used by the
json encoder).  This makes some fields "required" in the openapi schema
that were previously optional but, in practice, they should always
have been there.

* Fix gosimple lint.

* Define LiveMigration resource

The LiveMigration resource enables seamless live migration for KubeVirt and OpenStack, by
associating the source and target pods or VMs.  In KubeVirt it's backed by the KubeVirt
VirtualMachineInstanceMigration resource.  In OpenStack our Neutron driver creates it and writes it
to the etcd datastore.

Key components:
- LiveMigration type definition (libapiv3) with spec fields for source and
  destination workload endpoints.
- K8s backend client using the dynamic client to read KubeVirt VirtualMachineInstanceMigration
  resources (read-only: List, Get, Watch).  Populate LiveMigrationSpec from VMIM fields (vmiName, uid,
  sourcePod), filter VMIMs by phase (TargetReady/Running/Failed) in Get/List/Watch, replace the simple
  watch adapter with a state-tracking vmimWatchAdapter that synthesises Added/Deleted events on phase
  transitions and suppresses redundant updates.
- Watch adapter (unstructuredWatchAdapter) to bridge the dynamic client's
  unstructured objects to the Resource interface expected by the existing
  k8sWatcherConverter.
- clientv3 LiveMigration client with standard List/Get/Watch operations.
- Model, namespace, and scheme registrations.
- Felix syncer for LiveMigration.

The K8s backend is deliberately read-only since LiveMigration state is
owned by KubeVirt. The etcdv3 backend stores LiveMigration directly as a
custom resource (unchanged default behavior).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* embed filter for disabled ippools (#11851)

* Update libcalico-go/lib/apis/v3/livemigration.go

Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>

* Update libcalico-go/lib/apis/v3/livemigration.go

Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>

* Get node name from BGPConfig (#11850)

* Remove last vestiges of tier prefixing (#11867)

* Remove last vestiges of tier prefixing

* Fix generation

* Use common TierOrDefault impl

* Add CLAUDE.md with BPF test commands and code structure guide

Documents how to run BPF unit and functional tests, BPF C source
layout in bpf-gpl/, Go user-space packages in bpf/, and dataplane
management in dataplane/linux/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix FV_BATCH and FV_NUM_BATCHES validation in Felix FV configureManualSharding

* Centralize BPF sub-program management to single source of truth (#11822)

* Initial plan

* Remove hardcoded program counts from attach_test.go

Replace hardcoded BPF program counts with dynamic assertions that verify:
- Programs are loaded (count > 0)
- Program counts increase when adding endpoints
- Specific AttachTypes exist in the Programs map

This eliminates the need to update test expectations every time a new
BPF program is added to the system.

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Rename variable for clarity: programsCountBeforeReset -> programsCountBeforeRestart

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Validate against expected program counts using Programs() map

Instead of checking if count > 0, now validate that:
- Count() matches len(Programs()) to ensure consistency
- Programs() map contains the expected AttachTypes (existing checks)
- For delta checks, verify count increased from previous value

This ensures we're validating the actual expected number of programs
loaded, not just that some programs were loaded.

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Compute expected program counts based on hook package logic

Added helper functions that calculate the expected number of sub-programs
for each AttachType by mirroring the logic from hook.ProgramsMap.allocateLayout():

- countSubPrograms(): Counts sub-programs for an AttachType based on which
  programs should be loaded (considering hasHostConflictProg, hasIPDefrag, hasMaglev)
- expectedProgramCount(): Sums counts across all loaded AttachTypes
- Helper functions: hasHostConflictProg(), hasIPDefrag(), hasMaglev()

Tests now validate against computed expected counts instead of hardcoded values,
eliminating the need to update tests when new BPF programs are added.

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Export helper methods from hook package to eliminate duplication

- Export HasHostConflictProg(), HasIPDefrag(), HasMaglev() in hook/load.go
- Update all uses in hook/map.go to use exported methods
- Remove duplicate helper functions from attach_test.go
- Update attach_test.go to use exported hook.AttachType methods

This eliminates code duplication and ensures tests use the same logic
as the production code when determining which programs should be loaded.

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Add GetSubProgNames API to eliminate hardcoded program names in tests

- Added GetSubProgNames(hookType) function in hook/map.go
- Updated countSubPrograms() to use GetSubProgNames() instead of hardcoded arrays
- Removed 27 lines of duplicate program name definitions from test file

Now when a new BPF program is added, it only needs to be defined in one place
(tcSubProgNames or xdpSubProgNames in hook/map.go).

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Return copy of slice in GetSubProgNames to prevent mutations

Updated GetSubProgNames to return a copy of the internal arrays instead
of the original slices, preventing callers from accidentally modifying
the internal program name arrays.
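
A minimal sketch of the defensive-copy pattern, assuming Go 1.21+ for the `slices` package. The signature is simplified (the real function takes a hook type), and the program names are placeholders rather than the actual entries in hook/map.go.

```go
package main

import (
	"fmt"
	"slices"
)

// tcSubProgNames holds placeholder names; the real list lives in hook/map.go.
var tcSubProgNames = []string{"prog_a", "prog_b", "prog_c"}

// getSubProgNames returns a copy, so callers cannot mutate the package-level slice.
func getSubProgNames() []string {
	return slices.Clone(tcSubProgNames)
}

func main() {
	names := getSubProgNames()
	names[0] = "overwritten by caller"

	fmt.Println("caller's copy:", names[0])
	fmt.Println("internal list:", tcSubProgNames[0])
}
```

Returning the slice directly would share the backing array, so the caller's write in `main` would silently change the internal list.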

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Add comment clarifying GetSubProgNames behavior for TC hooks

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Extract sub-program filtering logic into GetApplicableSubProgs API

- Created GetApplicableSubProgs(at, skipIPDefrag) in hook/map.go
- Added SubProgInfo struct to hold sub-program metadata
- Updated allocateLayout to use GetApplicableSubProgs
- Updated countSubPrograms in test to use GetApplicableSubProgs
- Removed 28 lines of duplicate filtering logic from test

The filtering logic for determining which sub-programs should be loaded
is now centralized in a single API that both production code and tests use.
This eliminates duplication and ensures consistency.

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Add clarifying comment about debug offset for TC hooks

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Fix BPF UT

* Revert helper methods to private (unexported)

Changed HasHostConflictProg → hasHostConflictProg
Changed HasIPDefrag → hasIPDefrag
Changed HasMaglev → hasMaglev

These methods are only used internally within the hook package (in
GetApplicableSubProgs and loadObj). Since GetApplicableSubProgs now
provides the public API that encapsulates the filtering logic, these
helper methods no longer need to be exported.

This provides better encapsulation and makes it clear that
GetApplicableSubProgs is the intended public interface for determining
which sub-programs should be loaded.

Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>

* Add a couple of aws-talos runs

* Address review comment

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
Co-authored-by: sridhar <sridhar@tigera.io>
Co-authored-by: Lucas Sampaio <lucas@tigera.io>
Co-authored-by: Mazdak Nasab <mazdak.nasab@gmail.com>
Co-authored-by: Alex O Regan <alex.oregan@tigera.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: sudheernv <nvsudheerjain@gmail.com>
Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>
Co-authored-by: Lancelot Robson <lancelot.robson@gmail.com>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
Co-authored-by: Jiawei Huang <jiawei@tigera.io>
Co-authored-by: Shaun Crampton <shaun@tigera.io>
Co-authored-by: Tomas Hruby <tomas@tigera.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Collect more diags when the kind k8st cluster setup fails (#11874)

* Setting TLS 1.3 only ciphers causes API server to fail on startup #11706 (#11812)

* Add configurable TLS minimum version to resolve TLS 1.3 cipher startup failure

This change introduces a TLS_MIN_VERSION environment variable that allows
users to configure the minimum TLS version for the API server and other
components. This resolves the issue where setting TLS 1.3-only ciphers
would cause startup failures due to Go's HTTP/2 cipher validation.

Changes:
- Add ParseTLSVersion() function to crypto/pkg/tls package
- Update NewTLSConfig() to use TLS_MIN_VERSION environment variable
- Update API server to parse and apply TLS_MIN_VERSION
- Add comprehensive tests for TLS version parsing
- Add TLS_CONFIGURATION.md documentation

Supported values for TLS_MIN_VERSION:
- "" or "1.2" (default): TLS 1.2 minimum
- "1.3": TLS 1.3 minimum (allows TLS 1.3-only cipher configurations)

Fixes #11706

* Improve TLS configuration docs with visual diagrams and comparison tables

- Add ASCII diagram showing TLS version and cipher relationships
- Add configuration validation matrix with visual indicators
- Add cipher compatibility diagram for quick reference
- Consolidate sections for better readability
- Reduce from 237 to 159 lines while keeping all essential info
- Add quick reference table at the top
- Improve troubleshooting section with clear comparison table

* Address PR review feedback

- Refactor NewTLSConfig to use local variable for minVersion
- Remove duplicate test from apiserver (already tested in crypto package)
- Remove TLS_CONFIGURATION.md documentation file

Changes per reviewer feedback from caseydavenport

* Fix struct field alignment in tls_test.go

Align struct fields consistently to pass CI formatting checks

* Update felix/CLAUDE.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update felix/CLAUDE.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update felix/CLAUDE.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Improve performance of IP autodetection when there are many IP addrs (#11834)

* Migrate Ginkgo v1 Measure to v2 gmeasure

Reference https://onsi.github.io/ginkgo/MIGRATING_TO_V2#migration-strategy-1.

* Fix go-vet issues

* make fix-all

* make -C libcalico-go gen-files

* Add RBAC for live migration to the tigera-operator chart

* Fix calicoctl UT not to expect the LiveMigration resource

* Switch LiveMigration K8s backend to KubeVirt typed client

Replace the dynamic.Interface / unstructured.Unstructured approach with
the official KubeVirt Go client (kubevirt.io/client-go) to follow the
codebase convention of using structured/typed clients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* D'oh, make generate

* Replace subTest loops with individual test methods in calicoctl STs

Eliminates the hack of manually calling self.setUp() between subTest
iterations by splitting each parameterized loop into individual test
methods that call _test_* helpers. The test framework now handles
setUp/tearDown naturally for each case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Rename lib/v3 -> lib/internalapi (#11870)

* Automatic Pin Updates

* Add service index for EndpointSlice lookups in confd route generator

* Decouple resources package from kubevirt.io/client-go to fix e2e flag conflict

The kubevirt.io/client-go/log package registers a -v flag in its init()
that conflicts with klog's -v in binaries that transitively import the
resources package (e.g. the e2e test binary via bgp tests).

Introduce a VMIMClient interface in livemigration.go so the resources
package only imports kubevirt.io/api (types), not kubevirt.io/client-go.
The concrete client-go dependency stays in client.go (k8s package),
which the e2e binary does not reach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* make generate

* Update LiveMigrationSpec structure per review feedback

Adapt the VMIM-to-LiveMigration conversion and tests to the restructured
LiveMigrationSpec which uses Source/Destination WorkloadEndpointIdentifier
pointers instead of flat fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Start adding Claude instructions. (#11893)

* Add CLAUDE.md

* Add Kubernetes API design skill.

* Add selectors note.

* Address PR review comments: fix typos and update Ubuntu version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Safely remove finalizers (#11882)

* Update pool sorting logic in pool controller (#11886)

* Update pool sorting logic in pool controller

* Clean up test code

* Add mutating admission policy for tier label, remove from OCP for now (#11890)

* Simplify LiveMigrationSpec.Source to what we really need

There is no anticipated case where we need this to be a selector.

* Include ippools filters in BirdBGPConfig (#11875)

* Fix owner reference test flake caused by Kubernetes GC (#11877)

The "should properly read / write owner references" test was creating a
NetworkPolicy with ownerReferences pointing to a Pod and NetworkSet that
didn't exist in the cluster. With v3 CRDs, owner references are stored
as real Kubernetes ownerReferences (not in an annotation like with v1
CRDs), so the garbage collector would detect the non-existent owners and
delete the NetworkPolicy before the test could verify it.

Fix by creating the actual owner objects (Pod and NetworkSet) before
creating the NetworkPolicy, using their real API server-assigned UIDs in
the owner references. This prevents the GC from deleting the dependent.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add webhooks version to config for node tests (#11900)

* Remove FV tests for crypto package tied to FIPS (#11507)

* Expand CLAUDE.md with architecture, dataplane, networking, and test sections

Add top-level sections covering Felix's major subsystems: architecture
overview, calc graph engine, dataplane manager pattern, iptables/nftables
dataplane, Windows dataplane, networking/routing, and configuration.
Extract a general "Running Tests" section (ut, fv, fv-nft) and keep
BPF-specific test details alongside the BPF dataplane section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix FV container startup race with apiserver.crt bind mount (#11891)

Remove redundant --mount for /tmp/apiserver.crt in FV test containers.
The cert is already accessible via -v /tmp:/tmp, and TLS verification
is skipped. The bind mount caused a race when multiple containers
started concurrently: runc created mount points on the shared host
/tmp, and the second container failed with "file exists".

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* make generate

* Add implement-calico-api-resource Claude Code skill (#11897)

* Add implement-calico-api-resource skill.

Step-by-step guide for plumbing a new API resource through all layers
of the Calico codebase: API types, code generation, backend model,
K8s client, clientv3, apiserver registry/storage, syncers, RBAC,
and manifests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update implement-calico-api-resource skill: split codegen steps, add calicoctl

- Step 3 now runs `cd api && make gen-files` for quick API codegen
  so downstream layers can compile during development
- Added Step 18 for calicoctl resource manager registration
- Moved full `make generate` to Step 20 (final step before commit)
- Renumbered RBAC to Step 19
- Updated checklist to include calicoctl

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR review comments

- Remove knownV3Kinds sub-step from Step 8 (list doesn't exist)
- Remove +kubebuilder:resource annotations from List type example
- Emphasize kubebuilder annotations for validation in Step 10,
  noting that Go struct validators are not executed for v3 CRDs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Refactor IPAM AllocationAttribute for VM-based handle support (#11894)

* Refactor IPAM AllocationAttribute for VM-based handle support

* Backport Felix calc graph computed data enhancement from Enterprise

@radixo developed this as part of his Istio work, and that is all intended for OSS as well, but we
especially need the calc graph enhancement as soon as possible for live migration.

With Claude's help I've then added unit tests for the new computed data feature:

- ActiveRulesCalculator: tests for AddExtraComputedSelector/
  RemoveExtraComputedSelector dispatch to OnComputedSelectorMatch/
  OnComputedSelectorMatchStopped callbacks, verifying match, no-match,
  removal, and isolation from policy callbacks.

- PolicyResolver: tests for OnEndpointComputedDataUpdate covering
  inclusion in flush, nil removal, nil-to-nil no-op, multiple kinds,
  and cleanup on endpoint deletion.

- EventSequencer: test for ModelWorkloadEndpointToProto verifying
  EndpointComputedData.ApplyTo modifies the proto output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Review markup

* Fix Docker-based make targets in git worktrees (#11898)

* Fix Docker-based make targets when running from a git worktree

In a git worktree, .git is a file pointing to the main repo's
.git/worktrees/<name> directory. When the worktree is mounted into
a Docker container, git commands inside the container fail with
"fatal: not a git repository" because the main .git directory
isn't available.

Detect worktrees by comparing git-dir to git-common-dir, and when
running in a worktree, mount the main .git directory and set
GIT_DIR / GIT_WORK_TREE so git works inside containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address review feedback: use absolute git paths and configurable work tree

- Use --absolute-git-dir and $(realpath ...) to guarantee absolute paths
  for Docker volume mounts (avoids potential issues with relative paths).
- Make the container work tree path overridable via DOCKER_GIT_WORK_TREE
  so Makefiles that mount at a different path (e.g. api/Makefile) can
  set it appropriately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Operator CRD update (make generate)

* Enhance Felix route table for elevated priority programming

JIRA ticket: CORE-12272

On its own, this PR does not yet change how Felix programs local routes.  `SetRoutes` callers do not
currently set the new `Priority` field, so programmed `Priority` will be 0 for IPv4 routes and 1024
for IPv6 routes, as was the case before this PR.

Upcoming, but separate, live migration work will:
- add Felix configuration fields for "normal" and "elevated" priority values
- change route programming outside of live migration to use the "normal" priority value
- program routes with the "elevated" priority value during a live migration, as part of ensuring the best possible handover

* Add status subresource to KubeControllersConfiguration CRD (#11889)

* Improve tigera-operator helm chart values.yaml and README (#11907)

Co-authored-by: Ludwig <tommludwig@icloud.com>
Co-authored-by: Tom Ludwig <83090745+tom-ludwig@users.noreply.github.com>

* Trim PR template and simplify cherry-pick headings (#11905)

* Trim PR template; update cherry-pick script headings

Simplify the PR template to just a description comment and the
release-note block. Remove the verbose sections (Description,
Related issues/PRs, Todos, Reminder for the reviewer) that were
rarely filled in.

Update cherry-pick-pull to use bold text instead of markdown headings
for cherry-pick history and multi-PR separators. Also improve re-pick
handling to absorb old bullet points (both old ## and new ** formats)
into the header so picks stack correctly with newest at top.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add bold Release note label above the code block

Without a heading, the release-note block looked like an anonymous
code block in the PR UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Factor out build-pr-description() for testability

Extract the PR title/body/label generation from make-a-pr() into a
pure text-processing function build-pr-description() that can be
tested with canned input (no gh/git calls).  Add
cherry-pick-pull_test.sh with 9 test cases covering fresh picks,
re-picks (old/new header formats, stacked bullets), multi-PR picks,
cross-repo ref prefixing, label filtering, section stripping, and
META_BLOCK inclusion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Handle indented sub-bullets in cherry-pick history parsing

The old awk patterns only captured top-level "- Pick onto ..." bullets.
When re-picking a multi-PR cherry-pick (which uses indented sub-bullets
like "  - org/repo#123"), those indented lines were left in the body
instead of being absorbed into the history header.

Rewrite both awk scripts to track an in_hist state and capture/skip all
bullet lines (top-level or indented) within the history block. Add a
test case for re-picking a multi-PR pick with indented sub-bullets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Simplify cherry-pick history handling; use exact test assertions

Replace the awk-based history parsing with a simple first-line check:
if the body starts with a cherry-pick history header, strip it (old
bullets stay in place beneath ours); otherwise add a blank line
separator.

Rename output variables from BUILD_ to NEW_PR_ for clarity.  Rewrite
all test assertions to use assert_equals on the full output strings
(with readable multi-line expected values) instead of
assert_contains, catching any whitespace formatting issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add shell-test target to hack/Makefile for CI

Add a shell-test make target that runs cherry-pick-pull_test.sh, and
include it in the ci target so it runs in the "Tools (hack directory)"
Semaphore CI block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Clarify PULL_LABELS and PULLLINK doc comments

Address review nits: clarify that PULL_LABELS elements contain
newline-separated labels (one element per PR), and rename "markdown
links" to "PR links" for PULLLINK.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Revert k8s.io/client-go version change in go.mod

It appears this wasn't actually necessary - perhaps a mistake introduced by Claude using
non-containerized go calls.

* Fix deps generation to take account of replacements

Addresses the wrong-looking deps file changes like
[here](https://github.com/projectcalico/calico/pull/11868/changes#diff-e80a705fd1859f7941b49c10e40de0b6c44d20f50776a20622dc7593c791d766R46)
and @fasaxc's comment about that
[here](https://github.com/projectcalico/calico/pull/11868#discussion_r2841071924)

* make generate

* Review markups

* Add isolated customer environments calc graph benchmark (#11866)

* Add isolated customer environments calc graph benchmark

Simulates a multi-tenant SaaS cluster with 1k/10k/100k identical
namespaces, each containing frontend, backend, and database pods
with per-deployment namespaced Calico NetworkPolicies, plus a
system namespace with monitoring pods that can reach all tenants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Scatter local pods across namespaces in isolated customer benchmark

Use 100 pods/node as baseline across all variants. Each pod is
assigned to exactly one host with no overlap. Local pods are
scattered uniformly across namespaces via a stride, simulating
realistic K8s scheduling rather than clumping all local pods into
a few namespaces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update isolated customer benchmark with realistic SaaS label cardinalities

Replace frontend/backend/database pod templates with uniform pods per
namespace and a configurable podsPerNamespace parameter. Model pod labels
(15 per pod) and namespace labels (3 per namespace) on a real SaaS
deployment with realistic value cardinalities.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use 1 pod per namespace in isolated customer benchmark

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* More `make generate` changes that I somehow missed just now

* Don't store label restrictions for every selector. (#11846)

* Don't store label restrictions for every selector.

Instead, cache only the most recently used label restrictions
in a package-level variable.  The label restrictions are used
several times when adding a selector to the index (which happens
on a single thread) so caching one instance avoids almost all
recalculation.

Storing a map per selector really adds up if there are tens of
thousands of selectors active. In addition, some uses of selectors
don't need the restrictions, so we save calculation there.

* Wrap LabelRestrictions in a struct to protect the cached map.

Change LabelRestrictions from a map type alias to a struct with a
private map field, exposing All() and Get() methods. This prevents
callers from accidentally modifying the cached label restrictions map.
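
A minimal sketch of the wrapping pattern described above, not the actual Calico code. The `LabelRestriction` fields, the callback-style signature of `All()`, and the `newLabelRestrictions` constructor are assumptions made for illustration.

```go
package main

import "fmt"

// LabelRestriction is a hypothetical, simplified value type; the real
// fields are not shown in this log.
type LabelRestriction struct {
	MustBePresent bool
}

// LabelRestrictions wraps the map in a struct with a private field so that
// callers cannot modify the cached map directly.
type LabelRestrictions struct {
	m map[string]LabelRestriction
}

// Get returns the restriction for one label, if there is one.
func (r LabelRestrictions) Get(label string) (LabelRestriction, bool) {
	lr, ok := r.m[label]
	return lr, ok
}

// All calls yield for each entry until yield returns false.
// The callback-based signature is an assumption.
func (r LabelRestrictions) All(yield func(label string, lr LabelRestriction) bool) {
	for k, v := range r.m {
		if !yield(k, v) {
			return
		}
	}
}

// newLabelRestrictions (hypothetical) copies src so that later changes to
// src do not leak into the wrapper.
func newLabelRestrictions(src map[string]LabelRestriction) LabelRestrictions {
	m := make(map[string]LabelRestriction, len(src))
	for k, v := range src {
		m[k] = v
	}
	return LabelRestrictions{m: m}
}

func main() {
	src := map[string]LabelRestriction{"app": {MustBePresent: true}}
	r := newLabelRestrictions(src)
	delete(src, "app") // does not affect r

	lr, ok := r.Get("app")
	fmt.Println(lr.MustBePresent, ok)

	n := 0
	r.All(func(string, LabelRestriction) bool { n++; return true })
	fmt.Println(n)
}
```

Because the map field is unexported, code outside the package can only read through `Get()` and `All()`.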

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add tests for LabelRestrictions cache hit/miss behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add some new e2e tests (#11892)

* fix: add nil check in AddressesAsCIDRs to prevent SIGSEGV (#11602)

When ExternalIP is empty in Kubernetes, a nil ip.Addr can end up in
the Addresses slice of l3rrNodeInfo. The existing cleanup logic only
removed emptyV4Addr and emptyV6Addr (zero-valued structs), but not nil
interface values. This caused a panic (SIGSEGV) when AsCIDR() was
called on a nil address at l3_route_resolver.go:166.

This fix adds a nil check to the cleanup loop to filter out nil
addresses before iterating and calling AsCIDR().
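
A minimal sketch of the kind of nil filtering described above. The `Addr` interface, `V4Addr` type, and `addressesAsCIDRs` function are simplified stand-ins, not the real Felix types or signatures.

```go
package main

import "fmt"

// Addr is a simplified stand-in for Felix's ip.Addr interface.
type Addr interface {
	AsCIDR() string
}

// V4Addr is a simplified stand-in for an IPv4 address value type.
type V4Addr [4]byte

func (a V4Addr) AsCIDR() string {
	return fmt.Sprintf("%d.%d.%d.%d/32", a[0], a[1], a[2], a[3])
}

// emptyV4Addr is the zero-valued address that the existing cleanup already removed.
var emptyV4Addr V4Addr

// addressesAsCIDRs skips both zero-valued and nil addresses. Calling
// AsCIDR() on a nil interface value would panic.
func addressesAsCIDRs(addrs []Addr) []string {
	cidrs := make([]string, 0, len(addrs))
	for _, a := range addrs {
		if a == nil || a == Addr(emptyV4Addr) {
			continue
		}
		cidrs = append(cidrs, a.AsCIDR())
	}
	return cidrs
}

func main() {
	addrs := []Addr{V4Addr{10, 0, 0, 1}, nil, emptyV4Addr}
	fmt.Println(addressesAsCIDRs(addrs))
}
```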

Fixes #11384

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(windows): also print 'bin_dirs' value on uninstall-calico-hpc.ps1

Print new 'bin_dirs' key used in containerd v2.1+

* [UI-256] update calico icons (#11685)

* update calico icons

* fix linting

* prevent nil pointer dereference in handleBlockUpdate (#11913)

* prevent nil pointer dereference in handleBlockUpdate for blocks with nil affinity

* Add apt publishing framework to release tool

* Add Suite field to repository Releases file

* Run gofmt

* Run gofumpt

* Handle sourcesFile and oddly named packages better

* Fix error strings; handle empty outputDir parameter

* Fix-all

* assert correct error value in generating filters for ippools (#11914)

* fix: advertise /32 LB IPs assigned from IPPool via BGP (CI-1944) (#11917)

* Update to new operator CRDs location (#11918)

* Update CLAUDE.md with BPF build and test instructions

Add guidance on make build-bpf, FOCUS-filtered BPF unit tests,
and TestPrecompiledBinariesAreLoadable for kernel verifier checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix/simplify k8s backend and add VMIM progression tests

The use of vmimWatchAdapter was wrong because watches can break and need to be restarted, which
means that Watch() is called from scratch again, and then we wouldn't have any state describing
whether we've previously emitted a LiveMigration.

Instead, for watching, we can just:
1. always pass through the updateType that we see for the VMIM event
2. when the VMIM is in a state that should not appear as a LiveMigration, represent that by
   emitting a nil Value.

When Typha is the immediate downstream, as it usually is, it understands nil Values: it deduplicates
no-op "updates", converts nil values to deletions, and recalculates the updateType accordingly.  When
Felix is the immediate downstream, Felix's `dedupebuffer` does the same thing.

Update existing UTs accordingly - e.g. expect Modified with nil value instead of Deleted - and add
new UTs for the typical VMIM progressions that we expect to see.
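
To illustrate the downstream behaviour this relies on, here is a deliberately simplified sketch. The `Update` and `dedupe` types and the event-type strings are hypothetical; the real Typha and Felix `dedupebuffer` logic is more involved.

```go
package main

import "fmt"

// Update is a simplified stand-in for a watch event. A nil Value means the
// resource should not appear downstream.
type Update struct {
	Key   string
	Value *string
}

// dedupe remembers the last value passed on for each key (hypothetical).
type dedupe struct {
	last map[string]string
}

// apply returns the event type to pass downstream ("new", "updated" or
// "deleted"), or "" when the update is a no-op.
func (d *dedupe) apply(u Update) string {
	prev, seen := d.last[u.Key]
	switch {
	case u.Value == nil && !seen:
		return "" // nil for a key that was never emitted: nothing to delete
	case u.Value == nil:
		delete(d.last, u.Key)
		return "deleted"
	case !seen:
		d.last[u.Key] = *u.Value
		return "new"
	case prev == *u.Value:
		return "" // unchanged value: suppress
	default:
		d.last[u.Key] = *u.Value
		return "updated"
	}
}

func main() {
	d := &dedupe{last: map[string]string{}}
	v := "migration-in-progress"
	fmt.Printf("%q\n", d.apply(Update{Key: "vmim-1"}))            // no-op
	fmt.Printf("%q\n", d.apply(Update{Key: "vmim-1", Value: &v})) // new
	fmt.Printf("%q\n", d.apply(Update{Key: "vmim-1", Value: &v})) // no-op
	fmt.Printf("%q\n", d.apply(Update{Key: "vmim-1"}))            // deleted
}
```

With this kind of downstream, the watcher itself does not need to remember whether it previously emitted a LiveMigration for a given VMIM.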

* Regenerate e2e/deps.txt

* Avoid our client package pulling in kubevirt.io/client-go

This is like the problem already described in a53eb0b5c3, namely:

    Decouple resources package from kubevirt.io/client-go to fix e2e flag conflict

    The kubevirt.io/client-go/log package registers a -v flag in its init()
    that conflicts with klog's -v in binaries that transitively import the
    resources package (e.g. the e2e test binary via bgp tests).

    Introduce a VMIMClient interface in livemigration.go so the resources
    package only imports kubevirt.io/api (types), not kubevirt.io/client-go.
    The concrete client-go dependency stays in client.go (k8s package),
    which the e2e binary does not reach.

But it is worse now because, since cf44126, our e2e test code directly pulls in
"github.com/projectcalico/calico/libcalico-go/lib/clientv3" (in e2e/pkg/tests/ipam/ipam_gc.go).

I think the only sustainable solution (assuming we can't rely on kubevirt fixing their client-go
code) is to make our k8s backend client not pull in kubevirt by default, and instead add a dedicated
"github.com/projectcalico/calico/libcalico-go/lib/backend/k8s/kubevirt" package to enable that at
runtime.  Interested main programs such as Felix will need to call `kubevirt.Enable(...)` to add
VirtualMachineInstanceMigration / LiveMigration handling to the k8s backend client.

* Revert "Avoid our client package pulling in kubevirt.io/client-go"

This reverts commit fa7c4a3520857b872799673bf044ff2bc9731ff0.

* Use Tigera fork of kubevirt/client-go

* Regenerate deps files

* Add .claude to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix persistent connection teardown race in BPF spoof test

When Stop() is called on a PersistentConnection, it creates a loop file
to signal test-connection to exit, then waits for the process. However,
the tryLoopFile loop checks the loop file only in ls.Next(), which comes
AFTER the Receive() call. If the TCP connection gets reset (e.g. due to
Felix reprogramming BPF state after a WEP reconfiguration), Receive()
returns a non-timeout error and hits log.Fatal before ever checking the
loop file. This causes test-connection to exit with code 1, which makes
Stop() fail the test even though all assertions passed.

Fix: in the Receive() error handler, check if the loop file exists
(meaning we were asked to stop). If so, exit cleanly instead of fatally.

* Update felix/fv/test-connection/test-connection.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Guard loop-file shutdown check on sentInitial

Only treat a receive error as a clean shutdown when the initial
exchange has already completed. Before this fix, an early connection
failure could be misinterpreted as a requested stop since the loop
file is expected to exist at startup.
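
A minimal sketch of the guarded check, assuming a hypothetical helper `shouldExitCleanly`; the real test-connection code is structured differently and is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errReset stands in for a non-timeout error returned by Receive().
var errReset = errors.New("connection reset by peer")

// shouldExitCleanly (hypothetical) reports whether a receive error should be
// treated as a requested stop: only when the initial exchange has completed
// and the loop file exists.
func shouldExitCleanly(recvErr error, sentInitial bool, loopFile string) bool {
	if recvErr == nil || !sentInitial {
		return false
	}
	_, statErr := os.Stat(loopFile)
	return statErr == nil
}

func main() {
	f, err := os.CreateTemp("", "loop")
	if err != nil {
		fmt.Println("could not create temp file:", err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()

	// Stop requested after the initial exchange: exit cleanly.
	fmt.Println(shouldExitCleanly(errReset, true, f.Name()))
	// Early failure before the initial exchange: still treated as fatal.
	fmt.Println(shouldExitCleanly(errReset, false, f.Name()))
}
```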

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update DNS nearer the start, and add retries to apt-get installs

* Add direct-mapped cache to uniquelabels.Make for repeated inputs (#11854)

* Add direct-mapped cache to uniquelabels.Make for repeated inputs

It is common for Make to be called in quick succession with the same
input map from different call sites. This adds a small (128-entry)
direct-mapped cache that avoids redundant handleMap allocations when
the input matches a recently-computed Map.
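
A minimal sketch of a direct-mapped cache, for illustration only. The string keys, integer values, and FNV hash are assumptions; the real cache in `uniquelabels` is keyed on label maps and, per the later commits, is also tested for concurrent access, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const cacheSize = 128 // number of slots, matching the size mentioned above

type entry struct {
	key   string
	value int
	valid bool
}

// directMappedCache gives each key exactly one possible slot; a new key that
// hashes to an occupied slot simply overwrites (evicts) the old entry.
type directMappedCache struct {
	slots [cacheSize]entry
}

func slotFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % cacheSize)
}

// get returns a hit only if the slot holds this exact key, so hash
// collisions are reported as misses rather than wrong values.
func (c *directMappedCache) get(key string) (int, bool) {
	e := c.slots[slotFor(key)]
	if e.valid && e.key == key {
		return e.value, true
	}
	return 0, false
}

func (c *directMappedCache) put(key string, value int) {
	c.slots[slotFor(key)] = entry{key: key, value: value, valid: true}
}

func main() {
	c := &directMappedCache{}
	c.put("app=web", 1)
	v, ok := c.get("app=web")
	fmt.Println(v, ok)
	_, ok = c.get("app=db")
	fmt.Println(ok)
}
```

The trade-off is that lookups and inserts touch a single slot with no eviction bookkeeping, at the cost of occasional misses when two keys share a slot.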

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Improve cache test coverage and fix EquivalentTo nil handling

EquivalentTo now distinguishes Nil from Empty, matching the Map type's
design. Add tests for cache misses, eviction, hash collisions,
concurrent access, and EquivalentTo edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Rename makeCache to recentMapCache for clarity

The old name read like a function call rather than a type name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Send HostMetadata to BPF KubeProxy (#11817)

Sends `*proto.HostMetadataV4V6Update` and `*proto.HostMetadataV4V6Remove` to the BPF KP, to facilitate host-aware functionality.

* Index routetable structs on kernelRouteKey instead of just CIDR

Relatedly:
- Change `RouteRemove` to take a `Target` (like `RouteUpdate` and `SetRoutes`) instead of just
CIDR.  (Only the key fields are significant in `RouteRemove` calls.)

Todo:
- Test adding/removing routes with same CIDR and different priorities at the same time.
- Do we need to handle different priorities in `conntrackTracker` calls?

* Firewall attempt to Watch LiveMigrations on v3 API

* Add comment for why v3 API Watch is disallowed

* Update v3 client e2e tests for LiveMigration

* Add webhooks version command and hook into hashreleases (#11941)

* [CORE-12132] Move windows e2es to azr-aso provisioner

change semaphore machine to f1

add dummy needrestart script when needed

add installer: operator configuration

add new RUN_LOCAL_TESTS option to e2e tests and use them on the windows pipeline

use azr-aso provisioner for iptables run

add env var to fix porter pods

fix conncheck TCP command for Windows

increase timeouts on tests for Windows

clean up timeouts and comment about long one for windows image pulling

* [CORE-12378] fix(QoS): Use QdiscReplace() instead of QdiscAdd()

Use QdiscReplace() instead of QdiscAdd() so that adding the TBF
qdiscs needed for QoS controls with tc does not error out when
there is an existing non-default (handle != 0) qdisc on the
interface for any reason.

Add a test case to the felix FVs to cover this.

Also, enable felix debug logging on the QoS felix FVs, and
remove overzealous Skip() that was resulting in no test cases
running on iptables/nftables modes.

* Fix BPF FV flake: sync NAT maps on all felixes before connecting (#11939)

The "backend replaced" test checked only felix-1's NAT maps before
starting a persistent connection, but felix-0's CTLB cgroup program
(shared across all containers in the FV environment) reads from
felix-0's maps. If felix-0 hadn't synced yet, the CTLB program
returned NAT_NO_BACKEND → EPERM, causing test-connection to exit
before deleting the loop file.

Check all felixes' NAT maps so both the CTLB and TC paths are ready.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Bump base image to UBI 9 in calico-node component (#11860)

* Bump base image to UBI 9 in calico-node component

* Refactor iptables-legacy installation

* Fixing symbolic links and xtables-legacy binary

* Install ipset and nftables from ubi9 repo; merge multiple microdnf installs

* Remove outdated comments and obsolete grant permissions; improve code organization.

* Fix iptables-legacy libs

* Move templates to templates dir

* Use RouteKey in RouteTable API

- Rename kernelRouteKey to RouteKey and make it part of the API.
- In Target use an embedded RouteKey instead of equivalent individual fields.
- Update RouteRemove to take RouteKey instead of Target.

* CI: Tell Semaphore to upload logs as artifacts when truncating (#11946)

* Felix FV CI: mark failed per-VM logs and upload them to transient GCS (#11940)

* Initial plan

* feat(ci): rename failed felix FV VM logs and upload to GCS

Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

* fix(ci): harden failed felix FV log summary and GCS upload guard

Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

* fix(ci): scope failed FV log uploads by job tag

Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

* fix(ci): make monitor tail retry-safe and summarize actual failed log path

Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>

* Fix IPv6 RouteRemove not normalizing Priority 0 to 1024

RouteUpdate and SetRoutes normalize IPv6 Priority 0 to 1024 (since the
kernel treats 0 as "use default" which is 1024 for IPv6), but RouteRemove
did not. This meant RouteRemove with Priority 0 would fail to find and
remove IPv6 routes that were stored with the normalized key Priority 1024.

Extract normalizeRouteKey from routeKeyForTarget and call it in
RouteRemove as well.
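
A minimal sketch of the normalization, assuming a simplified `RouteKey` with a string CIDR and an explicit IP version argument; the real Felix types and the real `normalizeRouteKey` signature may differ.

```go
package main

import "fmt"

// RouteKey is a simplified stand-in: CIDR plus priority.
type RouteKey struct {
	CIDR     string
	Priority int
}

// ipv6DefaultPriority is the value the kernel uses when an IPv6 route is
// programmed with priority 0.
const ipv6DefaultPriority = 1024

// normalizeRouteKey maps IPv6 priority 0 to 1024 so that the key used for
// removal matches the key the route was stored under.
func normalizeRouteKey(k RouteKey, ipVersion int) RouteKey {
	if ipVersion == 6 && k.Priority == 0 {
		k.Priority = ipv6DefaultPriority
	}
	return k
}

func main() {
	routes := map[RouteKey]string{}

	// Store: priority 0 is normalized to 1024 for IPv6.
	routes[normalizeRouteKey(RouteKey{CIDR: "fd00::1/128"}, 6)] = "cali123"

	// Remove: without the same normalization this delete would miss.
	delete(routes, normalizeRouteKey(RouteKey{CIDR: "fd00::1/128", Priority: 0}, 6))

	fmt.Println(len(routes))
}
```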

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add unit tests for routes with different priorities sharing the same CIDR

Include Priority in mock netlink KeyForRoute so routes with the same CIDR
but different priorities get distinct keys in the mock dataplane. Update
all existing route key assertions accordingly.

Add 11 new tests covering multi-priority route scenarios: adding and
removing routes at different priorities, resync behavior, and stale route
cleanup. Add a test for IPv6 RouteRemove Priority normalization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Throw away result of .Close()

* Fix-all

* Add `ssh-key add` to cni-plugin push-images pipeline

* Add `ssh-key add` to all push-images pipelines

* Remove doubles

* Use net.JoinHostPort() for IPv6-safe host:port formatting

Go 1.26 tightened net/url parsing to reject bare IPv6 addresses in
URLs (issue #75223). URLs like http://2001:db8::1:9099/path were
silently accepted in Go 1.25 but now correctly return a parse error.
This broke all IPv6 BPF FV tests which construct health check URLs
with unbracketed IPv6 addresses.

Replace string concatenation with net.JoinHostPort() which correctly
brackets IPv6 addresses ([::1]:9099) and leaves IPv4 unchanged.
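
A minimal sketch of the difference. `healthURL` and the `/readiness` path are hypothetical, chosen only to show how `net.JoinHostPort` formats each address family.

```go
package main

import (
	"fmt"
	"net"
)

// healthURL is a hypothetical helper that builds a health-check URL from a
// bare IP address and a port.
func healthURL(ip, port string) string {
	// net.JoinHostPort adds brackets when the host contains a colon,
	// as IPv6 literals do, and leaves IPv4 hosts unchanged.
	return "http://" + net.JoinHostPort(ip, port) + "/readiness"
}

func main() {
	fmt.Println(healthURL("10.0.0.1", "9099"))    // http://10.0.0.1:9099/readiness
	fmt.Println(healthURL("2001:db8::1", "9099")) // http://[2001:db8::1]:9099/readiness
}
```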

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix IPv6 DNAT/SNAT rule formatting in iptables and nftables backends

Both ip6tables and nftables require bracketed IPv6 addresses when a
port is present (e.g., [2001:db8::1]:80). The DNAT and SNAT actions
were using plain string formatting which produces invalid rules for
IPv6. Use net.JoinHostPort() which correctly brackets IPv6 addresses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add live migration routing sequence unit tests

Test the two key live migration subcases where Felix-managed routes
coexist with external BIRD routes at different priorities for the same
VM IP:

(a) Source host: Felix local route at normal priority, BIRD remote route
    appears at elevated priority, Felix removes its route, BIRD reverts
    to normal priority.

(b) Destination host: BIRD remote route at normal priority, Felix
    programs local route at elevated priority, BIRD route removed, Felix
    reverts to normal priority.

Verifies that resync never disturbs the external BIRD routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Document conntrack tracker limitation with multiple route priorities

Add a detailed comment explaining the interaction between conntrack
cleanup and multiple route priorities.  The ConntrackCleanupManager is
keyed on CIDR (one owner per CIDR) while routes are keyed on RouteKey
(CIDR + Priority).  For the live migration use case this is safe: Felix
only manages one route per CIDR at a time, with the coexisting BIRD
route being external to the tracker.

On the source host, the conntrack flush when Felix removes its local
route is correct: the VM is leaving, so stale conntrack entries should
be flushed to force policy re-evaluation (the return path may now
traverse different HostEndpoints on a different host).

On the destination host, no flush is triggered because BIRD's
pre-existing route was never tracked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add VirtualMachineInstanceMigration RBAC to non-operator manifests

Commit c9b6f2be14 added RBAC for watching kubevirt.io
VirtualMachineInstanceMigration resources to the operator manifests,
but missed the non-operator (Helm chart) manifests. Add the same
get/list/watch permissions to the calico-node ClusterRole in the
calico chart template and regenerate manifests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Move VMIM RBAC rule outside network==calico gate

Felix's KDD syncer always syncs VirtualMachineInstanceMigration
resources regardless of network/IPAM mode, so the RBAC rule must
be gated only on datastore==kubernetes, not additionally on
network==calico. This adds the rule to canal and policy-only
manifests that were previously missing it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add a few more tiered RBAC e2e tests (#11895)

* Improve comment about live migration and conntrack state cleanup

* IPAM support for KubeVirt IP persistence (#11865)

* Add KubeVirt VM live migration IPAM support

Implement VM-aware IPAM allocation that preserves IP addresses across
KubeVirt virtual machine live migrations. Key changes:

- Add KubeVirt VMI client library for tracking VM identity and migrations
- Extend CNI IPAM plugin with VM-aware allocation and release logic
- Add IPAMConfig resource with MaxAllocPerIPVersion for VM IP limits
- Support handle-based IP reuse to maintain stable IPs during migration
- Add comprehensive unit and integration tests for kubevirt IPAM flows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix comment

* Added filter for non-VM pods

* Review Markups

* Update cni-plugin/pkg/ipamplugin/ipam_plugin.go

Co-authored-by: Shaun Crampton <shaun@tigera.io>

* Review Markups II

* Fix static checks

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Shaun Crampton <shaun@tigera.io>

* fix(qos tests): remove unnecessary waits for routes on QoS felix FVs

Remove unnecessary `Eventually()` calls that were waiting for the
workload interface to be present in the output of `ip r` on every
workload update (mostly changing QoS configs).

* Added cali bot trigger (#11849)

* Added cali bot trigger

* code review fixes

* Rename cali-bot-trigger.yml to calico-github-issues-bot-trigger.yml

* Add Claude Code skill for reproducing CI failures on GCP VMs (#11968)

* Add Claude Code skill for reproducing CI failures on GCP VMs

Documents the process of creating a GCP VM that matches the CI
environment (image family, Docker version, sysctl settings) to
reproduce kernel-dependent test failures locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix GCP skill to match CI: machine type, disk size, Docker repo format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Pedro Coutinho <coutinhop@users.noreply.github.com>
Co-authored-by: Nell Jerram <nell@tigera.io>
Co-authored-by: Mazdak Nasab <mazdak.nasab@gmail.com>
Co-authored-by: marvin-tigera <marvin-tigera@users.noreply.github.com>
Co-authored-by: Casey Davenport <caseydavenport@users.noreply.github.com>
Co-authored-by: Lucas Sampaio <lucas@tigera.io>
Co-authored-by: Pedro Coutinho <pedro@tigera.io>
Co-authored-by: Alex O Regan <alex.oregan@tigera.io>
Co-authored-by: Shaun Crampton <shaun@tigera.io>
Co-authored-by: Casey Davenport <davenport.cas@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: nelljerram <2089263+nelljerram@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Alex Harford <alex.harford@tigera.io>
Co-authored-by: MichalFupso <michal@tigera.io>
Co-authored-by: Brian McMahon <brianmcmahon135@gmail.com>
Co-authored-by: Steve Gao <steve@tigera.io>
Co-authored-by: tuti <tuti@tigera.io>
Co-authored-by: Tomas Hruby <tomas@tigera.io>
Co-authored-by: sridhartigera <63839878+sridhartigera@users.noreply.github.com>
Co-authored-by: marvin-tigera <marvin@projectcalico.io>
Co-authored-by: Seth Malaki <seth@tigera.io>
Co-authored-by: Lancelot Robson <lancelot.robson@gmail.com>
Co-authored-by: Seth Malaki <seth@projectcalico.org>
Co-authored-by: Tomas Hruby <49207409+tomastigera@users.noreply.github.com>
Co-authored-by: sudheernv <nvsudheerjain@gmail.com>
Co-authored-by: Lance Robson <lance@tigera.io>
Co-authored-by: Jiawei Huang <jiawei@tigera.io>
Co-authored-by: fasaxc <469264+fasaxc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: haojiwu <haojiwu@gmail.com>
Co-authored-by: sridhar <sridhar@tigera.io>
Co-authored-by: KameHameHa21110 <abhinavgovind23@gmail.com>
Co-authored-by: Brian Stack <brian@render.com>
Co-authored-by: Song Jiang <song@tigera.io>
Co-authored-by: Walter Neto <walter@tigera.io>
Co-authored-by: Ludwig <tommludwig@icloud.com>
Co-authored-by: Tom Ludwig <83090745+tom-ludwig@users.noreply.github.com>
Co-authored-by: lif <1835304752@qq.com>
Co-authored-by: Daniel Fox <dan.fox@tigera.io>
Co-authored-by: Oleksandr Skoryk <skorichok88@gmail.com>
@ronanc-tigera requested review from a team as code owners Mar 3, 2026
@github-actions bot removed the stale label Mar 3, 2026
ronanc-tigera and others added 7 commits March 18, 2026 15:13
* Whisker - policy filter version 3

* update filters to latest backend changes

* Whisker - update table styles

* Whisker - update table styles
@ronanc-tigera added the docs-not-required and release-note-not-required labels and removed the release-note-required and docs-pr-required labels Mar 19, 2026
@ronanc-tigera merged commit 4f413dc into projectcalico:master Mar 19, 2026
6 of 7 checks passed

Labels

docs-not-required, release-note-not-required

3 participants