LPS Network Endpoint Enhancements and Signaling#5904
milan-zededa wants to merge 6 commits into lf-edge:master
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #5904      +/-   ##
==========================================
+ Coverage   21.62%   22.01%   +0.38%
==========================================
  Files         464      475      +11
  Lines       83994    85923    +1929
==========================================
+ Hits        18166    18912     +746
- Misses      64300    65304    +1004
- Partials     1528     1707     +179
==========================================
```
eriknordmark
left a comment
Run tests
But some yetus annotations to fix (however, yetus shows as passing which is new to me!)
If you look at the artifacts, Yetus is 100% passing because it checks only the patch (what was changed), so if you go to the final result files, they are all blank (no errors). However, the annotation step runs over the whole files, not just the changed lines. It's important to point out that any error introduced by the PR will make Yetus fail, so the check is working.
I think this approach is quite annoying for the reviewer. Furthermore, these are not "issues", but suggestions. We may not agree with everything pointed out by Yetus, so we will intentionally ignore some, but they will keep popping up.
This is at best confusing (because those annotations show up in the diffs the same way as when they cause failures) and a bit annoying. Can we limit the annotations to the changed lines, as before?
See: lf-edge/eve-api#144 Signed-off-by: Milan Lenco <milan@zededa.com>
Mirror the controller-provisioned SystemAdapter.allow_local_modifications flag through dpcToProto so LPS can distinguish ports that accept local configuration from those managed exclusively by the controller, without having to infer it by trial-and-error via error_message. Signed-off-by: Milan Lenco <milan@zededa.com>
Open a long-lived GET /api/v1/signal stream to the Local Profile Server and, upon each incoming Signal message, immediately trigger the listed endpoints' pollers -- bypassing the ~1-minute periodic cadence while preserving it as the correctness fallback. This removes the minute-scale delay that operators previously saw between entering a config change in the LPS UI and EVE picking it up. The Signal handler runs as an additional LocalCmdAgent goroutine. Connection open is guarded by the existing startTask/runInterruptible/ endTask pattern used by the other pollers; the long body read runs without the task lock so it cannot block pause(). On URL change, UpdateLpsConfig cancels the in-flight stream and wakes the goroutine, which reconnects against the current LPS address. Dispatches are rate-limited (1 signal / 3s, burst 3). LPS 404 throttles reconnect attempts to once per hour. No watchdog is registered -- a legitimately long blocking Read must not trigger a device reboot. A new controllerconn.Client.OpenLocalStream helper provides the streaming HTTP client (reuses DialerWithResolverCache, adds TCP keepalive for dead-peer detection, disables HTTP keep-alive for clean connection teardown, and drops the per-request timeout that SendLocal applies). The existing triggerProfileGET is exported as TriggerProfileGET for symmetry with the other Trigger*POST helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Milan Lenco <milan@zededa.com>
Fire an immediate LPS Network-endpoint POST whenever a substantive change is detected in either the device port config list (handleDPCLImpl) or the device network status (handleDNSImpl), so the local operator sees the effect of a network config change right away instead of waiting for the next periodic post. A burst of updates during, e.g., DPC verification is naturally coalesced by the networkTicker's size-1 buffered channel: TickNow is a non-blocking send, so excess kicks arriving while a POST is already in flight or pending are dropped. Signed-off-by: Milan Lenco <milan@zededa.com>
Populate the new NetworkInfo.port_status field from DeviceNetworkStatus when EVE posts to the LPS Network endpoint, giving the local operator a view of the kernel-observed state of each network port (link up/down, MAC address, currently-assigned IP addresses, active default routers, effective DNS servers and search domain, NTP servers in use, and applied MTU) alongside the existing declarative config views. CIDR-formatted IP addresses require the interface's subnet mask, which DeviceNetworkStatus previously did not carry in AddrInfoList. Extend types.AddrInfo with a Mask field and populate it from the netlink address entry in DpcManager.updateDNS. Signed-off-by: Milan Lenco <milan@zededa.com>
Add timer.lps.<task>.interval config properties (profile, radio, appinfo, devinfo, network, appbootinfo) with defaults matching current hard-coded values, min 3s, max 1h. LocalCmdAgent initializes globalConfig with DefaultConfigItemValueMap() so all tasks use the correct default before the first real config arrives. Interval changes take effect immediately on the next UpdateGlobalConfig call without resetting throttle state. Signed-off-by: Milan Lenco <milan@zededa.com>
Force-pushed from 1e5b138 to d22cfef.
Rebased on top of the latest master.
| Name | Type | Default | Min | Max | Description |
|------|------|---------|-----|-----|-------------|
| timer.lps.profile.interval | integer in seconds | 60 (1 minute) | 3 | 3600 (1 hour) | how frequently EVE fetches the local profile from the Local Profile Server (LPS) |
| timer.lps.radio.interval | integer in seconds | 5 | 3 | 3600 (1 hour) | how frequently EVE POSTs radio status to LPS and fetches radio silence configuration |
| timer.lps.appinfo.interval | integer in seconds | 60 (1 minute) | 3 | 3600 (1 hour) | how frequently EVE POSTs application info to LPS and fetches application commands |
| timer.lps.devinfo.interval | integer in seconds | 60 (1 minute) | 3 | 3600 (1 hour) | how frequently EVE POSTs device info to LPS and fetches device commands |
| timer.lps.network.interval | integer in seconds | 60 (1 minute) | 3 | 3600 (1 hour) | how frequently EVE POSTs network configuration to LPS and fetches locally-made network configuration |
| timer.lps.appbootinfo.interval | integer in seconds | 60 (1 minute) | 3 | 3600 (1 hour) | how frequently EVE POSTs application boot info to LPS and fetches boot configuration |
Does it make sense to add a general note about all these below the table that if the device gets a 404(??) it will back off the timer a lot?
eriknordmark
left a comment
If/when the long-lived connection dies and is re-established is the re-establishment done automatically by EVE?
Who is responsible for ensuring that the LPS can track the current state after the re-establishment?
Should the LPS notice that there is a new TCP/HTTPS connection and use that to trigger fetching/updating the current state? If so, would it be useful if EVE sent a signal over the long-lived connection immediately after it has re-established the connection, so the LPS doesn't need to detect new connections at the lower layers?
This signal endpoint is really just an optimization to minimize latency. If we lose some signal from LPS for whatever reason, the only consequence is longer latency (as it is today without this enhancement).
A few comments:

1. No unit tests for signal.go. A new ~300-line state machine with five outcomes, an exported public API (`OpenLocalStream`), and intricate cancel/restart semantics ships without a single test. At minimum, `handleSignal` (rate-limit + dispatch table) and `readSigHandlerStream` (NDJSON framing + malformed-line tolerance) are easy to unit-test with an `io.Pipe`. Worth adding before merge given the long-lived-connection complexity.

2. `sigHandlerLimiter` config inconsistency. Burst is 3 over a 3 s refill, so a single legitimate signal listing all 6 endpoints will not exceed the limit (it dispatches synchronously inside `handleSignal` and consumes 1 token, not 6 — `Allow()` is called once per signal, not per endpoint). That is fine for normal use, but the comment says "periodic polling is the correctness guarantee; dropped signals are safe" while the limiter is actually quite generous. Consider documenting that the limiter is per-message, not per-endpoint, so future maintainers don't tighten it incorrectly.

3. TCP keepalive is key for robustness, but if we fail to set it there are no logs at all. It makes sense to add such logs (controllerconn/send.go:111-115):

```go
if tcpConn, ok := conn.(*net.TCPConn); ok {
	_ = tcpConn.SetKeepAlive(true)
	_ = tcpConn.SetKeepAlivePeriod(keepAlive)
}
```

4. If `DialerWithResolverCache` ever wraps the connection (e.g., for proxying), keepalive — the only dead-peer detection mechanism per the design — is silently disabled. At least log at Trace level when the assertion fails so this doesn't become a silent debugging problem later.

5. Empty mask → no /N in CIDR string. `dnsPortsToProto` builds `&net.IPNet{IP: addr.Addr, Mask: addr.Mask}` and calls `String()`. If `addr.Mask` is nil/empty (possible on a freshly persisted DPC before `updateDNS` has run, or if a code path forgets to populate it), `IPNet.String()` returns just the IP — violating the proto comment that promises CIDR format. Either guarantee Mask presence by construction, or fall back to /32 or /128 based on `len(addr.Addr)` so the wire format is always CIDR.

6. Log line could be large (signal.go:864): `Warnf(... line=%q)` logs up to 64 KiB of attacker-controlled bytes per malformed line. Truncate to ~256 chars in the format string.

7. Unbounded pendingChanges slice. A malicious LPS can send a Signal whose `pendingChanges` is packed with entries (within the 64 KiB line cap, that's roughly 2k–5k enum repetitions). Each is iterated in `handleSignal`. Iteration is O(n) with cheap per-element work (one switch + one tickNow), so a 5k-entry signal completes in microseconds. Not exploitable, but a `len(pendingChanges) > 32` early-return would cost nothing.

8. Add a one-line `// see eve-api PROFILE.md ### Signal — auth model is intentional` comment near `OpenLocalStream`.

For #3 and #4 a suggestion is: push keepalive into the dialer itself. `net.Dialer` has a `KeepAlive time.Duration` field that handles this internally. Plumb a `KeepAlive` field through `DialerWithResolverCache` so the inner `net.Dialer` is constructed with it:

```go
stdDialer := net.Dialer{
	Resolver:  resolver.getNetResolver(),
	LocalAddr: &net.TCPAddr{IP: d.localIP},
	Timeout:   d.timeout,
	KeepAlive: d.keepAlive, // <-- new
}
```
Description
Implements the EVE side of the API additions introduced in
lf-edge/eve-api#144.
Three API enhancements plus two ancillary improvements:
1. Signal endpoint — low-latency LPS config notifications (`GET /api/v1/signal`)

EVE now maintains a persistent long-lived HTTP GET connection to the optional `/api/v1/signal` endpoint on the Local Profile Server. LPS can push a `Signal` proto message (NDJSON-framed) listing which endpoints have a pending configuration change. On receipt, EVE immediately polls the listed endpoints instead of waiting for the next scheduled tick.

Implementation details:
- `pkg/pillar/localcommand/signal.go` handles the stream lifecycle independently of the watchdog, as required by the spec.
- LPS 404 responses throttle reconnect attempts to one per hour so a non-signaling LPS does not generate excess traffic.
- Signal dispatches are rate-limited to prevent the LPS overwhelming EVE with spurious triggers. Periodic polling is the correctness guarantee; dropped signals are safe.
- On URL change, the in-flight stream is cancelled and reconnected (`UpdateLpsConfig`). Unknown `ConfigEndpoint` enum values are silently ignored for forward compatibility.
2. Per-port runtime status (`NetworkInfo.port_status`)

EVE now populates the new `NetworkPortStatus` repeated field in every `POST /api/v1/network` request. For each network port it reports the kernel-observed runtime state: link up/down, MAC address, assigned IP addresses (CIDR), active default gateways, DNS servers, DNS search domain, NTP servers, and MTU.

3. `local_modifications_allowed` per port in `NetworkInfo.latest_config`

EVE sets `NetworkPortConfig.local_modifications_allowed` when reporting `latest_config` in `NetworkInfo`, mirroring the controller-provisioned `SystemAdapter.allow_local_modifications` flag. LPS can use this to know upfront which ports accept locally submitted configuration, rather than discovering it via a trial-and-error error message.

4. Reactive LPS network POST on config/status change

EVE now triggers an immediate `POST /api/v1/network` when either:
- `DevicePortConfigList` changes (new network configuration applied), or
- `DeviceNetworkStatus` changes with a meaningful state update (link state, IP assignment, etc.).

Previously the network endpoint was only driven by the periodic ticker.

5. Configurable LPS polling intervals

All six LPS polling intervals are now tunable via controller config properties (previously they were compile-time constants):
- `timer.lps.profile.interval`
- `timer.lps.radio.interval`
- `timer.lps.appinfo.interval`
- `timer.lps.devinfo.interval`
- `timer.lps.network.interval`
- `timer.lps.appbootinfo.interval`

Defaults match the previous hard-coded values, so behaviour is unchanged without explicit configuration.
How to test and validate this PR
Prerequisites
You need a running EVE device with a deployed Local Profile Server application.
Consider using my LPS implementation.
1. Validate `local_modifications_allowed` in NetworkInfo

LPS side: In the handler for `POST /api/v1/network`, inspect each entry in `network_info.latest_config.ports`. Ports provisioned by the controller with `allow_local_modifications = true` must arrive with `local_modifications_allowed = true`; those provisioned without it must arrive with `false`.

Expected behaviour: LPS can display a read-only indicator next to ports that it is not permitted to reconfigure, and can pre-validate a `LocalNetworkConfig` submission before sending it.

2. Validate per-port runtime status (`port_status`)

LPS side: In the same `POST /api/v1/network` handler, inspect `network_info.port_status`. For each network port you should see:
- `logical_label` — matches the port label from `latest_config`
- `interface_name` — kernel interface name (e.g., `eth0`)
- `link_up` — `true` when the port has carrier
- `mac_address` — colon-separated hex (e.g., `aa:bb:cc:dd:ee:ff`)
- `ip_addresses` — CIDR notation (e.g., `192.168.1.10/24`, `fd00::1/64`)
- `gateways` — active default routers
- `dns_servers` / `dns_domain` — effective resolver configuration
- `ntp_servers` — NTP peers in use
- `mtu` — effective MTU in bytes

To verify reactivity (enhancement 4): Change a network configuration on the device (e.g., add a static route, trigger a DHCP renewal). LPS should receive an updated `POST /api/v1/network` within a few seconds, not after the next 60-second tick.
3. Validate the Signal endpoint — basic trigger

LPS side: Implement `GET /api/v1/signal` as a streaming endpoint that, upon a local configuration change (e.g., a new profile submitted via `PUT /api/v1/local_profile`), immediately writes a single NDJSON line.

EVE side: After receiving the signal, EVE must immediately fetch `GET /api/v1/local_profile` and apply the new profile — without waiting for the next 60-second tick.

Verification: Measure the latency between submitting the config change on LPS and EVE applying it. With signaling it should be under 2 seconds; without signaling it would be up to 60 seconds.
4. Validate the Signal endpoint — multi-endpoint coalescing

Send a signal listing multiple endpoints simultaneously. EVE must trigger both the network POST and the app info POST immediately.
5. Validate configurable polling intervals

Using the controller, set `timer.lps.profile.interval` to a low value such as 10 seconds. Observe in EVE logs that the local profile is now fetched every 10 seconds instead of every 60 seconds. Restore the default when done.
Changelog notes
Low-latency LPS configuration updates. EVE can now receive near-instant
notifications from the Local Profile Server when new configuration is ready,
reducing the delay from up to 60 seconds to under 2 seconds. LPS
applications must implement the optional `GET /api/v1/signal` streaming endpoint to take advantage of this; existing LPS deployments without the endpoint continue to work exactly as before.
Richer network status in LPS. EVE now reports the live kernel-observed
state of every network port to LPS on each network POST: link up/down,
assigned IP addresses (with subnet mask), default gateways, DNS and NTP
servers, and MTU. LPS applications can display this information to operators
without needing a separate management channel.
Per-port modification permission flag. LPS now knows which network ports
it is allowed to reconfigure (as provisioned by the controller), so it can
give operators clear feedback instead of a generic error when they attempt to
modify a controller-managed port.
Reactive LPS network updates. EVE now pushes an updated network POST to
LPS immediately when the device's network configuration or link state changes,
rather than waiting for the next periodic tick.
Tunable LPS polling intervals. All six LPS polling intervals (local
profile, radio, app info, device info, network, app boot info) can now be
adjusted via controller config properties. The defaults are unchanged.