app/vlagent: feat: add namespace labels/annotations metadata and flags#949
app/vlagent: feat: add namespace labels/annotations metadata and flags#949withlin wants to merge 3 commits intoVictoriaMetrics:masterfrom
Conversation
|
@withlin , thank you for the pull request! @vadimalekseev , could you review this pull request? |
vadimalekseev
left a comment
There was a problem hiding this comment.
@withlin thank you for the contribution! Looks good overall, we just need to fix some edge cases to make the code more reliable
| return ns | ||
| } | ||
|
|
||
| ns, err := kc.client.getNamespace(kc.ctx, namespaceName) |
There was a problem hiding this comment.
@valyala, can we use updateNeededFields to determine if the excludeFilter relies on namespace metadata? This would help us decide whether fetching this information from the Kubernetes API is necessary.
For example:
pf := &prefixfilter.Filter{}
fc.excludeFilter.UpdateNeededFields(pf)
if *includeNamespaceLabels || *includeNamespaceAnnotations ||
pf.MatchStringOrWildcard("kubernetes.namespace_labels.*") ||
pf.MatchStringOrWildcard("kubernetes.namespace_annotations.*") {
// fetch namespace labels and annotations from Kubernetes API server...
}If the answer is 'yes' let's address this in a separate PR to keep this contribution simple.
There was a problem hiding this comment.
can we use updateNeededFields to determine if the excludeFilter relies on namespace metadata?
I'm not entirely convinced this is a useful feature, as it won't really save much. There aren't many namespaces, and storing them in memory is practically free.
However it could allow us to avoid requesting permissions for resources we don't actually need and completely avoid synchronous network calls within the startRead function, even when encountering a new namespace
…g reading This change improves performance and security by fetching all namespace metadata once at application startup instead of making on-demand network calls during log processing. Key changes: - Add getAllNamespaces() method to fetch all namespaces in a single API call - Pre-fetch and cache all namespace metadata in startKubernetesCollector() - Remove getNamespace() on-demand fetching to eliminate runtime network calls - Change getNamespace() to cache-only lookup returning (namespace, bool) - Skip pods from namespaces not in cache to prevent processing logs with incomplete metadata that could bypass excludeFilter Benefits: - Eliminates network call latency during log reading (no blocking) - Prevents vlagent from appearing hung during network issues - Ensures excludeFilter has complete namespace labels/annotations for security-sensitive filtering (e.g., audit logs) - Simplifies code by removing timeout/retry logic - Fail-fast: vlagent exits if unable to fetch namespaces at startup Note: If a namespace is created after vlagent starts, pods in that namespace will be skipped until vlagent is restarted. This is acceptable as it ensures we never process logs without complete metadata. Updates VictoriaMetrics#949
…eFilter This change ensures vlagent always has complete namespace metadata for security-sensitive filtering while avoiding blocking log processing with network calls. Key changes: **Namespace handling improvements:** - Add getAllNamespaces() method to fetch all namespaces in a single API call - Pre-fetch all namespace metadata at vlagent startup - Implement auto-refresh in getNamespace() when new namespace is discovered - Add refreshNamespaces() helper to update cache from Kubernetes API - Fail-fast behavior: terminate vlagent if namespace cannot be fetched **Performance optimizations:** - Remove namespacesLock: only one goroutine (Pod event handler) accesses map - Eliminate network calls in hot path: getNamespace() checks cache first - Refresh only when needed: triggered by unknown namespace detection - Return namespaceList with ResourceVersion for potential future watch support **Security improvements:** - Ensure excludeFilter always has complete namespace labels/annotations - Prevent processing logs with incomplete metadata that could bypass filters - Terminate vlagent if namespace refresh fails (fail-safe approach) **Documentation updates:** - Clarify that node, pod, and namespace labels are not updated at runtime - Document that new namespaces are automatically discovered after startup - Note that metadata changes to existing resources require restart - Update flag examples to use correct field name (kubernetes.pod_namespace) Implementation details: The getNamespace() method works as follows: 1. Fast path: return namespace from cache if present 2. Slow path: if not in cache, call refreshNamespaces() 3. After refresh: lookup again, fail if still not found 4. Log refreshed namespace count for monitoring This approach supports dynamic namespace creation while ensuring we never process logs without complete metadata, which is critical for security- sensitive use cases like audit log filtering. Behavior: - At startup: fetch all namespaces, exit if fails - During runtime: refresh namespace list when unknown namespace detected - On refresh failure: terminate vlagent to prevent security bypass - On namespace not found: terminate vlagent (should never happen) Example use case: User filters audit logs with: -kubernetesCollector.excludeFilter='kubernetes.namespace_labels.security:high' Without this change: network timeout could cause incomplete metadata, bypassing the filter and leaking sensitive logs. With this change: vlagent terminates on fetch failure, ensuring no logs are processed without proper filtering. Updates VictoriaMetrics#949
6c276c8 to
2fd6d41
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #949 +/- ##
=========================================
Coverage ? 64.36%
=========================================
Files ? 219
Lines ? 38421
Branches ? 0
=========================================
Hits ? 24730
Misses ? 12313
Partials ? 1378 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
|
||
| nsl, err := client.getAllNamespaces(ctx) | ||
| if err != nil { | ||
| cancel() |
There was a problem hiding this comment.
@vadimalekseev , it is better from maintainability PoV to put the defer cancel() call directly after the context.WithCancel() call above. This will prevent from possible issues with missing cancel() calls before early return paths from the function.
@vadimalekseev , could you update all the vlagent code according to this recommendation in a separate pull request?
dc09382 to
a7883f4
Compare
a7883f4 to
4c2ad56
Compare
|
hi @valyala and @vadimalekseev any advice? |
|
@withlin hey, could you sync this branch with master? |
| // namespaces caches metadata for all namespaces in the cluster. | ||
| // namespacesLock guards concurrent access, since refreshNamespaces may run while events are handled. | ||
| namespaces map[string]namespace | ||
| namespacesLock sync.RWMutex |
There was a problem hiding this comment.
Could you please double-check if we really need this mutex here?
Describe Your Changes
Relate: #923, #927
support namespace labels/annotations metadata level filter and flags
Checklist
The following checks are mandatory: