Skip to content

app/vlagent: feat: add namespace labels/annotations metadata and flags#949

Open
withlin wants to merge 3 commits intoVictoriaMetrics:masterfrom
withlin:feature/exclude-filter-flag-doc
Open

app/vlagent: feat: add namespace labels/annotations metadata and flags#949
withlin wants to merge 3 commits intoVictoriaMetrics:masterfrom
withlin:feature/exclude-filter-flag-doc

Conversation

@withlin
Copy link
Contributor

@withlin withlin commented Dec 24, 2025

Describe Your Changes

Relate: #923, #927

support namespace labels/annotations metadata level filter and flags

Checklist

The following checks are mandatory:

@valyala
Copy link
Contributor

valyala commented Dec 24, 2025

@withlin , thank you for the pull request!

@vadimalekseev , could you review this pull request?

Copy link
Member

@vadimalekseev vadimalekseev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@withlin thank you for the contribution! Looks good overall, we just need to fix some edge cases to make the code more reliable

return ns
}

ns, err := kc.client.getNamespace(kc.ctx, namespaceName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valyala, can we use updateNeededFields to determine if the excludeFilter relies on namespace metadata? This would help us decide whether fetching this information from the Kubernetes API is necessary.

For example:

pf := &prefixfilter.Filter{}
fc.excludeFilter.UpdateNeededFields(pf)

if *includeNamespaceLabels || *includeNamespaceAnnotations ||
	pf.MatchStringOrWildcard("kubernetes.namespace_labels.*") ||
	pf.MatchStringOrWildcard("kubernetes.namespace_annotations.*") {
	// fetch namespace labels and annotations from Kubernetes API server...
}

If the answer is 'yes' let's address this in a separate PR to keep this contribution simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use updateNeededFields to determine if the excludeFilter relies on namespace metadata?

I'm not entirely convinced this is a useful feature, as it won't really save much. There aren't many namespaces, and storing them in memory is practically free.

However it could allow us to avoid requesting permissions for resources we don't actually need and completely avoid synchronous network calls within the startRead function, even when encountering a new namespace

withlin added a commit to withlin/VictoriaLogs that referenced this pull request Dec 25, 2025
…g reading

This change improves performance and security by fetching all namespace
metadata once at application startup instead of making on-demand network
calls during log processing.

Key changes:

- Add getAllNamespaces() method to fetch all namespaces in a single API call
- Pre-fetch and cache all namespace metadata in startKubernetesCollector()
- Remove getNamespace() on-demand fetching to eliminate runtime network calls
- Change getNamespace() to cache-only lookup returning (namespace, bool)
- Skip pods from namespaces not in cache to prevent processing logs with
  incomplete metadata that could bypass excludeFilter

Benefits:

- Eliminates network call latency during log reading (no blocking)
- Prevents vlagent from appearing hung during network issues
- Ensures excludeFilter has complete namespace labels/annotations for
  security-sensitive filtering (e.g., audit logs)
- Simplifies code by removing timeout/retry logic
- Fail-fast: vlagent exits if unable to fetch namespaces at startup

Note: If a namespace is created after vlagent starts, pods in that namespace
will be skipped until vlagent is restarted. This is acceptable as it ensures
we never process logs without complete metadata.

Updates VictoriaMetrics#949
withlin added a commit to withlin/VictoriaLogs that referenced this pull request Dec 25, 2025
…eFilter

This change ensures vlagent always has complete namespace metadata for
security-sensitive filtering while avoiding blocking log processing with
network calls.

Key changes:

**Namespace handling improvements:**
- Add getAllNamespaces() method to fetch all namespaces in a single API call
- Pre-fetch all namespace metadata at vlagent startup
- Implement auto-refresh in getNamespace() when new namespace is discovered
- Add refreshNamespaces() helper to update cache from Kubernetes API
- Fail-fast behavior: terminate vlagent if namespace cannot be fetched

**Performance optimizations:**
- Remove namespacesLock: only one goroutine (Pod event handler) accesses map
- Eliminate network calls in hot path: getNamespace() checks cache first
- Refresh only when needed: triggered by unknown namespace detection
- Return namespaceList with ResourceVersion for potential future watch support

**Security improvements:**
- Ensure excludeFilter always has complete namespace labels/annotations
- Prevent processing logs with incomplete metadata that could bypass filters
- Terminate vlagent if namespace refresh fails (fail-safe approach)

**Documentation updates:**
- Clarify that node, pod, and namespace labels are not updated at runtime
- Document that new namespaces are automatically discovered after startup
- Note that metadata changes to existing resources require restart
- Update flag examples to use correct field name (kubernetes.pod_namespace)

Implementation details:

The getNamespace() method works as follows:
1. Fast path: return namespace from cache if present
2. Slow path: if not in cache, call refreshNamespaces()
3. After refresh: lookup again, fail if still not found
4. Log refreshed namespace count for monitoring

This approach supports dynamic namespace creation while ensuring we never
process logs without complete metadata, which is critical for security-
sensitive use cases like audit log filtering.

Behavior:
- At startup: fetch all namespaces, exit if fails
- During runtime: refresh namespace list when unknown namespace detected
- On refresh failure: terminate vlagent to prevent security bypass
- On namespace not found: terminate vlagent (should never happen)

Example use case:
User filters audit logs with:
  -kubernetesCollector.excludeFilter='kubernetes.namespace_labels.security:high'

Without this change: network timeout could cause incomplete metadata,
bypassing the filter and leaking sensitive logs.

With this change: vlagent terminates on fetch failure, ensuring no logs
are processed without proper filtering.

Updates VictoriaMetrics#949
@withlin withlin force-pushed the feature/exclude-filter-flag-doc branch from 6c276c8 to 2fd6d41 Compare December 25, 2025 12:41
@codecov
Copy link

codecov bot commented Dec 25, 2025

Codecov Report

❌ Patch coverage is 1.88679% with 52 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@0153853). Learn more about missing BASE report.

Files with missing lines Patch % Lines
app/vlagent/kubernetescollector/collector.go 0.00% 34 Missing ⚠️
app/vlagent/kubernetescollector/client.go 0.00% 15 Missing ⚠️
app/vlagent/kubernetescollector/processor.go 25.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master     #949   +/-   ##
=========================================
  Coverage          ?   64.36%           
=========================================
  Files             ?      219           
  Lines             ?    38421           
  Branches          ?        0           
=========================================
  Hits              ?    24730           
  Misses            ?    12313           
  Partials          ?     1378           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.


nsl, err := client.getAllNamespaces(ctx)
if err != nil {
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vadimalekseev , it is better from maintainability PoV to put the defer cancel() call directly after the context.WithCancel() call above. This will prevent from possible issues with missing cancel() calls before early return paths from the function.

@vadimalekseev , could you update all the vlagent code according to this recommendation in a separate pull request?

@withlin withlin force-pushed the feature/exclude-filter-flag-doc branch 3 times, most recently from dc09382 to a7883f4 Compare December 26, 2025 13:18
@withlin withlin force-pushed the feature/exclude-filter-flag-doc branch from a7883f4 to 4c2ad56 Compare December 27, 2025 12:45
@withlin
Copy link
Contributor Author

withlin commented Dec 30, 2025

hi @valyala and @vadimalekseev any advice?

@vadimalekseev
Copy link
Member

@withlin hey, could you sync this branch with master?

// namespaces caches metadata for all namespaces in the cluster.
// namespacesLock guards concurrent access, since refreshNamespaces may run while events are handled.
namespaces map[string]namespace
namespacesLock sync.RWMutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please double-check if we really need this mutex here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants