
feat: add OTel Collector misconfiguration detection to KOF UI#636

Merged
AndrejsPon00 merged 9 commits into k0rdent:main from AndrejsPon00:fix-collector-failure-detection
Nov 28, 2025

Conversation

@AndrejsPon00
Contributor

Closes #543

Copilot AI review requested due to automatic review settings November 24, 2025 16:02
Contributor

Copilot AI left a comment

Pull request overview

This PR adds OpenTelemetry Collector misconfiguration detection to the KOF UI. The changes introduce a new hierarchical data structure (Cluster → CustomResource → Pod) to support displaying configuration warnings and errors for OTel collectors in the UI. The backend now inspects OTel collector status to detect replica mismatches and deployment issues, communicating these to the frontend through a new status message system.
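As a rough illustration of the Cluster → CustomResource → Pod hierarchy with status messages, here is a minimal Go sketch. The type names (ClusterData, CustomResourceData, PodData, StatusMessage), the JSON tags, and the signatures of AddMetric and AddStatus are assumptions made for illustration; the PR only states that metrics.go gained hierarchical AddMetric and AddStatus methods, not what they look like.

```go
package main

import "fmt"

// StatusType distinguishes warnings from errors (hypothetical name).
type StatusType string

const (
	StatusWarning StatusType = "warning"
	StatusError   StatusType = "error"
)

// StatusMessage is a warning or error that can be attached at any level.
type StatusMessage struct {
	Type    StatusType `json:"type"`
	Message string     `json:"message"`
}

// PodData holds the metrics scraped from a single pod.
type PodData struct {
	Metrics  map[string]float64 `json:"metrics"`
	Messages []StatusMessage    `json:"messages,omitempty"`
}

// CustomResourceData groups pods by the custom resource that owns them,
// e.g. an OpenTelemetryCollector.
type CustomResourceData struct {
	Pods     map[string]*PodData `json:"pods"`
	Messages []StatusMessage     `json:"messages,omitempty"`
}

// ClusterData groups custom resources by cluster.
type ClusterData struct {
	CustomResources map[string]*CustomResourceData `json:"customResources"`
	Messages        []StatusMessage                `json:"messages,omitempty"`
}

// Metrics maps cluster name to cluster data (hypothetical top-level type).
type Metrics map[string]*ClusterData

// cr returns the custom resource entry, creating the cluster and
// custom resource levels on demand.
func (m Metrics) cr(cluster, resource string) *CustomResourceData {
	c, ok := m[cluster]
	if !ok {
		c = &ClusterData{CustomResources: map[string]*CustomResourceData{}}
		m[cluster] = c
	}
	r, ok := c.CustomResources[resource]
	if !ok {
		r = &CustomResourceData{Pods: map[string]*PodData{}}
		c.CustomResources[resource] = r
	}
	return r
}

// AddMetric records one metric value for a pod under its custom resource.
func (m Metrics) AddMetric(cluster, resource, pod, name string, value float64) {
	r := m.cr(cluster, resource)
	p, ok := r.Pods[pod]
	if !ok {
		p = &PodData{Metrics: map[string]float64{}}
		r.Pods[pod] = p
	}
	p.Metrics[name] = value
}

// AddStatus attaches a warning or error to a custom resource.
func (m Metrics) AddStatus(cluster, resource string, t StatusType, msg string) {
	r := m.cr(cluster, resource)
	r.Messages = append(r.Messages, StatusMessage{Type: t, Message: msg})
}

func main() {
	m := Metrics{}
	m.AddMetric("regional-1", "kof-collectors", "collector-0", "up", 1)
	m.AddStatus("regional-1", "kof-collectors", StatusWarning, "1/2 replicas ready")
	fmt.Println(len(m["regional-1"].CustomResources["kof-collectors"].Pods))
}
```

Keeping a message slice at every level lets the UI render an alert wherever the problem lives (a whole cluster, one collector, or one pod) without a separate error channel.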

Key changes:

  • Restructured metrics data model to support a three-level hierarchy (clusters containing custom resources containing pods)
  • Added status message infrastructure for warnings and errors at cluster, custom resource, and pod levels
  • Implemented OTel collector health detection based on replica status
  • Added UI components to display configuration alerts for custom resources
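The replica-based health detection can be sketched as follows. This assumes a simplified, hypothetical CollectorStatus holding the desired replica count and a "ready/total" string; the real OpenTelemetryCollector status fields and the rules the PR uses to pick a warning versus an error may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type Severity string

const (
	SeverityWarning Severity = "warning"
	SeverityError   Severity = "error"
)

// Finding is one detected problem with a collector.
type Finding struct {
	Severity Severity
	Message  string
}

// CollectorStatus is a hypothetical, simplified view of a collector's status.
type CollectorStatus struct {
	DesiredReplicas int32
	StatusReplicas  string // "ready/total", e.g. "1/2"; empty if no workload exists
}

// parseReplicas splits a "ready/total" string into its two counts.
func parseReplicas(s string) (ready, total int, err error) {
	parts := strings.Split(s, "/")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected replica status %q", s)
	}
	if ready, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if total, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return ready, total, nil
}

// checkCollector returns findings for a collector, or nil if it looks healthy.
func checkCollector(s CollectorStatus) []Finding {
	ready, total, err := parseReplicas(s.StatusReplicas)
	if err != nil {
		// No parsable status usually means the workload was never created.
		return []Finding{{Severity: SeverityError, Message: "collector workload not found; configuration may be invalid"}}
	}
	desired := int(s.DesiredReplicas)
	switch {
	case ready == 0 && desired > 0:
		return []Finding{{Severity: SeverityError, Message: fmt.Sprintf("0/%d replicas ready", desired)}}
	case ready < desired:
		return []Finding{{Severity: SeverityWarning, Message: fmt.Sprintf("%d/%d replicas ready", ready, desired)}}
	case total != desired:
		return []Finding{{Severity: SeverityWarning, Message: fmt.Sprintf("%d pods exist but %d replicas desired", total, desired)}}
	}
	return nil
}

func main() {
	fmt.Println(checkCollector(CollectorStatus{DesiredReplicas: 2, StatusReplicas: "1/2"}))
}
```

Treating "no replicas ready" as an error and "some replicas ready" as a warning is a design choice in this sketch: a partially degraded collector still ships data, while a collector with zero ready pods (often the result of an invalid config) ships nothing.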

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
|---|---|
| kof-operator/webapp/collector/src/providers/victoria_metrics/VictoriaMetricsProvider.tsx | Removed unused types and simplified fetch response handling to use new ClusterData structure |
| kof-operator/webapp/collector/src/components/shared/CustomResource.tsx | Added new component to display custom resource alerts with warning/error styling |
| kof-operator/webapp/collector/src/components/pages/collectorPage/models.ts | Restructured data model to three-level hierarchy with status messages at each level |
| kof-operator/webapp/collector/src/components/pages/victoriaPage/victoria-list/VictoriaTable.tsx | Updated to render custom resources with their pods instead of flat pod list |
| kof-operator/webapp/collector/src/components/pages/collectorPage/components/collector-list/CollectorsTable.tsx | Updated to render custom resources with their pods instead of flat pod list |
| kof-operator/webapp/collector/src/components/pages/collectorPage/CollectorPage.tsx | Fixed pod selection logic to use new customResource structure |
| kof-operator/webapp/collector/src/components/features/AppSidebar.tsx | Added alert indicator for collectors with unhealthy pods |
| kof-operator/internal/server/handlers/victoria_handler.go | Refactored to use custom resource abstraction and group pods by Victoria Metrics/Logs |
| kof-operator/internal/server/handlers/types.go | Added ICustomResource interface and ResourceStatus type |
| kof-operator/internal/server/handlers/metrics_handler.go | Refactored to collect metrics per custom resource with status messages |
| kof-operator/internal/server/handlers/collector_metrics_handler.go | Implemented OTel collector status detection for replica mismatches and deployment issues |
| kof-operator/internal/metrics/types.go | Restructured types to support hierarchical data and status messages |
| kof-operator/internal/metrics/service.go | Renamed Service to CollectorService for clarity |
| kof-operator/internal/metrics/resources.go | Updated method names to match new CollectorService type |
| kof-operator/internal/metrics/metrics.go | Replaced flat Add methods with hierarchical AddMetric and AddStatus methods |
| kof-operator/internal/metrics/internal.go | Updated to use new CollectorService type |
| kof-operator/internal/metrics/helper.go | Added sendStatus method and renamed send to sendMetric |
| kof-operator/internal/metrics/health.go | Updated to use new CollectorService type and method names |
| kof-operator/internal/k8s/otel.go | Added OpenTelemetry collector querying and pod selector extraction |
| kof-operator/internal/k8s/client.go | Added OTel scheme and QPS/Burst rate limiting configuration |
| kof-operator/go.mod | Updated Go version and dependencies, including opentelemetry-operator |
| kof-operator/internal/database/MetricsDatabase.ts | Updated to use ClusterData instead of PodsMap |
| kof-operator/webapp/collector/src/providers/collectors_metrics/CollectorsMetricsRecordManager.tsx | Changed to use clustersMap property instead of toClusterMap method |
| charts/kof-mothership/templates/kof-operator/role.yaml | Added RBAC permissions for opentelemetrycollectors resources |
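The pod selector extraction in kof-operator/internal/k8s/otel.go is described but not shown. A minimal sketch, assuming the collector's pod selector is available as a comma-separated key=value string, could parse it and match pod labels as below. parseEqualitySelector and matches are hypothetical helpers; production code would more likely use labels.Parse from k8s.io/apimachinery, which handles the full selector grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEqualitySelector parses a comma-separated "key=value" (or "key==value")
// selector into a map. Set-based terms such as "in", "notin", "!key" and "!="
// are rejected rather than silently mis-parsed.
func parseEqualitySelector(sel string) (map[string]string, error) {
	out := map[string]string{}
	if strings.TrimSpace(sel) == "" {
		return out, nil
	}
	for _, term := range strings.Split(sel, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(term), "=")
		if !ok || k == "" || strings.ContainsAny(k, "!() ") {
			return nil, fmt.Errorf("unsupported selector term %q", term)
		}
		out[k] = strings.TrimPrefix(v, "=") // accept the "==" spelling too
	}
	return out, nil
}

// matches reports whether a pod's labels satisfy every selector term.
func matches(podLabels, sel map[string]string) bool {
	for k, want := range sel {
		if got, ok := podLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical selector and labels, for illustration only.
	sel, err := parseEqualitySelector("app.kubernetes.io/component=opentelemetry-collector,app.kubernetes.io/instance=kof.kof-collectors")
	if err != nil {
		panic(err)
	}
	pod := map[string]string{
		"app.kubernetes.io/component": "opentelemetry-collector",
		"app.kubernetes.io/instance":  "kof.kof-collectors",
	}
	fmt.Println(matches(pod, sel))
}
```

Rejecting set-based terms keeps the sketch honest: a selector it cannot interpret produces an error instead of matching the wrong pods.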

Comment thread kof-operator/internal/server/handlers/collector_metrics_handler.go Outdated
AndrejsPon00 force-pushed the fix-collector-failure-detection branch from 98114b8 to 3b344b5 on November 25, 2025 09:52
Collaborator

gmlexx left a comment

Do we have any unit tests on the backend side? None of them were updated in this PR.

I would suggest implementing a fake HTTP server to validate that the data structures hold up in several scenarios, such as:

  • None of the pods are available for a custom resource such as an OpenTelemetryCollector
  • Two pods are available for a custom resource, and data retrieval succeeds for both
  • Two pods are available for a custom resource, but only one of them successfully returns metrics
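These three scenarios map naturally onto Go's net/http/httptest package. The sketch below is a hypothetical stand-in for the handler logic, not the PR's code: collect, Pod, and Result are invented names, and the rule that a single failing pod yields a warning while zero successful pods yield an error is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Pod is a hypothetical pod reference: its name and metrics endpoint URL.
type Pod struct {
	Name string
	URL  string
}

// Result is the per-custom-resource outcome: raw metrics per pod plus messages.
type Result struct {
	Metrics  map[string]string
	Warnings []string
	Errors   []string
}

// collect scrapes every pod, recording a warning for each failed pod and an
// error when the custom resource has no pods or no pod returned metrics.
func collect(client *http.Client, pods []Pod) Result {
	res := Result{Metrics: map[string]string{}}
	if len(pods) == 0 {
		res.Errors = append(res.Errors, "no pods available for custom resource")
		return res
	}
	for _, p := range pods {
		resp, err := client.Get(p.URL)
		if err != nil {
			res.Warnings = append(res.Warnings, fmt.Sprintf("%s: %v", p.Name, err))
			continue
		}
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		switch {
		case resp.StatusCode != http.StatusOK:
			res.Warnings = append(res.Warnings, fmt.Sprintf("%s: status %d", p.Name, resp.StatusCode))
		case readErr != nil:
			res.Warnings = append(res.Warnings, fmt.Sprintf("%s: %v", p.Name, readErr))
		default:
			res.Metrics[p.Name] = string(body)
		}
	}
	if len(res.Metrics) == 0 {
		res.Errors = append(res.Errors, "no pod returned metrics")
	}
	return res
}

// newFakeServer serves metrics on paths starting with /ok and fails elsewhere.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/ok") {
			fmt.Fprintln(w, "up 1")
			return
		}
		http.Error(w, "scrape failed", http.StatusInternalServerError)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()
	r := collect(srv.Client(), []Pod{{Name: "a", URL: srv.URL + "/ok"}, {Name: "b", URL: srv.URL + "/fail"}})
	fmt.Println(len(r.Metrics), len(r.Warnings), len(r.Errors))
}
```

A single fake server with path-based routing keeps each scenario to one line of setup; the real tests would presumably point the handler at the fake server's URL instead of live pod IPs.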

Comment thread kof-operator/internal/k8s/otel.go Outdated
Comment thread kof-operator/internal/metrics/helper.go
Comment thread kof-operator/internal/server/handlers/metrics_handler.go Outdated
Comment thread kof-operator/internal/server/handlers/metrics_handler.go Outdated
Comment thread kof-operator/webapp/collector/src/components/features/AppSidebar.tsx Outdated
Comment thread kof-operator/internal/k8s/otel.go Outdated
AndrejsPon00 requested a review from gmlexx on November 26, 2025 17:19
Comment thread kof-operator/internal/server/handlers/collector_metrics_handler.go Outdated
Comment thread kof-operator/internal/server/handlers/collector_metrics_handler.go Outdated
Comment thread kof-operator/internal/server/handlers/collector_metrics_handler.go
Comment thread kof-operator/internal/server/handlers/collector_metrics_handler.go Outdated
AndrejsPon00 requested a review from gmlexx on November 27, 2025 09:45
AndrejsPon00 force-pushed the fix-collector-failure-detection branch from 240ad09 to 75cee82 on November 28, 2025 09:12
AndrejsPon00 merged commit 9e8f0b1 into k0rdent:main on Nov 28, 2025
10 of 11 checks passed
github-project-automation (bot) moved this to Done in k0rdent on Nov 28, 2025

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Operator doesn't detect failed collectors configuration

3 participants