feat: add OTel Collector misconfiguration detection to KOF UI #636
Pull request overview
This PR adds OpenTelemetry Collector misconfiguration detection to the KOF UI. The changes introduce a new hierarchical data structure (Cluster → CustomResource → Pod) to support displaying configuration warnings and errors for OTel collectors in the UI. The backend now inspects OTel collector status to detect replica mismatches and deployment issues, communicating these to the frontend through a new status message system.
Key changes:
- Restructured metrics data model to support a three-level hierarchy (clusters containing custom resources containing pods)
- Added status message infrastructure for warnings and errors at cluster, custom resource, and pod levels
- Implemented OTel collector health detection based on replica status
- Added UI components to display configuration alerts for custom resources
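The three-level hierarchy and the per-level status messages described above can be sketched in Go roughly as follows. This is a minimal illustration, not the PR's actual code: the type names, JSON field names, and the exact signatures of `AddMetric` and `AddStatus` are assumptions, as are the cluster, resource, and metric names used in `main`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MessageType separates warnings from errors (hypothetical names).
type MessageType string

const (
	TypeWarning MessageType = "warning"
	TypeError   MessageType = "error"
)

// StatusMessage is a single warning or error attached to one level of the hierarchy.
type StatusMessage struct {
	Type    MessageType `json:"type"`
	Message string      `json:"message"`
}

// Pod is the leaf level: metrics plus pod-specific messages.
type Pod struct {
	Messages []StatusMessage    `json:"messages,omitempty"`
	Metrics  map[string]float64 `json:"metrics,omitempty"`
}

// CustomResource groups the pods that belong to one resource, e.g. one collector.
type CustomResource struct {
	Messages []StatusMessage `json:"messages,omitempty"`
	Pods     map[string]*Pod `json:"pods"`
}

// Cluster is the top level: custom resources keyed by name.
type Cluster struct {
	Messages        []StatusMessage            `json:"messages,omitempty"`
	CustomResources map[string]*CustomResource `json:"customResources"`
}

// Clusters maps a cluster name to its data.
type Clusters map[string]*Cluster

// resource returns the custom resource, creating intermediate levels on demand.
func (c Clusters) resource(cluster, cr string) *CustomResource {
	cl, ok := c[cluster]
	if !ok {
		cl = &Cluster{CustomResources: map[string]*CustomResource{}}
		c[cluster] = cl
	}
	res, ok := cl.CustomResources[cr]
	if !ok {
		res = &CustomResource{Pods: map[string]*Pod{}}
		cl.CustomResources[cr] = res
	}
	return res
}

// AddMetric records a metric value for a pod under a custom resource.
func (c Clusters) AddMetric(cluster, cr, pod, name string, value float64) {
	res := c.resource(cluster, cr)
	p, ok := res.Pods[pod]
	if !ok {
		p = &Pod{Metrics: map[string]float64{}}
		res.Pods[pod] = p
	}
	p.Metrics[name] = value
}

// AddStatus attaches a warning or error to a custom resource.
func (c Clusters) AddStatus(cluster, cr string, t MessageType, msg string) {
	res := c.resource(cluster, cr)
	res.Messages = append(res.Messages, StatusMessage{Type: t, Message: msg})
}

func main() {
	data := Clusters{}
	data.AddMetric("mothership", "kof-collectors", "collector-0", "otelcol_process_uptime", 120)
	data.AddStatus("mothership", "kof-collectors", TypeWarning, "1 of 2 replicas ready")

	out, err := json.MarshalIndent(data, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Keeping messages on the custom resource rather than only on pods lets the UI show an alert even when a resource has no pods at all.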
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| kof-operator/webapp/collector/src/providers/victoria_metrics/VictoriaMetricsProvider.tsx | Removed unused types and simplified fetch response handling to use new ClusterData structure |
| kof-operator/webapp/collector/src/components/shared/CustomResource.tsx | Added new component to display custom resource alerts with warning/error styling |
| kof-operator/webapp/collector/src/components/pages/collectorPage/models.ts | Restructured data model to three-level hierarchy with status messages at each level |
| kof-operator/webapp/collector/src/components/pages/victoriaPage/victoria-list/VictoriaTable.tsx | Updated to render custom resources with their pods instead of flat pod list |
| kof-operator/webapp/collector/src/components/pages/collectorPage/components/collector-list/CollectorsTable.tsx | Updated to render custom resources with their pods instead of flat pod list |
| kof-operator/webapp/collector/src/components/pages/collectorPage/CollectorPage.tsx | Fixed pod selection logic to use new customResource structure |
| kof-operator/webapp/collector/src/components/features/AppSidebar.tsx | Added alert indicator for collectors with unhealthy pods |
| kof-operator/internal/server/handlers/victoria_handler.go | Refactored to use custom resource abstraction and group pods by Victoria Metrics/Logs |
| kof-operator/internal/server/handlers/types.go | Added ICustomResource interface and ResourceStatus type |
| kof-operator/internal/server/handlers/metrics_handler.go | Refactored to collect metrics per custom resource with status messages |
| kof-operator/internal/server/handlers/collector_metrics_handler.go | Implemented OTel collector status detection for replica mismatches and deployment issues |
| kof-operator/internal/metrics/types.go | Restructured types to support hierarchical data and status messages |
| kof-operator/internal/metrics/service.go | Renamed Service to CollectorService for clarity |
| kof-operator/internal/metrics/resources.go | Updated method names to match new CollectorService type |
| kof-operator/internal/metrics/metrics.go | Replaced flat Add methods with hierarchical AddMetric and AddStatus methods |
| kof-operator/internal/metrics/internal.go | Updated to use new CollectorService type |
| kof-operator/internal/metrics/helper.go | Added sendStatus method and updated send to sendMetric |
| kof-operator/internal/metrics/health.go | Updated to use new CollectorService type and method names |
| kof-operator/internal/k8s/otel.go | Added OpenTelemetry collector querying and pod selector extraction |
| kof-operator/internal/k8s/client.go | Added OTel scheme and QPS/Burst rate limiting configuration |
| kof-operator/go.mod | Updated Go version and dependencies including opentelemetry-operator |
| kof-operator/internal/database/MetricsDatabase.ts | Updated to use ClusterData instead of PodsMap |
| kof-operator/webapp/collector/src/providers/collectors_metrics/CollectorsMetricsRecordManager.tsx | Changed to use clustersMap property instead of toClusterMap method |
| charts/kof-mothership/templates/kof-operator/role.yaml | Added RBAC permissions for opentelemetrycollectors resources |
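As a rough illustration of the replica-based detection summarized for `collector_metrics_handler.go`, the sketch below compares desired and ready replica counts and produces a warning or an error. The `CollectorState` struct, the `DetectReplicaIssue` function, and the severity thresholds are all assumptions made for this example; the PR's real code reads these values from the OpenTelemetryCollector resource and may classify them differently.

```go
package main

import "fmt"

// Severity of a detected issue (hypothetical names).
type Severity string

const (
	SeverityWarning Severity = "warning"
	SeverityError   Severity = "error"
)

// StatusMessage is what would be passed on to the frontend.
type StatusMessage struct {
	Severity Severity
	Message  string
}

// CollectorState is a simplified, hypothetical view of a collector's status.
type CollectorState struct {
	Name            string
	DesiredReplicas int32
	ReadyReplicas   int32
}

// DetectReplicaIssue reports a problem when fewer replicas are ready than desired.
// The boolean is false when the collector looks healthy.
func DetectReplicaIssue(s CollectorState) (StatusMessage, bool) {
	switch {
	case s.DesiredReplicas > 0 && s.ReadyReplicas == 0:
		// Nothing is running: treat as a deployment failure.
		return StatusMessage{
			Severity: SeverityError,
			Message:  fmt.Sprintf("%s: no replicas ready (0/%d)", s.Name, s.DesiredReplicas),
		}, true
	case s.ReadyReplicas < s.DesiredReplicas:
		// Partially available: treat as a replica mismatch.
		return StatusMessage{
			Severity: SeverityWarning,
			Message:  fmt.Sprintf("%s: replica mismatch (%d/%d ready)", s.Name, s.ReadyReplicas, s.DesiredReplicas),
		}, true
	default:
		return StatusMessage{}, false
	}
}

func main() {
	states := []CollectorState{
		{Name: "healthy", DesiredReplicas: 2, ReadyReplicas: 2},
		{Name: "degraded", DesiredReplicas: 2, ReadyReplicas: 1},
		{Name: "down", DesiredReplicas: 2, ReadyReplicas: 0},
	}
	for _, s := range states {
		if msg, found := DetectReplicaIssue(s); found {
			fmt.Printf("[%s] %s\n", msg.Severity, msg.Message)
		}
	}
}
```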
98114b8 to
3b344b5
Do we have any unit tests on the backend side? None of them were updated in this PR.
I would suggest implementing a fake HTTP server to validate that the data structures hold up in several scenarios, such as:
- No pods are available for a custom resource such as OpenTelemetryCollector.
- Two pods are available for a custom resource and data retrieval succeeds for both.
- Two pods are available for a custom resource, but only one pod successfully returns metrics.
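A minimal sketch of the suggested approach, using `net/http/httptest` from the Go standard library to stand in for pod metrics endpoints. The `scrapePods`, `fetch`, and `fakePod` helpers and the `ScrapeResult` type are hypothetical and do not correspond to functions in the PR; in a real test they would be replaced by the handler code under test and `testing.T` assertions.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
	"time"
)

// client is shared by all scrapes; the timeout keeps a hung pod from blocking.
var client = &http.Client{Timeout: 2 * time.Second}

// ScrapeResult is a hypothetical summary of one custom resource's pods.
type ScrapeResult struct {
	Metrics map[string]string // pod name -> raw metrics body
	Failed  []string          // pod names that could not be scraped, sorted
}

// fetch returns the response body, treating any non-200 status as an error.
func fetch(url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// scrapePods queries every pod URL and separates successes from failures.
func scrapePods(podURLs map[string]string) ScrapeResult {
	res := ScrapeResult{Metrics: map[string]string{}}
	for pod, url := range podURLs {
		body, err := fetch(url)
		if err != nil {
			res.Failed = append(res.Failed, pod)
			continue
		}
		res.Metrics[pod] = body
	}
	sort.Strings(res.Failed)
	return res
}

// fakePod starts a test server that always answers with the given status and body.
func fakePod(status int, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
}

func main() {
	ok := fakePod(http.StatusOK, "otelcol_process_uptime 42\n")
	defer ok.Close()
	bad := fakePod(http.StatusInternalServerError, "boom")
	defer bad.Close()

	scenarios := []struct {
		name string
		pods map[string]string
	}{
		{"no pods", map[string]string{}},
		{"two pods, both succeed", map[string]string{"pod-0": ok.URL, "pod-1": ok.URL}},
		{"two pods, one fails", map[string]string{"pod-0": ok.URL, "pod-1": bad.URL}},
	}
	for _, s := range scenarios {
		res := scrapePods(s.pods)
		fmt.Printf("%s: %d scraped, failed=%v\n", s.name, len(res.Metrics), res.Failed)
	}
}
```

The same three scenarios map directly onto a table-driven test, with each case asserting both the collected metrics and the status messages produced for the custom resource.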
240ad09 to
75cee82
Closes #543