feat: Switch Metric Collectors to Opentelemetry-kube-stack by aglarendil · Pull Request #273 · k0rdent/kof

aglarendil · 2025-05-13T17:03:42Z

closes #204
refer to opentelemetry-kube-stack and use it to collect all the metrics from within the kof-collectors chart

we have 4 collectors:

kube cluster collectors (cluster-stats)

enable as much as possible for kube collection

node daemon collectors

host metrics
additional scrape config for "kubernetes-pods" jobs and kubelet metrics (/cadvisor, /metrics, /metrics/resource, /metrics/probes)

k0s components collector (hostnetwork, polling etcd, kube-controller-manager)

prometheus receiver with a scrape config to poll pods such as kube-controller-manager, scheduler, etcd (launched by k0s)

syslog collector that also extracts fields using Grok patterns, because, for instance, the default Ubuntu 24.04 log format forwarded by systemd to rsyslog is not syslog-RFC-compliant in any way

target-allocator collector - dedicated

works only against prometheus objects. ta is enabled independently, as it changes how node collectors receive targets to scrape and affects the part of the daemon scrape config that relies on hacks around env variables such as OTEK_KUBE_NODE_NAME and others. so we separate those 2 daemons so that they do not step on each other's toes
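The field extraction done by the syslog collector can be illustrated outside the collector. The following is a minimal Python sketch, not the chart's actual Grok patterns: it assumes lines of the form `<timestamp> <host> <program>[<pid>]: <message>`, and the regex, the field names, and the sample line are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern for a "<timestamp> <host> <program>[<pid>]: <message>" line.
# The field names are illustrative and are not taken from the kof-collectors chart.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+"        # e.g. an ISO 8601 timestamp without spaces
    r"(?P<host>\S+)\s+"              # host name
    r"(?P<program>[^\s\[:]+)"        # program name, up to "[" or ":"
    r"(?:\[(?P<pid>\d+)\])?:\s*"     # optional "[pid]", then a colon
    r"(?P<message>.*)$"              # free-form message
)


def parse_syslog_line(line: str) -> Optional[dict]:
    """Return the extracted fields, or None when the line does not match."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    # Drop optional groups that did not participate in the match (e.g. pid).
    return {key: value for key, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    sample = "2025-05-13T17:03:42.123456+00:00 node-1 kubelet[812]: Started container"
    print(parse_syslog_line(sample))
```

A line that does not fit the assumed layout returns `None`, so a real pipeline would need a fallback that keeps the raw body rather than dropping the record.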

collectors are sprinkled with attribute transformers that populate node/job/instance and/or their opentelemetry counterparts (e.g. service.instance.id), so that kube-prom-stack dashboards and alerts work without label discrepancies.
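The label alignment described above can be sketched in Python. This is an illustration of the idea only, not the transform configuration used in the chart: the `job` and `instance` derivation follows the usual OpenTelemetry-to-Prometheus convention (`service.namespace`/`service.name` and `service.instance.id`), while taking `node` from `k8s.node.name` is an assumption, and the function name is hypothetical.

```python
def prometheus_labels(resource_attrs: dict) -> dict:
    """Derive Prometheus-style node/job/instance labels from OTel resource attributes.

    Illustrative mapping only; the actual collectors do this with attribute
    transformers in their pipeline configuration.
    """
    labels = {}

    service_name = resource_attrs.get("service.name")
    namespace = resource_attrs.get("service.namespace")
    if service_name:
        # job is "<namespace>/<name>" when a namespace is present, else just the name.
        labels["job"] = f"{namespace}/{service_name}" if namespace else service_name

    if "service.instance.id" in resource_attrs:
        labels["instance"] = resource_attrs["service.instance.id"]

    # Assumption: the node label is taken from the k8s.node.name attribute.
    if "k8s.node.name" in resource_attrs:
        labels["node"] = resource_attrs["k8s.node.name"]

    return labels


if __name__ == "__main__":
    attrs = {
        "service.name": "kubelet",
        "service.instance.id": "10.0.0.1:10250",
        "k8s.node.name": "node-1",
    }
    print(prometheus_labels(attrs))
```

Dashboards and alerts that select on `job`, `instance`, or `node` only match if every collector fills these labels the same way, which is why the mapping is applied in all of them.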

possible known issues with this version:

some attributes need to be additionally renamed ('/hostfs' for /var/log/syslog)
some collectors are commented out (journald is still alpha, but can be parametrised to be enabled, requires json parsing and ugly hacks with LD_LIBRARY_PATH - honestly, to be removed in this version)
requires additional filtering of redundant and very noisy metrics such as some of kubeapi latency buckets, etc.
some servicemonitors might collect the same metrics as other collectors (e.g. node-exporter for daemon collector and node-exporter via service monitor - to be cleaned up as well)

otel-operator requires explicit setting for fallbackstrategy to collect non-Node service monitors such as apiserver
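The filtering of noisy metrics mentioned in the known issues could look roughly like the sketch below. It is a plain Python illustration of a name-based deny-list, not collector configuration: the pattern list is hypothetical (the description only says "some of kubeapi latency buckets"), and `apiserver_request_duration_seconds_bucket` is used as one plausible example of a high-cardinality series.

```python
import re

# Hypothetical deny-list; which series to drop is a decision for the chart maintainers.
DENY_PATTERNS = [
    re.compile(r"^apiserver_request_duration_seconds_bucket$"),
]


def drop_noisy(metric_names, deny_patterns=DENY_PATTERNS):
    """Return the metric names that match none of the deny patterns, in order."""
    return [
        name
        for name in metric_names
        if not any(pattern.search(name) for pattern in deny_patterns)
    ]


if __name__ == "__main__":
    names = [
        "apiserver_request_duration_seconds_bucket",
        "apiserver_request_total",
        "node_cpu_seconds_total",
    ]
    print(drop_noisy(names))
```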

P.S. the otel-kube-stack directory is a sandbox playground, to be removed in a subsequent commit a bit later

refer to opentelemetry-kube-stack and use it to collect all the metrics from within the kof-collectors chart

also modify the MCS for kof-child and kof-regional accordingly to refer to the secrets related to basic-auth and also "parametrise" exporters configuration

known issue 6, in addition to the five listed above: istiod and other pod-based collection from upstream kof was also deleted, as it should be picked up by the "kubernetes-pods" job

cleanup: remove journald binary dirty hack and comments
fix: defaultCRConfig values for default helm install
fix: remove kube-state-metrics and prom-node-exporter from subcharts, as they are installed as subcharts of opentelemetry-kube-stack
fix: remove unneeded auth exts from regional mcs
fix: add auth exts removed from default config
fix: adopt istio to passing values explicitly to otel-kube-stack
misc: use cluster label instead of clusterName
address reviews

resolve global.cluster vs global.clusterName conundrum
global.clusterName may still be used by a pair of subcharts, while clusterLabel should be used for dashboards and metrics/alerts to mirror the default behaviour in upstream. however, for vmoperator and vlogscluster, global.cluster is already reserved to contain some info in map format, so we just resort to:
clusterLabel - used for dashboards and metrics
clusterName - to have clustername in chart values

fix: fix variable interpolation for istio child template
fix: follow cluster and config changes in corresponding ns
fixup insecure flag, as it is not needed


github-project-automation Bot added this to k0rdent and Project 2A May 13, 2025

aglarendil force-pushed the feature/otel-kube-stack branch 7 times, most recently from faaad58 to a386e86 Compare May 19, 2025 14:02

aglarendil force-pushed the feature/otel-kube-stack branch 2 times, most recently from d6ca143 to dce4d33 Compare June 3, 2025 15:39

aglarendil changed the title ~~Switch Metric Collectors to Opentelemetry-kube-stack~~ feat: Switch Metric Collectors to Opentelemetry-kube-stack Jun 3, 2025

aglarendil marked this pull request as ready for review June 3, 2025 15:47

aglarendil requested review from denis-ryzhkov and gmlexx as code owners June 3, 2025 15:47

denis-ryzhkov mentioned this pull request Jun 5, 2025

fix: Temporary adaptation of new alerts to current metrics #310

Merged

aglarendil force-pushed the feature/otel-kube-stack branch from 089bb98 to 344ceea Compare June 9, 2025 15:24

aglarendil force-pushed the feature/otel-kube-stack branch 3 times, most recently from ae9e70d to 580928e Compare June 17, 2025 17:12

aglarendil mentioned this pull request Jun 19, 2025

Add Otel collector for k0s-launched Kube Components #240

Closed

aglarendil force-pushed the feature/otel-kube-stack branch from 580928e to 8bef154 Compare June 23, 2025 12:07

aglarendil requested a review from AndrejsPon00 as a code owner June 23, 2025 12:07

aglarendil force-pushed the feature/otel-kube-stack branch 7 times, most recently from 0af01a1 to 817e0df Compare June 30, 2025 15:40

aglarendil force-pushed the feature/otel-kube-stack branch 2 times, most recently from 8cf9bc2 to 8d857a3 Compare June 30, 2025 17:19

This was referenced Jul 2, 2025

add logs processor for golang produced keyvalue type logs #370

Closed

add dmesg log parser #371

Closed

denis-ryzhkov requested changes Jul 4, 2025


denis-ryzhkov mentioned this pull request Jul 4, 2025

feat: dashboards from kube-prom-stack #278

Closed

aglarendil force-pushed the feature/otel-kube-stack branch from 8d857a3 to d1eb7c4 Compare July 7, 2025 12:59

aglarendil requested a review from denis-ryzhkov July 8, 2025 11:49

aglarendil mentioned this pull request Jul 8, 2025

feat: add handler to fetch internal metrics from collectors #387

Merged

denis-ryzhkov requested changes Jul 10, 2025


Comment thread charts/kof-istio/templates/kof-regional-cluster-profile.yaml Outdated

Comment thread charts/kof-istio/templates/_helpers.tpl Outdated

Comment thread charts/kof-mothership/values.yaml

aglarendil added 3 commits July 14, 2025 14:04

Add scripts to sync dashboards from upstream Kube-Prometheus-Stack

8ece7f2

sync dashboards from kps

1f0ce6c

aglarendil force-pushed the feature/otel-kube-stack branch from 8a6e09b to 1f0ce6c Compare July 14, 2025 12:04

aglarendil requested a review from denis-ryzhkov July 14, 2025 13:33

aglarendil and others added 7 commits July 15, 2025 18:01

fix clustername processor

c6791f9

Fix default regional cluster namespace to use kof ns

146c9a2

fix: Enabling few upstream recording rules

1b47b53

Revert "fix: Temporary adaptation of new alerts to current metrics (k…

347671d

…0rdent#310)" This reverts commit c244f6e.

disable k0s hostnetworked collector as incompatible with istio

445d689

fix: typo in role name for targetallocator extra role token

45d691f

fix: Adding clusterName when collecting from management cluster

c94b56e

denis-ryzhkov approved these changes Jul 21, 2025


gmlexx approved these changes Jul 21, 2025


denis-ryzhkov merged commit eb2f283 into k0rdent:main Jul 21, 2025
6 checks passed

github-project-automation Bot moved this to Done in k0rdent Jul 21, 2025

This was referenced Jul 23, 2025

dedup some metrics collection done simulateneously by ta-daemon and daemon #411

Closed

fix opentelemetry collectors metrics collection #412

Closed

This was referenced Jul 24, 2025

Missing metrics in alerts #407

Closed

fix: make dev-collectors-deploy was breaking exporters #444

Merged

denis-ryzhkov merged 10 commits into k0rdent:main from aglarendil:feature/otel-kube-stack

3 participants
