feat: Switch Metric Collectors to Opentelemetry-kube-stack #273
Merged
denis-ryzhkov
requested changes
Jul 4, 2025
denis-ryzhkov
requested changes
Jul 10, 2025
refer to opentelemetry-kube-stack and use it to collect all the metrics
from within the kof-collectors chart
also modify the MCS for kof-child and kof-regional accordingly to refer to the basic-auth secrets and to
"parametrise" the exporters configuration
we have 4 collectors:
1. kube cluster collectors (cluster-stats)
   enable as much as possible for kube collection
2. node daemon collectors
   host metrics
   additional scrape config for "kubernetes-pods" jobs and kubelet metrics (/cadvisor, /metrics, /metrics/resource, /metrics/probes)
3. k0s components collector (hostnetwork, polling etcd, kube-controller-manager)
   prometheus receiver with a scrape config to poll pods such as kube-controller-manager, scheduler, etcd (launched by k0s)
   syslog collector that also extracts contents using Grok patterns, since for instance the default Ubuntu 24.04 log format forwarded by systemd to rsyslog
   is not in any way syslog-RFC-compliant
4. target-allocator collector - dedicated
   works only against prometheus objects. the target allocator (TA) is enabled independently, as it modifies how node collectors receive targets to scrape and affects the part
   of the daemon scrape config that relies on hacks around env variables such as OTEK_KUBE_NODE_NAME and others. so we separate those 2 daemons so that they
   do not step on each other's toes
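
As a rough illustration of why the syslog collector needs pattern-based extraction, here is a minimal Python sketch of Grok-style parsing. It is not the collector configuration from this PR. The line shape (ISO 8601 timestamp, host, program with optional `[pid]`, message) is an assumption about the rsyslog output, and the names `SYSLOG_LINE` and `parse_syslog_line` are hypothetical.

```python
import re
from typing import Optional

# Assumed line shape (not taken from the PR): ISO 8601 timestamp, host,
# program with an optional [pid], then the free-form message.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2}))"
    r"\s+(?P<host>\S+)"
    r"\s+(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:"
    r"\s*(?P<message>.*)$"
)


def parse_syslog_line(line: str) -> Optional[dict]:
    """Return the extracted fields, or None when the line does not match."""
    match = SYSLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    # Drop the optional pid field when the line carries no [pid] part.
    if fields["pid"] is None:
        del fields["pid"]
    return fields


sample = "2025-07-04T10:15:30.123456+00:00 node-1 kubelet[1234]: Started container"
print(parse_syslog_line(sample))
```

Lines that do not match return `None`, so a caller can forward them unparsed instead of dropping them.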
collectors are sprinkled with attribute transformers that populate node/job/instance and/or their
opentelemetry counterparts (e.g. service.instance.id), so when we use kube-prom-stack dashboards and alerts, we do not get
label discrepancies.
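
The label alignment described above can be sketched in Python as a plain mapping from OpenTelemetry resource attributes to Prometheus-style labels. This is only an illustration, not the transformer config used in the chart. The `job` and `instance` derivation follows the usual OpenTelemetry/Prometheus convention (`service.namespace`/`service.name` and `service.instance.id`). Taking `node` from `k8s.node.name` is an assumption, and `prometheus_labels` is a hypothetical helper name.

```python
def prometheus_labels(resource: dict) -> dict:
    """Derive job/instance/node labels from OTel resource attributes."""
    labels = {}
    name = resource.get("service.name")
    namespace = resource.get("service.namespace")
    if name:
        # job is "<namespace>/<name>" when a namespace is set, else "<name>".
        labels["job"] = f"{namespace}/{name}" if namespace else name
    if "service.instance.id" in resource:
        labels["instance"] = resource["service.instance.id"]
    # Assumption: the node label mirrors the k8s.node.name attribute.
    if "k8s.node.name" in resource:
        labels["node"] = resource["k8s.node.name"]
    return labels


print(prometheus_labels({
    "service.name": "kubelet",
    "service.instance.id": "10.0.0.5:10250",
    "k8s.node.name": "node-1",
}))
```

Dashboards and alerts written against `node`, `job` and `instance` only keep working if every collector fills these labels the same way.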
possible known issues with this version:
1. some attributes need to be additionally renamed ('/hostfs' for /var/log/syslog)
2. some collectors are commented out (journald is still alpha, but can be parametrised to be enabled; it requires json parsing
   and ugly hacks with LD_LIBRARY_PATH - honestly, to be removed in this version)
3. requires additional filtering of redundant and very noisy metrics such as some of the kube API latency buckets, etc.
4. some servicemonitors might collect the same metrics as other collectors (e.g. node-exporter via the daemon collector and node-exporter via a service monitor
   - to be cleaned up as well)
5. otel-operator requires an explicit setting for fallbackstrategy to collect non-Node service monitors such as apiserver
6. istiod and other pod-based collection from upstream kof was also deleted, as it should be picked up by the "kubernetes-pods" job
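
For the filtering mentioned in item 3, a minimal Python sketch of the idea is to drop selected histogram bucket series by metric name and `le` bound. The rule table, the chosen metric and bounds, and the `keep_series` helper are hypothetical examples, not filters that exist in this PR.

```python
# Hypothetical drop rules: metric name -> `le` bucket bounds to discard.
DROP_BUCKETS = {
    "apiserver_request_duration_seconds_bucket": {"0.005", "0.025", "0.05"},
}


def keep_series(name: str, labels: dict) -> bool:
    """Return False for bucket series that match a drop rule."""
    dropped = DROP_BUCKETS.get(name)
    if dropped is None:
        # No rule for this metric: keep every series.
        return True
    return labels.get("le") not in dropped


series = [
    ("apiserver_request_duration_seconds_bucket", {"le": "0.005"}),
    ("apiserver_request_duration_seconds_bucket", {"le": "1"}),
    ("node_cpu_seconds_total", {"mode": "idle"}),
]
print([s for s in series if keep_series(*s)])
```

Dropping buckets reduces quantile resolution, so which bounds are safe to discard depends on the dashboards and alerts in use.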
cleanup: remove journald binary dirty hack and comments
fix: defaultCRConfig values for default helm install
fix: remove kube-state-metrics and prom-node-exporter from subcharts
they are installed as subcharts of opentelemetry-kube-stack
fix: remove unneeded auth exts from regional mcs
fix: add auth exts removed from default config
fix: adapt istio to passing values explicitly to otel-kube-stack
misc: use cluster label instead of clusterName
address reviews
resolve global.cluster vs global.clusterName conundrum
global.clusterName may still be used by a pair of subcharts, while
clusterLabel should surely be used for dashboards and metrics/alerts
to mirror the default behaviour in upstream
however, for vmoperator and vlogscluster global.cluster is already
reserved to contain some info in map format, so we just resort to:
   clusterLabel - used for dashboards and metrics
   clusterName - to have the cluster name in chart values
fix: fix variable interpolation for istio child template
fix: follow cluster and config changes in corresponding ns
fixup insecure flag, as it is not needed
…0rdent#310)" This reverts commit c244f6e.
denis-ryzhkov
approved these changes
Jul 21, 2025
gmlexx
approved these changes
Jul 21, 2025
closes #204
P.S. The otel-kube-stack directory is a sandbox playground, to be removed in a subsequent commit a bit later.