Skip to content

fix(telemetry): improve vm/container detection#10944

Merged
lidel merged 5 commits intomasterfrom
telemetry/improve-vm-detection
Sep 8, 2025
Merged

fix(telemetry): improve vm/container detection#10944
lidel merged 5 commits intomasterfrom
telemetry/improve-vm-detection

Conversation

@hsanjuan
Copy link
Contributor

Current VM detection is not very accurate and systemd-detect-virt does exactly what's needed under a miriad of virtualization platforms.

The downside is that we are running a system command which is uglier and might perhaps flip anti-viruses or something.

Current VM detection is not very accurate and systemd-detect-virt does exactly
what's needed under a miriad of virtualization platforms.

The downside is that we are running a system command which is uglier and might
perhaps flip anti-viruses or something.
@hsanjuan hsanjuan self-assigned this Aug 28, 2025
@hsanjuan hsanjuan requested a review from a team as a code owner August 28, 2025 10:52
@hsanjuan hsanjuan added the skip/changelog This change does NOT require a changelog entry label Aug 28, 2025
Copy link
Contributor

@gammazero gammazero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the command output is not being captured, you probably want to use the --quiet flag.

hsanjuan and others added 2 commits August 29, 2025 08:45
Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
@lidel lidel assigned lidel and unassigned hsanjuan Aug 29, 2025
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. i think we need different approach, this is too hairy:

  1. PATH injection risk - attacker could place malicious systemd-detect-virt in PATH -- not saying this is real, but all systems / audits / hardenign tools will scream, making it harder for engineers to use kubo in their infra.
  2. No timeout - command could hang indefinitely
  3. Unnecessary repeated execution - virtualization status won't change during runtime, so we should only check it once and cache result

Give me a sec, I have a stashed branch with go-only checks, need to refresh my memory, maybe we can avoid calling external binary.

replace systemd-detect-virt with file-based detection to avoid:
- security risks from executing external binaries
- unnecessary repeated detection (now cached with sync.Once)
- missing detection on non-systemd systems

removes false positives:
- cpu hypervisor flag (indicates capability, not guest status)
- generic dmi strings that match physical hardware
- overlay filesystem check (used by immutable distros)
@lidel lidel mentioned this pull request Aug 29, 2025
49 tasks
@aschmahmann
Copy link
Contributor

Give me a sec, I have a stashed branch with go-only checks, need to refresh my memory, maybe we can avoid calling external binary.

@lidel in case it's of use I see that this library exists github.com/ShellCode33/VM-Detection. It doesn't seem maintained (from both the last commit and the issues), so you might be better off just writing / porting the basic checks from C, but wanted to flag in case it's helpful.

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann thx, yes, my changes from d77f130 are partially based on that prior art. Unfortunately that library is bit too native and produces false-positives. We skipped some... creative 🙃 things to avoid false-pisitives, for example

  • Microsoft Surface laptops being interpreted as VMs ;-)
  • CPUID VM flag - detects CPU capability to run VMs, not whether we're in one (causes false positives on modern hardware)
  • MAC address prefixes - can be spoofed or match USB adapters from those vendors
  • Low resource detection (<3 CPUs, <3GB RAM) - would incorrectly flag cheap RPi/old/embedded hardware as VMs
  • Reading ALL files in /sys/class/dmi/id/ - too broad, could match vendor names in unexpected places etc

I think the current approach in this PR is the right balance, focusing on zero false positives over maximum detection coverage (bad telemetry data is worse than missing data).

@hsanjuan @gammazero mind taking a look? (ok to merge from my end)

@lidel lidel changed the title telemetry: use systemd-detect-virt for container/vm detection fix(telemetry): improve vm/container detection Aug 29, 2025
if err == nil {
for line := range strings.Lines(string(content)) {
if strings.Contains(line, "overlay") && strings.Contains(line, "/var/lib/containers/storage/overlay") {
// WSL is technically a container-like environment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but does this help us to bundle with other virtualization tools? Maybe yes from the perspective of "networking is harder / easier to mess up", but not really if we're using as a proxy for say users who tend to control their environments programmatically.

Not strictly advocating one way or the other here, but would be helpful to know (and document in case we need to revisit down the line) what the idea is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't have strong feeling here, but I think if something fails in WSL is would be because of container-like nature.

I'm thinking this way: if we see networking issues in WSL, they should correlate with "Window+container" cohort, rather regular Windows users who run IPFS Desktop, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably. Although IIUC the comparison is not to windows users but to traditional Linux users using CLI or desktop, since the only information being tracked is that it's Linux and a container (vs just Linux)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as it is, but we may want to consider adding some string to our telemetry data to indicate just what was detected: "linux+wsl"

Copy link
Contributor Author

@hsanjuan hsanjuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@gammazero gammazero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much safer approach than exec.Command.

See question about caching the answer within the check, as is done in this PR. Or whether telemetry should have knowledge about which checks to only do once.

if err == nil {
for line := range strings.Lines(string(content)) {
if strings.Contains(line, "overlay") && strings.Contains(line, "/var/lib/containers/storage/overlay") {
// WSL is technically a container-like environment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as it is, but we may want to consider adding some string to our telemetry data to indicate just what was detected: "linux+wsl"


func isRunningInContainer() bool {
// Check for Docker container
containerDetectionOnce.Do(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to cache the result here (if does not hurt to do so), or do we want to cache at a higher level remembering that isRunningInContainer and isRunningInVM have already been called?

Previously, I had assumed that container/VM checks would be cached at a higher level, but caching the result within in the check here says that was a wrong assumption. It seems like telemetry should know which questions it only needs to ask once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this is an extra precaution for now. we will likely refine metrics over time, and have more clear split between dynamic and one-time ones, but thats out of scope here (merging as is)

More memory-efficient as it processes one line at a time instead
of creating a slice of all lines upfront.

Ref: #10944 (comment)

Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
@lidel lidel merged commit 049256c into master Sep 8, 2025
16 checks passed
@lidel lidel deleted the telemetry/improve-vm-detection branch September 8, 2025 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip/changelog This change does NOT require a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants