fix(telemetry): improve vm/container detection#10944
Conversation
Current VM detection is not very accurate and systemd-detect-virt does exactly what's needed under a miriad of virtualization platforms. The downside is that we are running a system command which is uglier and might perhaps flip anti-viruses or something.
gammazero
left a comment
There was a problem hiding this comment.
Since the command output is not being captured, you probably want to use the --quiet flag.
Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
There was a problem hiding this comment.
Hm.. i think we need different approach, this is too hairy:
PATHinjection risk - attacker could place malicious systemd-detect-virt inPATH-- not saying this is real, but all systems / audits / hardenign tools will scream, making it harder for engineers to use kubo in their infra.- No timeout - command could hang indefinitely
- Unnecessary repeated execution - virtualization status won't change during runtime, so we should only check it once and cache result
Give me a sec, I have a stashed branch with go-only checks, need to refresh my memory, maybe we can avoid calling external binary.
replace systemd-detect-virt with file-based detection to avoid: - security risks from executing external binaries - unnecessary repeated detection (now cached with sync.Once) - missing detection on non-systemd systems removes false positives: - cpu hypervisor flag (indicates capability, not guest status) - generic dmi strings that match physical hardware - overlay filesystem check (used by immutable distros)
@lidel in case it's of use I see that this library exists github.com/ShellCode33/VM-Detection. It doesn't seem maintained (from both the last commit and the issues), so you might be better off just writing / porting the basic checks from C, but wanted to flag in case it's helpful. |
There was a problem hiding this comment.
@aschmahmann thx, yes, my changes from d77f130 are partially based on that prior art. Unfortunately that library is bit too native and produces false-positives. We skipped some... creative 🙃 things to avoid false-pisitives, for example
- Microsoft Surface laptops being interpreted as VMs ;-)
- CPUID VM flag - detects CPU capability to run VMs, not whether we're in one (causes false positives on modern hardware)
- MAC address prefixes - can be spoofed or match USB adapters from those vendors
- Low resource detection (<3 CPUs, <3GB RAM) - would incorrectly flag cheap RPi/old/embedded hardware as VMs
- Reading ALL files in /sys/class/dmi/id/ - too broad, could match vendor names in unexpected places etc
I think the current approach in this PR is the right balance, focusing on zero false positives over maximum detection coverage (bad telemetry data is worse than missing data).
@hsanjuan @gammazero mind taking a look? (ok to merge from my end)
| if err == nil { | ||
| for line := range strings.Lines(string(content)) { | ||
| if strings.Contains(line, "overlay") && strings.Contains(line, "/var/lib/containers/storage/overlay") { | ||
| // WSL is technically a container-like environment |
There was a problem hiding this comment.
True, but does this help us to bundle with other virtualization tools? Maybe yes from the perspective of "networking is harder / easier to mess up", but not really if we're using as a proxy for say users who tend to control their environments programmatically.
Not strictly advocating one way or the other here, but would be helpful to know (and document in case we need to revisit down the line) what the idea is.
There was a problem hiding this comment.
I also don't have strong feeling here, but I think if something fails in WSL is would be because of container-like nature.
I'm thinking this way: if we see networking issues in WSL, they should correlate with "Window+container" cohort, rather regular Windows users who run IPFS Desktop, no?
There was a problem hiding this comment.
Yeah probably. Although IIUC the comparison is not to windows users but to traditional Linux users using CLI or desktop, since the only information being tracked is that it's Linux and a container (vs just Linux)
There was a problem hiding this comment.
This is fine as it is, but we may want to consider adding some string to our telemetry data to indicate just what was detected: "linux+wsl"
gammazero
left a comment
There was a problem hiding this comment.
This is a much safer approach than exec.Command.
See question about caching the answer within the check, as is done in this PR. Or whether telemetry should have knowledge about which checks to only do once.
| if err == nil { | ||
| for line := range strings.Lines(string(content)) { | ||
| if strings.Contains(line, "overlay") && strings.Contains(line, "/var/lib/containers/storage/overlay") { | ||
| // WSL is technically a container-like environment |
There was a problem hiding this comment.
This is fine as it is, but we may want to consider adding some string to our telemetry data to indicate just what was detected: "linux+wsl"
|
|
||
| func isRunningInContainer() bool { | ||
| // Check for Docker container | ||
| containerDetectionOnce.Do(func() { |
There was a problem hiding this comment.
Do we want to cache the result here (if does not hurt to do so), or do we want to cache at a higher level remembering that isRunningInContainer and isRunningInVM have already been called?
Previously, I had assumed that container/VM checks would be cached at a higher level, but caching the result within in the check here says that was a wrong assumption. It seems like telemetry should know which questions it only needs to ask once.
There was a problem hiding this comment.
i guess this is an extra precaution for now. we will likely refine metrics over time, and have more clear split between dynamic and one-time ones, but thats out of scope here (merging as is)
More memory-efficient as it processes one line at a time instead of creating a slice of all lines upfront. Ref: #10944 (comment) Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
Current VM detection is not very accurate and systemd-detect-virt does exactly what's needed under a miriad of virtualization platforms.
The downside is that we are running a system command which is uglier and might perhaps flip anti-viruses or something.