Fix #91958: use mach_timebase_info to determine process time coefficient on macOS#92185
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process Issue Detailsnull
|
src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs
Outdated
Show resolved
Hide resolved
7b79541 to
7571e06
Compare
adamsitnik
left a comment
There was a problem hiding this comment.
Overall it looks good, but I found few places that could be polished a bit.
Thank you for your contribution @ForNeVeR !
There was a problem hiding this comment.
@jkoritzinsky @AaronRobinsonMSFT In terms of marshaller best practices, do we need to explicitly specify sequential layout for such structs?
| public struct mach_timebase_info_data_t | |
| [StructLayout(LayoutKind.Sequential)] | |
| public struct mach_timebase_info_data_t |
There was a problem hiding this comment.
Technically, no. Value types in .NET default to sequential layout. However, and this is annoying, there are some Roslyn warnings that are suppressed if one does explicitly mark the type with sequential layout. The reasoning here is historical, but the gist is if Roslyn complains about unreferenced fields, which can happen for types used in interop, then placing StructLayout(LayoutKind.Sequential) on the type will automatically suppress the warning.
The interop team's general guidance here has been to accept the defaults except where there is annoying friction with C# or where the tooling requires explicit details. This falls into the C# friction bucket, but only if a warning is emitted.
There was a problem hiding this comment.
@AaronRobinsonMSFT thank you for a very detailed answer!
There was a problem hiding this comment.
I can't see any related warnings in the compilation (and also we seem to have warnings as errors enabled in this part?).
Does this mean this attribute is unnecessary? I am totally okay with adding that if required. Though, yeah, we all know that sequential is the default struct layout 😅
There was a problem hiding this comment.
Does this mean this attribute is unnecessary?
Yes.
There was a problem hiding this comment.
It would be better to initialize this field lazily, when we need it for the first time. Otherwise this sys-call:
- may be called even if we don't need it (during type initializaiton)
- in theory it may fail and be very hard to handle properly
Since we use the following logic in 4 places:
return new TimeSpan(Convert.ToInt64($ulong * timeBase.numer / timeBase.denom / NanosecondsTo100NanosecondsFactor));we could introduce a helper method that could take care of everything, something like this:
| private static readonly Interop.libSystem.mach_timebase_info_data_t timeBase = GetTimeBase(); | |
| private static volatile uint s_timeBase_numer, s_timeBase_denom; | |
| private static TimeSpan Map(ulong sysTime) | |
| { | |
| uint denom = s_timeBase_denom; | |
| if (denom == default) | |
| { | |
| Interop.libSystem.mach_timebase_info_data_t timeBase = GetTimeBase(); | |
| s_timeBase_denom = denom = timeBase.denom; | |
| s_timeBase_numer = timeBase.numer; | |
| } | |
| uint numer = s_timeBase_numer; | |
| return new TimeSpan(Convert.ToInt64(sysTime * numer / denom / NanosecondsTo100NanosecondsFactor)); | |
| } |
There was a problem hiding this comment.
Yeah, agree with your reasoning and applied a bit modified version of this snippet (the only change is in naming, I like MapTime a little bit better).
There was a problem hiding this comment.
thank you for finding and updating all use cases! 👍
src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs
Outdated
Show resolved
Hide resolved
|
Build on CI is now failing due to something being wrong with |
There was a problem hiding this comment.
We have two options:
-
Include the two keys in test resources:
<data name="CantGetAllPids" xml:space="preserve"> <value>Could not get all running Process IDs.</value> </data> <data name="RUsageFailure" xml:space="preserve"> <value>Failed to set or retrieve rusage information. See the error code for OS-specific error information.</value> </data>
in:
and delete this hard-coded line. -
Include test resources alongside source ones -- separated by semicolon
;:<StringResourcesPath>$(MSBuildProjectDirectory)\Resources\Strings.resx;$(MSBuildProjectDirectory)\..\src\Resources\Strings.resx</StringResourcesPath>
multiple resources seem to be supported.
Option 1 is probably much cleaner that avoids mixing stuff, but I'll defer to others. cc @ViktorHofer
There was a problem hiding this comment.
Agreed. Option 1 is cleaner while it adds some duplication.
There was a problem hiding this comment.
Should it do the last division first, like here?
https://chromium.googlesource.com/chromium/src/+/refs/tags/58.0.3029.141/base/time/time_mac.cc#43
(I don't know what values it typically returns)
There was a problem hiding this comment.
Oh, this says it gets the 42 from 125/3. Does 125 * ticks go over ulong max?
There was a problem hiding this comment.
Also I suppose you can store 1000 * denom
I'm sure someone will point out that won't necessarily give the exact same result @tannergooding 🙂
There was a problem hiding this comment.
That's a good point, but I was concerned about losing some precision due to doing division first. Though I can't wrap my head around it right now. Will try to do more thorough analysis later today.
There was a problem hiding this comment.
Ok, here are my thoughts.
Typical tick values are pretty large actually. For comparison, let's consider a program that was working on a single-core CPU for a year.
> [TimeSpan]::FromDays(365).Ticks
315360000000000
> [long]::MaxValue
9223372036854775807
(here I consider long since TimeSpan takes long in its ctor, not ulong)
This means we will overflow on a numer of 29248 (or if our tick value will get significantly bigger, i.e. we take 29000 years into account, or a CPU with 29k cores).
This is very far from being realistic, but it is also much closer than I expected. So, to be on the safe side, let's do the same as Chromium does: divide by our factor first, to get a hundred times more space, by losing some precision.
We may, of course, also do value / (denom * 100) * numer which is relatively the same, yet will lose a bit more precision as well.
There was a problem hiding this comment.
I'm also OK with asserting that numer is reasonably small ( say < 1000) as an alternative, since we suspect that will always be true. Or assert that the math comes out almost the same in 128 bits.
If not we should try to quantify the rounding error and how much it matters.
There was a problem hiding this comment.
My current suggestion is to leave this as sysTime / NanosecondsTo100NanosecondsFactor * numer / denom.
I am not sure how useful an assertion would be in this code. Perhaps it'd be better to add checked, since we are concerned by overflow? We do not expect this code to be too performance sensitive, so checked might be good.
What do you think?
There was a problem hiding this comment.
Not my area, but I"m not sure an exception would be an improvement. I'd be interested in thoughts of a numeric expert like @tannergooding . I'll step aside for area owners to sign off overall...
There was a problem hiding this comment.
I would say that an explicit exception might be an improvement (especially since it would be thrown from a member property such as TotalProcessorTime and not on class init).
But of course, I am ready to listen to other suggestions.
861ef04 to
d81a801
Compare
…efficient on macOS
080428d to
1a57499
Compare
1a57499 to
ce50c65
Compare
|
I believe that all the existing feedback is addressed, so I'm waiting for further feedback. Thank you so much for your time, folks. |
|
@dotnet/area-system-diagnostics-process please review |
|
Ping @dotnet/area-system-diagnostics-process |
adamsitnik
left a comment
There was a problem hiding this comment.
LGTM, thank you for your fix @ForNeVeR !
And apologies for the delay (I was on a parental leave).
| <data name="Argv_IncludeDoubleQuote" xml:space="preserve"> | ||
| <value>The argv[0] argument cannot include a double quote.</value> | ||
| </data> | ||
| <data name="CantGetAllPids" xml:space="preserve"> |
There was a problem hiding this comment.
The fix I've suggested in #92185 (comment) should just work, but I can take care of that in a separate PR to get the fix merged right now.
|
Thank you so much for help! ❤️ |
|
/backport to release/8.0-staging |
|
Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/8385573189 |
|
Backporting this fix to 8.0 since this was reported again in #98121. |
|
@jeffhandley backporting to release/8.0-staging failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Fix #91958: use mach_timebase_info to determine process time coefficient on macOS
Using index info to reconstruct a base tree...
M src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
CONFLICT (content): Merge conflict in src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix #91958: use mach_timebase_info to determine process time coefficient on macOS
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128Please backport manually! |
|
@jeffhandley an error occurred while backporting to release/8.0-staging, please check the run log for details! Error: git am failed, most likely due to a merge conflict. |
Closes #91958.
I have compared the results of the process timing functions to the results of
pson my M2 MacBook device, and they seem to work properly after the changes.The new tests were properly failing before the change (since we were returning values 42 times lower than the native results), and are, of course, green after the changes.