
Add telemetry to categorize build failure reasons#13007

Merged
YuliiaKovalova merged 13 commits into main from copilot/add-telemetry-for-build-failures
Jan 13, 2026

Contributor

Copilot AI commented Jan 12, 2026

Work item (Internal use):

Summary

MSBuild telemetry tracks build success/failure but lacks granularity on why builds fail. This adds error categorization to enable failure pattern analysis.

Changes:

  • BuildTelemetry: Added FailureCategory (primary) and error count properties per category
  • BuildErrorTelemetryTracker: New dedicated class that encapsulates error tracking and categorization logic using lock-free Interlocked operations for thread safety, improving separation of concerns and performance
  • LoggingService: Uses BuildErrorTelemetryTracker instance to delegate telemetry operations
  • Categorization: 8 categories based on error code patterns:
    • Compiler: CS*, VBC*, FS* (via code or subcategory field)
    • MSBuildEngine: MSB4001-MSB4999 (except MSB4236)
    • Tasks: MSB3001-MSB3999
    • SDKResolvers: MSB4236 (SDK resolution errors)
    • NETSDK: NETSDK* (.NET SDK diagnostics)
    • NuGet: NU*
    • BuildCheck: BC*
    • Other: uncategorized errors
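The category rules above can be sketched as a small lookup function. This is a language-neutral illustration in Python, not the C# code from BuildErrorTelemetryTracker; the function name `categorize`, the assumption that every code is a prefix followed by digits, and the way the subcategory field is consulted are all hypothetical.

```python
import re

# Assumed code shapes: a known prefix followed by digits.
_COMPILER = re.compile(r"^(CS|VBC|FS)\d+$")
_MSB = re.compile(r"^MSB(\d{4})$")
_NETSDK = re.compile(r"^NETSDK\d+$")
_NUGET = re.compile(r"^NU\d+$")
_BUILDCHECK = re.compile(r"^BC\d+$")


def categorize(code, subcategory=None):
    """Map an error code to one of the eight categories in the list above."""
    code = (code or "").strip().upper()
    sub = (subcategory or "").strip().upper()

    # Compiler errors may be identified by the code or by the subcategory field.
    if _COMPILER.match(code) or _COMPILER.match(sub):
        return "Compiler"

    msb = _MSB.match(code)
    if msb:
        number = int(msb.group(1))
        if number == 4236:
            # SDK resolution errors are carved out of the MSB4xxx range.
            return "SDKResolvers"
        if 4001 <= number <= 4999:
            return "MSBuildEngine"
        if 3001 <= number <= 3999:
            return "Tasks"

    if _NETSDK.match(code):
        return "NETSDK"
    if _NUGET.match(code):
        return "NuGet"
    if _BUILDCHECK.match(code):
        return "BuildCheck"
    return "Other"
```

Checking MSB4236 before the MSB4001-MSB4999 range is what keeps SDK resolver failures out of the MSBuildEngine bucket.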

The data is available in both VS telemetry (GetActivityProperties) and SDK aggregated telemetry (GetProperties). Error count properties use namespaced keys (e.g., errorCounts.compiler, errorCounts.netsdk) so that Kusto queries can bucket all error counts together.
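As a rough illustration of the namespaced keys, the sketch below flattens per-category counts into a property dictionary. It is written in Python for brevity and is not the serialization code in BuildTelemetry; the helper name `error_count_properties`, the lowercasing of category names, and the choice to omit zero counts are assumptions. Only the `errorCounts.compiler` and `errorCounts.netsdk` key shapes come from the description.

```python
def error_count_properties(counts, prefix="errorCounts"):
    """Flatten per-category error counts into namespaced telemetry keys."""
    return {
        f"{prefix}.{category.lower()}": count
        for category, count in counts.items()
        if count > 0  # assumption: categories with no errors are left out
    }


properties = error_count_properties({"Compiler": 3, "NETSDK": 1, "NuGet": 0})
```

With a shared prefix, a query can select every count at once by matching on the `errorCounts.` prefix instead of naming each property.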

Customer Impact

Enables data-driven prioritization of failure root causes. Telemetry consumers can identify which error categories affect users most frequently. The distinction between SDK resolvers and .NET SDK diagnostics provides more actionable insights for SDK-related failures. Namespaced property keys improve query performance and usability in telemetry analysis tools.

Regression?

  • Yes
  • No

New properties are only populated when builds fail. No changes to success path or existing telemetry data. Telemetry logic is isolated in a dedicated tracker class for better maintainability. Lock-free implementation using Interlocked operations ensures thread safety without blocking message processing.

Testing

  • 16 unit tests for categorization logic (all error code patterns including new NETSDK and SDKResolvers categories)
  • 7 unit tests for telemetry property serialization with namespaced keys
  • Manual validation: built failing project, verified CS0103 categorized as Compiler
  • All tests pass after refactoring to separate tracker class and implementing lock-free operations

Risk

Low. Additive telemetry change with no impact on build behavior or existing telemetry schema. Refactoring to dedicated tracker class with lock-free Interlocked operations improves both code maintainability and performance. Property namespacing is backward-compatible as it only affects the property keys in the telemetry dictionary, not the underlying data structure.

Original prompt

This section details the original issue you should resolve

<issue_title>Add telemetry to differentiate build failure reasons</issue_title>
<issue_description>Currently, MSBuild's BuildTelemetry class tracks general build success/failure via the BuildSuccess boolean property, but does not provide granularity on why a build failed. This makes it difficult to analyze failure patterns and improve the developer experience.

Add telemetry data to categorize build failure reasons when BuildSuccess = false. The failure categories should include at minimum:

[Image: list of failure categories]

Success Criteria

  • When a build fails, telemetry should report the primary failure category
  • Error counts by category should be included
  • The first error code encountered could optionally be included for debugging
  • Data should be available in both VS telemetry and SDK aggregated telemetry</issue_description>

Comments on the Issue (you are @copilot in this section)



@dotnet-policy-service
Contributor

Hello @copilot, I noticed that you’re changing an .swr file or any file under src/Package/MSBuild.VSSetup.. Please make sure to validate this change by an experimental VS insertion. This is accomplished by pushing to an exp/* branch, which requires write permissions to this repo.

Copilot AI and others added 2 commits January 12, 2026 16:09
- Added failure category properties to BuildTelemetry class
- Implemented error tracking and categorization in LoggingService
- Added PopulateBuildTelemetryWithErrors method to populate telemetry data
- Created comprehensive tests for error categorization
- Updated documentation with new telemetry data points

The telemetry now tracks:
- FailureCategory: Primary category of the first/most common error
- Error counts by category (Compiler, MSBuildEngine, Tasks, SDK, NuGet, BuildCheck, Other)
- FirstErrorCode: The first error code encountered
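The three tracked values can be pictured with a minimal tally. This Python sketch is illustrative only: the class `ErrorTally` and its methods are hypothetical, and it assumes the primary category is the most frequent one, with ties going to the category seen first, since the commit message leaves "first/most common" open. It is also single-threaded and does not model the thread safety of the real tracker.

```python
class ErrorTally:
    """Tracks per-category counts, the first error code, and a primary category."""

    def __init__(self):
        self.counts = {}  # category -> number of errors, in first-seen order
        self.first_error_code = None

    def record(self, code, category):
        # Remember only the very first error code that was logged.
        if self.first_error_code is None:
            self.first_error_code = code
        self.counts[category] = self.counts.get(category, 0) + 1

    def primary_category(self):
        if not self.counts:
            return None  # no errors recorded, e.g. a successful build
        # max() returns the first key with the highest count, so ties
        # resolve to the category that appeared earliest.
        return max(self.counts, key=self.counts.get)


tally = ErrorTally()
tally.record("CS0103", "Compiler")
tally.record("NU1101", "NuGet")
tally.record("CS0246", "Compiler")
```

Here `tally.primary_category()` yields `"Compiler"` and `tally.first_error_code` stays `"CS0103"`.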

Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
- Changed collection expression to explicit constructor for compatibility
- Added constant for MSB code minimum length
- Added documentation comment for magic number

Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Copilot AI changed the title [WIP] Add telemetry to categorize build failure reasons Add telemetry to categorize build failure reasons Jan 12, 2026
Copilot AI requested a review from YuliiaKovalova January 12, 2026 16:16
Created BuildErrorTelemetryTracker class to handle error tracking and categorization,
improving separation of concerns. LoggingService now delegates to this tracker instead
of managing telemetry state directly.

Changes:
- Created BuildErrorTelemetryTracker.cs with all error categorization logic
- Updated LoggingService to use tracker instance
- Added BuildErrorTelemetryTracker to Microsoft.Build.csproj

Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
@YuliiaKovalova YuliiaKovalova marked this pull request as ready for review January 12, 2026 16:45
Copilot AI review requested due to automatic review settings January 12, 2026 16:45
Contributor

Copilot AI left a comment


Pull request overview

This PR adds telemetry to categorize build failure reasons, enabling data-driven analysis of why builds fail. The implementation introduces a new BuildErrorTelemetryTracker class that tracks errors across seven categories (Compiler, MSBuildEngine, Tasks, SDK, NuGet, BuildCheck, Other) and populates telemetry properties with error counts and the primary failure category.

Changes:

  • Added error categorization properties to BuildTelemetry (FailureCategory and 7 error count properties)
  • Created BuildErrorTelemetryTracker class to encapsulate error tracking logic with optimized categorization
  • Integrated error tracking into LoggingService to track errors as they're logged
  • Updated BuildManager to populate telemetry with error data on build failure
  • Added comprehensive test coverage for all error categories and telemetry serialization
  • Updated telemetry documentation

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/Framework/Telemetry/BuildTelemetry.cs | Added failure category and error count properties; updated telemetry serialization methods |
| src/Build/Microsoft.Build.csproj | Added BuildErrorTelemetryTracker.cs to project compilation |
| src/Build/BackEnd/Components/Logging/BuildErrorTelemetryTracker.cs | New class implementing error categorization logic with performance optimizations |
| src/Build/BackEnd/Components/Logging/LoggingService.cs | Integrated error tracking and added method to populate telemetry |
| src/Build/BackEnd/Components/Logging/ILoggingService.cs | Added PopulateBuildTelemetryWithErrors to interface |
| src/Build/BackEnd/BuildManager/BuildManager.cs | Added call to populate error telemetry on build failure |
| src/Build.UnitTests/BackEnd/MockLoggingService.cs | Added empty mock implementation of new interface method |
| src/Build.UnitTests/BackEnd/KnownTelemetry_Tests.cs | Added tests for telemetry serialization of failure data |
| src/Build.UnitTests/BackEnd/BuildTelemetryErrorCategorization_Tests.cs | New comprehensive test suite for error categorization |
| documentation/wiki/CollectedTelemetry.md | Added documentation for primary failure category property |

YuliiaKovalova and others added 3 commits January 12, 2026 17:54
…r.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…r.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI and others added 2 commits January 12, 2026 17:22
- Replaced locks with Interlocked.Increment for thread-safe error tracking
- Added namespacing to error count properties (errorCounts.compiler, etc.) for better Kusto query ergonomics
- Updated tests to use namespaced property names

Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Split SDK category into two separate categories:
- SDKResolvers: MSB4236 (SDK resolution errors)
- NETSDK: NETSDK* (.NET SDK diagnostics)

Updated BuildTelemetry properties and tests accordingly.

Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Copilot AI requested a review from YuliiaKovalova January 12, 2026 17:30
This was referenced Mar 11, 2026

Labels

Area: Telemetry Candidates benefiting from telemetry data


Development

Successfully merging this pull request may close these issues.

Add telemetry to differentiate build failure reasons

5 participants