
Commit 86762d1

jamesmontemagno, tarekgh, and michaelgsharp authored
.NET 10 P3 - Libraries (#9824)
* .NET 10 P3 - Libraries
* Introduce an AOT-Safe Constructor for ValidationContext
* Support for Telemetry Schema URLs in ActivitySource and Meter
* Byte-Level Support in BPE Tokenizer
* ML.NET notes
* Tensor update notes

---------

Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
1 parent 48000f4 commit 86762d1

1 file changed

Lines changed: 93 additions & 1 deletion

release-notes/10.0/preview/preview3/libraries.md

@@ -10,4 +10,96 @@ Here's a summary of what's new in .NET Libraries in this preview release:

## Feature

Something about the feature.

## Introduce an AOT-Safe Constructor for `ValidationContext`

The `ValidationContext` class is used during options validation to supply context about the object being validated. Because extracting the `DisplayName` may require reflection, all existing `ValidationContext` constructors are currently marked as unsafe for AOT compilation. As a result, using `ValidationContext` in a Native AOT build can produce warnings indicating that the type is unsafe.

To address this, we are introducing a new [`ValidationContext` constructor](https://github.com/dotnet/runtime/issues/113134#issuecomment-2715310131) that accepts the `displayName` explicitly as a parameter. Because the display name no longer has to be discovered through reflection, this constructor is AOT-safe, so developers can use `ValidationContext` in Native AOT builds without errors or warnings.

```csharp
namespace System.ComponentModel.DataAnnotations;

public sealed class ValidationContext
{
    public ValidationContext(object instance, string displayName, IServiceProvider? serviceProvider = null, IDictionary<object, object?>? items = null);
}
```
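As a sketch of how the new constructor might be used, the snippet below validates a single value with `Validator.TryValidateValue` and a `RangeAttribute`, passing the display name explicitly instead of letting `ValidationContext` discover it. The `Port` name and the 1 to 65535 range are illustrative assumptions, and the snippet assumes a .NET 10 Preview 3 (or later) SDK where this constructor is available; it has not been checked here for a warning-free Native AOT publish of every call it makes.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical setting to validate: a TCP port number that is out of range.
int port = 70000;

// New constructor: the display name is supplied explicitly,
// so ValidationContext does not need reflection to look it up.
var context = new ValidationContext(port, displayName: "Port");

var results = new List<ValidationResult>();
bool isValid = Validator.TryValidateValue(
    port,
    context,
    results,
    new ValidationAttribute[] { new RangeAttribute(1, 65535) });

Console.WriteLine(isValid); // False
foreach (ValidationResult result in results)
{
    // The error message is formatted with the display name passed above.
    Console.WriteLine(result.ErrorMessage);
}
```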
## Support for Telemetry Schema URLs in `ActivitySource` and `Meter`

[OpenTelemetry](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/schemas) defines a specification for Telemetry Schemas. To align with it, the `ActivitySource` and `Meter` classes now let you specify a Telemetry Schema URL at construction time.

This allows the creator of an `ActivitySource` or `Meter` to declare which schema the tracing and metrics data it produces follows. Consumers of that data can read the URL from the new `TelemetrySchemaUrl` property and process the data according to the specified schema, ensuring consistency and compatibility.

The update also introduces `ActivitySourceOptions`, which simplifies creating an `ActivitySource` with multiple configuration options, similar to the existing `MeterOptions` type for `Meter`.

```csharp
namespace System.Diagnostics
{
    public sealed partial class ActivitySource
    {
        public ActivitySource(ActivitySourceOptions options);

        public string? TelemetrySchemaUrl { get; }
    }

    public class ActivitySourceOptions
    {
        public ActivitySourceOptions(string name);

        public string Name { get; set; }
        public string? Version { get; set; }
        public IEnumerable<KeyValuePair<string, object?>>? Tags { get; set; }
        public string? TelemetrySchemaUrl { get; set; }
    }
}

namespace System.Diagnostics.Metrics
{
    public partial class Meter : IDisposable
    {
        public string? TelemetrySchemaUrl { get; }
    }

    public partial class MeterOptions
    {
        public string? TelemetrySchemaUrl { get; set; }
    }
}
```
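A minimal end-to-end sketch of the producer and consumer sides, assuming a .NET 10 Preview 3 (or later) SDK. The source and meter name (`Contoso.Orders`), the tags, and the schema URL value are illustrative assumptions; the listener shows one possible way for a consumer to observe the schema URL through `Activity.Source`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Example schema URL (assumption): OpenTelemetry publishes schema files
// under https://opentelemetry.io/schemas/<version>.
const string SchemaUrl = "https://opentelemetry.io/schemas/1.21.0";

// Producer side: attach the schema URL when creating the source and the meter.
using var source = new ActivitySource(new ActivitySourceOptions("Contoso.Orders")
{
    Version = "1.0.0",
    Tags = new[] { new KeyValuePair<string, object?>("deployment.environment", "staging") },
    TelemetrySchemaUrl = SchemaUrl,
});

using var meter = new Meter(new MeterOptions("Contoso.Orders")
{
    Version = "1.0.0",
    TelemetrySchemaUrl = SchemaUrl,
});

// Consumer side: a listener reads the schema URL from each activity's source.
string? observedSchemaUrl = null;
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Contoso.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
    ActivityStopped = a => observedSchemaUrl = a.Source.TelemetrySchemaUrl,
};
ActivitySource.AddActivityListener(listener);

// Disposing the activity stops it, which invokes ActivityStopped.
using (Activity? activity = source.StartActivity("ProcessOrder"))
{
    activity?.SetTag("order.id", 42);
}

Console.WriteLine($"ActivitySource schema: {source.TelemetrySchemaUrl}");
Console.WriteLine($"Meter schema: {meter.TelemetrySchemaUrl}");
Console.WriteLine($"Observed by listener: {observedSchemaUrl}");
```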
## Byte-Level Support in BPE Tokenizer

The `BpeTokenizer` has been available in the `Microsoft.ML.Tokenizers` library for some time. This [update](https://github.com/dotnet/machinelearning/pull/7425) adds support for Byte-Level encoding to the BPE tokenizer.

With Byte-Level encoding, the tokenizer's vocabulary is expressed over the UTF-8 bytes of the text, with certain bytes represented by substitute characters; for example, a space is represented as `Ġ`. This enables the creation of tokenizer objects compatible with models that use Byte-Level BPE tokenization, such as the [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-R1) model. The [test code](https://github.com/dotnet/machinelearning/blob/1ccbbd4b840e8edc21fcc0fe102e4dfb5ff75eea/test/Microsoft.ML.Tokenizers.Tests/BpeTests.cs#L875) demonstrates how to read a Hugging Face `tokenizer.json` file for DeepSeek and create a corresponding tokenizer object.
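To make the `Ġ` example concrete, here is a minimal sketch of the byte-to-character table popularized by GPT-2 and used by Hugging Face byte-level BPE tokenizers, assuming that is the scheme a model's `tokenizer.json` relies on. It is a standalone illustration, not the `Microsoft.ML.Tokenizers` implementation: printable Latin-1 bytes map to themselves, and the remaining 68 bytes (controls, space, and a few others) are shifted, in byte order, to code points starting at U+0100, which is why a space (0x20) becomes `Ġ` (U+0120).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Build the GPT-2 style byte -> char table.
static Dictionary<byte, char> BuildByteToChar()
{
    var map = new Dictionary<byte, char>();

    // Bytes that are already visible characters map to themselves.
    for (int b = '!'; b <= '~'; b++) map[(byte)b] = (char)b;
    for (int b = 0xA1; b <= 0xAC; b++) map[(byte)b] = (char)b;
    for (int b = 0xAE; b <= 0xFF; b++) map[(byte)b] = (char)b;

    // Every other byte (0x00-0x20, 0x7F-0xA0, 0xAD) is shifted,
    // in byte order, to code points starting at U+0100.
    int next = 0;
    for (int b = 0; b < 256; b++)
    {
        if (!map.ContainsKey((byte)b))
        {
            map[(byte)b] = (char)(0x100 + next);
            next++;
        }
    }
    return map;
}

// Encode text as UTF-8 and replace each byte with its mapped character.
static string ToByteLevel(string text, IReadOnlyDictionary<byte, char> map)
{
    var sb = new StringBuilder();
    foreach (byte b in Encoding.UTF8.GetBytes(text))
        sb.Append(map[b]);
    return sb.ToString();
}

Dictionary<byte, char> byteToChar = BuildByteToChar();
Console.WriteLine(ToByteLevel("Hello world", byteToChar)); // HelloĠworld
Console.WriteLine(ToByteLevel("café", byteToChar));        // cafÃ©
```

The second line shows the same idea for non-ASCII text: `é` is the two UTF-8 bytes 0xC3 0xA9, each of which is printable and therefore maps to itself as `Ã` and `©`.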

Additionally, this update introduces the `BpeOptions` type, making it easier to configure a BPE tokenizer with multiple options. The new factory method `BpeTokenizer.Create(BpeOptions options)` simplifies instantiation:

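```csharp
using Microsoft.ML.Tokenizers;
// `vocabs` holds the tokenizer vocabulary (for example, read from a Hugging Face tokenizer.json file).
BpeOptions bpeOptions = new BpeOptions(vocabs);
BpeTokenizer tokenizer = BpeTokenizer.Create(bpeOptions);
```
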
## Deterministic option for LightGBM Trainer in ML.NET

LightGBM is one of the most popular trainers in ML.NET. To keep it simple to use, ML.NET exposes only a limited set of LightGBM's options. Unfortunately, this meant you could sometimes get non-deterministic results even with the same data and the same random seed, because of how LightGBM performs its training.

This [update](https://github.com/dotnet/machinelearning/pull/7415) exposes LightGBM's `deterministic`, `force_row_wise`, and `force_col_wise` options so that you can force deterministic training behavior when needed. LightGBM's own documentation recommends setting either `force_row_wise` or `force_col_wise` together with `deterministic` to avoid numerical instability. You set these options through the LightGBM options class that matches the trainer type you are using, such as `LightGbmBinaryTrainer.Options` for binary classification.

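```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;

// A fixed seed alone does not guarantee reproducible LightGBM results.
MLContext mlContext = new MLContext(seed: 0);

LightGbmBinaryTrainer trainer = mlContext.BinaryClassification.Trainers.LightGbm(new LightGbmBinaryTrainer.Options
{
    NumberOfLeaves = 10,
    MinimumExampleCountPerLeaf = 2,
    UnbalancedSets = false,
    Deterministic = true, // LightGBM's deterministic option
    ForceRowWise = true   // LightGBM's force_row_wise option
});
```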

## Tensor enhancements

When we first released `Tensor<T>` last year, we did not provide any non-generic way of interacting with it, even for operations that don't need the generic type information, such as getting the `Lengths` and `Strides`. This [update](https://github.com/dotnet/runtime/pull/113401) changes the class hierarchy by adding a non-generic interface, so you can perform those kinds of operations without working with generics. It also adds the ability to get and set data in a non-generic way by boxing to `object`. Boxing incurs a performance penalty, so avoid it in performance-sensitive code, but it can make some data access easier when performance is not a concern.

When performing `Slice` operations on a `Tensor`, the initial implementation copied the underlying data. The copy could be avoided by using a `TensorSpan` or `ReadOnlyTensorSpan`, but there were many cases where the same behavior was also desired on `Tensor`. This [update](https://github.com/dotnet/runtime/pull/113166) adds that behavior: slice operations on a `Tensor` now behave like those on the `TensorSpan` types and no longer copy the data.
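The sketch below is plain C#, not the `System.Numerics.Tensors` API. It assumes a non-copying slice is represented the usual way, as a start offset plus lengths and strides over the parent's buffer, and shows the practical consequence of the change: because the slice shares storage with its parent, writes through the slice are visible in the original data. The buffer, the `Position` helper, and the chosen slice are hypothetical.

```csharp
using System;

// A 2 x 3 tensor stored row-major in one flat buffer:
// [[0, 1, 2],
//  [3, 4, 5]]
float[] buffer = { 0, 1, 2, 3, 4, 5 };
int[] lengths = { 2, 3 };
int[] strides = { 3, 1 }; // step 3 elements per row, 1 element per column

// Map an (i, j) index within a view to a position in the shared buffer.
static int Position(int start, int[] stride, int i, int j) =>
    start + i * stride[0] + j * stride[1];

// Slice columns 1..2 of both rows without copying: the view is only a new start
// offset and new lengths over the same buffer, reusing the parent's strides.
int viewStart = Position(0, strides, 0, 1);
int[] viewLengths = { lengths[0], 2 };

// Writing through the view changes the parent's data, because nothing was copied.
buffer[Position(viewStart, strides, 1, 0)] = 40f; // view (1, 0) is parent (1, 1)

for (int r = 0; r < viewLengths[0]; r++)
{
    for (int c = 0; c < viewLengths[1]; c++)
        Console.Write($"{buffer[Position(viewStart, strides, r, c)]} ");
    Console.WriteLine();
}
```

With these values the loop prints the view as `1 2` and `40 5`, and `buffer[4]` (the parent's element at row 1, column 1) now holds 40. A copying slice would have left the parent unchanged.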
