* .NET 10 P3 - Libraries
* Introduce an AOT-Safe Constructor for ValidationContext
* Support for Telemetry Schema URLs in ActivitySource and Meter
* Byte-Level Support in BPE Tokenizer
* ML.NET notes
* Tensor update notes
---------
Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
release-notes/10.0/preview/preview3/libraries.md
Lines changed: 93 additions & 1 deletion
@@ -10,4 +10,96 @@ Here's a summary of what's new in .NET Libraries in this preview release:
## Feature

-Something about the feature
+Something about the feature.

## Introduce an AOT-Safe Constructor for `ValidationContext`

The `ValidationContext` class supplies contextual information, such as the object being validated and its display name, during options validation. Because resolving the `DisplayName` may involve reflection, all existing `ValidationContext` constructors are marked as unsafe for AOT compilation. As a result, using `ValidationContext` in a Native AOT application can produce build warnings indicating that the type is unsafe.

To address this, we are introducing a new [`ValidationContext` constructor](https://github.com/dotnet/runtime/issues/113134#issuecomment-2715310131) that accepts the `displayName` explicitly as a parameter. Because the display name no longer has to be discovered through reflection, this constructor is AOT-safe, and developers can use `ValidationContext` in native builds without encountering AOT-related errors or warnings.
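
As a rough sketch of how the new overload might be called, assuming the parameter order proposed in the linked issue (`instance`, `displayName`, then optional `serviceProvider` and `items`). The `ServerSettings` type and its values are hypothetical and exist only for illustration:

```csharp
using System;
using System.ComponentModel.DataAnnotations;

var settings = new ServerSettings { Port = 8080 };

// Assumed overload: the display name is supplied by the caller,
// so it does not have to be discovered through reflection.
var context = new ValidationContext(
    settings,
    displayName: nameof(ServerSettings),
    serviceProvider: null,
    items: null);

Console.WriteLine(context.DisplayName);

// Hypothetical options type used only for this sketch.
public sealed class ServerSettings
{
    public int Port { get; set; }
}
```

The sketch deliberately stops at constructing the context; whichever validation entry point you pass it to has its own AOT annotations to check.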
## Support for Telemetry Schema URLs in `ActivitySource` and `Meter`

[OpenTelemetry](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/schemas) defines a specification for Telemetry Schemas. To align with it, the `ActivitySource` and `Meter` classes now support specifying a Telemetry Schema URL during construction.

This enhancement allows creators of `ActivitySource` and `Meter` instances to define a schema URL for the tracing and metrics data they produce. Consumers of this data can then process it according to the specified schema, ensuring consistency and compatibility.
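
A minimal sketch of supplying a schema URL at construction time. It assumes options-based constructors and a `TelemetrySchemaUrl` property on `ActivitySourceOptions` and `MeterOptions`; that API shape is an assumption, not something stated above. The source name and schema version are hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical schema version, used only for illustration.
const string schemaUrl = "https://opentelemetry.io/schemas/1.26.0";

// Assumed API: ActivitySourceOptions carries the schema URL.
using var source = new ActivitySource(new ActivitySourceOptions("MyCompany.Orders")
{
    Version = "1.0.0",
    TelemetrySchemaUrl = schemaUrl
});

// Assumed API: MeterOptions exposes the same property for metrics.
using var meter = new Meter(new MeterOptions("MyCompany.Orders")
{
    Version = "1.0.0",
    TelemetrySchemaUrl = schemaUrl
});

// Consumers can read the URL back from the instances they observe.
Console.WriteLine(source.TelemetrySchemaUrl);
Console.WriteLine(meter.TelemetrySchemaUrl);
```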
## Byte-Level Support in BPE Tokenizer

The `BpeTokenizer` has been available in the `Microsoft.ML.Tokenizers` library for some time. This [update](https://github.com/dotnet/machinelearning/pull/7425) introduces support for Byte-Level encoding in the BPE tokenizer.

Byte-Level encoding lets the tokenizer process vocabulary as UTF-8 bytes, remapping certain characters to printable stand-ins; for example, a space is represented as `Ġ`. This enhancement enables the creation of tokenizer objects compatible with models that use Byte-Level BPE tokenization, such as the [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-R1) model. The [test code](https://github.com/dotnet/machinelearning/blob/1ccbbd4b840e8edc21fcc0fe102e4dfb5ff75eea/test/Microsoft.ML.Tokenizers.Tests/BpeTests.cs#L875) demonstrates how to read a Hugging Face `tokenizer.json` file for DeepSeek and create a corresponding tokenizer object.

Additionally, this update introduces the `BpeOptions` type, making it easier to configure a BPE tokenizer using multiple options. The new factory method `BpeTokenizer.Create(BpeOptions options)` simplifies the instantiation process.
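
A minimal sketch of the options-based factory. It assumes `BpeOptions` can be constructed from a sequence of `(Token, Id)` pairs and exposes a `ByteLevel` property, as the linked test code appears to do. The two-entry vocabulary is hypothetical and far too small for real use:

```csharp
using Microsoft.ML.Tokenizers;

// Hypothetical toy vocabulary; "Ġ" stands in for a leading space
// under Byte-Level encoding.
var vocabulary = new (string Token, int Id)[]
{
    ("Hello", 0),
    ("Ġworld", 1)
};

// Assumed shape: vocabulary goes to the constructor,
// other settings are plain properties.
var options = new BpeOptions(vocabulary)
{
    ByteLevel = true
};

BpeTokenizer tokenizer = BpeTokenizer.Create(options);
```

For a real model, the vocabulary and merges would come from the model's `tokenizer.json`, as in the linked test.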
## Deterministic option for LightGBM Trainer in ML.NET

LightGBM is one of the most popular trainers in ML.NET. To keep it simple to use, ML.NET exposes only a limited set of its options. Unfortunately, this meant you could sometimes get non-deterministic results even with the same data and the same random seed, because of how LightGBM performs its training.

This [update](https://github.com/dotnet/machinelearning/pull/7415) exposes LightGBM's `deterministic`, `force_row_wise`, and `force_col_wise` options so that you can force deterministic training behavior when needed. You set these options on the LightGBM options class that matches the trainer type you are using.
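
A sketch of what this could look like for the binary classification trainer. The property names `Deterministic` and `ForceRowWise` are assumed to be the ML.NET counterparts of the LightGBM parameters named above, and the other values are arbitrary:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;

var mlContext = new MLContext(seed: 0);

var options = new LightGbmBinaryTrainer.Options
{
    NumberOfLeaves = 10,
    MinimumExampleCountPerLeaf = 2,
    // Assumed names for LightGBM's `deterministic` and `force_row_wise`.
    Deterministic = true,
    ForceRowWise = true
};

// Pick the options class that matches the trainer you construct.
LightGbmBinaryTrainer trainer =
    mlContext.BinaryClassification.Trainers.LightGbm(options);
```

LightGBM treats `force_row_wise` and `force_col_wise` as mutually exclusive, so only one of them should be enabled at a time.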
## Tensor updates

When we initially released `Tensor` last year, we did not provide any non-generic means of interacting with it, even for operations that don't need the generic type information, such as getting the `Lengths` and `Strides`. This [update](https://github.com/dotnet/runtime/pull/113401) changes the class hierarchy by adding a non-generic interface that supports those operations without requiring you to deal with generics. It also adds the ability to get and set data in a non-generic way by boxing to type `object`. Boxing incurs a performance penalty and should be avoided in performance-sensitive code, but it can make some data access easier when performance is not a concern.

When performing `Slice` operations on a `Tensor`, the initial implementation copied the underlying data. This copy could be avoided by using a `TensorSpan` or `ReadOnlyTensorSpan`, but the same behavior was often wanted on `Tensor` itself. This [update](https://github.com/dotnet/runtime/pull/113166) adds that behavior. Slice operations on a `Tensor` now behave like those on the `TensorSpan` types and no longer copy the data.
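
A hedged sketch of both changes. It assumes the non-generic interface is named `IReadOnlyTensor` and exposes `Lengths`, `Strides`, and an `object`-returning indexer, and that `Tensor.Create` accepts a backing array plus lengths. These shapes are assumptions and may differ between previews:

```csharp
#pragma warning disable SYSLIB5001 // Tensor APIs are marked experimental
using System;
using System.Numerics.Tensors;

// A 2x3 tensor over a flat backing array.
Tensor<int> tensor = Tensor.Create<int>(new[] { 1, 2, 3, 4, 5, 6 }, [2, 3]);

// Assumed non-generic view: shape information without knowing T.
IReadOnlyTensor shape = tensor;
Console.WriteLine($"Lengths: {string.Join(", ", shape.Lengths.ToArray())}");
Console.WriteLine($"Strides: {string.Join(", ", shape.Strides.ToArray())}");

// Assumed non-generic element access: the value is boxed to object.
object? boxed = shape[0, 1];
Console.WriteLine(boxed);

// Slice the first row. Per the note above, the slice shares storage
// with the original tensor instead of copying it.
Tensor<int> firstRow = tensor.Slice(0..1, 0..3);
firstRow[0, 0] = 42;

// If no copy was made, the write is visible through the original.
Console.WriteLine(tensor[0, 0]);
```

Code that relied on `Slice` returning an independent copy would need to copy explicitly after this change.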