For p/invokes, the character set is encoded into the metadata for the method. As a result, adding anything, like UTF-8, is complex and far-reaching. The current experience (ANSI means UTF-8 on Unix) is odd and confusing. The p/invoke source generator should be used to improve this experience.
We’d like to:
- Avoid proliferating the pattern of ‘use CharSet.Ansi on Unix to get UTF-8’
- Allow specifying the character set to use for all parameters of a method
  - Instead of needing MarshalAs on each parameter
- Avoid adding to the CharSet enumeration
  - We don’t want inconsistent support, and we don’t want to implement new support in all the places that currently use it
Our current thinking is to:
- Remove the CharSet field
- Add a MarshalStringsUsing field of type Type
Example:
// UTF-8 - equivalent to explicitly specifying [MarshalAs(UnmanagedType.LPUTF8Str)] on string parameters
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf8StringMarshalling))]
public static partial int Method(string s);
// UTF-16 - equivalent to CharSet.Unicode behaviour in built-in
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf16StringMarshalling))]
public static partial int Method(string s);
// Error - invalid encoding
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(int))]
public static partial int Method(string s);
// User-defined marshalling
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(MyCustomMarshal.Wtf8String))]
public static partial int Method(string s);
Where:
// .NET can provide:
namespace System.Runtime.InteropServices.Encoding
{
    // UTF-16 with endianness based on the current platform
    struct Utf16StringMarshalling { ... }
    // UTF-8
    struct Utf8StringMarshalling { ... }
    // ANSI
    [SupportedOSPlatform("windows")]
    struct AnsiStringMarshalling { ... }
    ...
}
// User can define:
namespace MyCustomMarshal
{
    struct Wtf8String { ... }
}
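To make the difference between the UTF-8 and UTF-16 options concrete, here is a minimal sketch built only on the existing Marshal and System.Text.Encoding APIs. It is not the proposed marshaller types (their shape is left open above); the class name StringEncodingDemo and its helpers are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, for illustration only; not part of the proposal.
static class StringEncodingDemo
{
    // Copy a managed string to unmanaged memory as null-terminated UTF-8,
    // then read it back. Roughly what LPUTF8Str marshalling has to do.
    public static string RoundTripUtf8(string s)
    {
        IntPtr p = Marshal.StringToCoTaskMemUTF8(s);
        try { return Marshal.PtrToStringUTF8(p) ?? string.Empty; }
        finally { Marshal.FreeCoTaskMem(p); }
    }

    // Same round trip using null-terminated UTF-16 (the CharSet.Unicode behaviour).
    public static string RoundTripUtf16(string s)
    {
        IntPtr p = Marshal.StringToCoTaskMemUni(s);
        try { return Marshal.PtrToStringUni(p) ?? string.Empty; }
        finally { Marshal.FreeCoTaskMem(p); }
    }

    // Size of the native payload in bytes, excluding the null terminator.
    public static int Utf8ByteCount(string s) => Encoding.UTF8.GetByteCount(s);
    public static int Utf16ByteCount(string s) => Encoding.Unicode.GetByteCount(s);
}
```

For the string "héllo", the UTF-8 payload is 6 bytes and the UTF-16 payload is 10 bytes, so native code reading the buffer needs to know which encoding the p/invoke declaration selected.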
Other considerations:
- Naming: Unicode vs Utf16
  - .NET has usually used the (Windows-centric) term Unicode to refer to UTF-16. Naming the struct ...Utf16StringMarshalling would be correct and in line with our cross-platform focus, but UnicodeStringMarshalling would be more consistent with existing APIs.
- Auto (UTF-8 on Unix, UTF-16 on Windows)
  - We expect usage to be low. If necessary, users can define different p/invokes and call the desired one conditionally (for example, using the OperatingSystem APIs)
- Defaults:
  - The source generator requires specifying marshalling information for string/char.
    - Requires the intention to be made clear and removes hidden assumptions, but can make declarations more verbose
  - The source generator does not check / reconcile higher level settings like DefaultCharSetAttribute.
- ExactSpelling: uses CharSet to probe for entry point on Windows, doesn’t mean anything on Unix
  - The source generator could require exact spelling for entry point names
    - Would be in the spirit of avoiding propagating some of the Windows-centric aspects of DllImport
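As a rough illustration of the Auto workaround mentioned above, the sketch below declares two p/invokes for the same entry point and picks one at run time with OperatingSystem.IsWindows(). It uses today's DllImport and MarshalAs rather than the proposed generator syntax; the library name "lib" and entry point "Method" are placeholders carried over from the examples, and the NativeMethods class is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper; "lib" and "Method" are placeholder names.
static class NativeMethods
{
    // UTF-16 variant, intended for Windows.
    // ExactSpelling = true disables entry point name probing.
    [DllImport("lib", EntryPoint = "Method", ExactSpelling = true)]
    private static extern int MethodUtf16([MarshalAs(UnmanagedType.LPWStr)] string s);

    // UTF-8 variant, intended for Unix.
    [DllImport("lib", EntryPoint = "Method", ExactSpelling = true)]
    private static extern int MethodUtf8([MarshalAs(UnmanagedType.LPUTF8Str)] string s);

    // Select the encoding per platform, mirroring what Auto would have done.
    public static int Method(string s) =>
        OperatingSystem.IsWindows() ? MethodUtf16(s) : MethodUtf8(s);
}
```

The declarations compile on their own, but calling NativeMethods.Method requires a real native library exporting the entry point. The cost of this approach is one extra declaration per method, which seems acceptable if Auto usage is as low as expected.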
@AaronRobinsonMSFT @jkoritzinsky @jkotas @stephentoub