LanguageTags is a C# .NET library for handling ISO 639-2, ISO 639-3, and RFC 5646 / BCP 47 language tags. The project serves two primary purposes:
- Data Publishing: Provides ISO 639-2, ISO 639-3, and RFC 5646 language tag records in JSON and C# formats
- Tag Processing: Implements IETF BCP 47 language tag construction and parsing per RFC 5646 semantic rules
Current Version: 1.2 (supports .NET 10.0, AOT compatible)
Important Note: The implemented language tag parsing and normalization logic may be incomplete or inaccurate per RFC 5646. Always verify results for your specific use case.
- LanguageTags (LanguageTags/LanguageTags.csproj): Core library project
  - NuGet package: ptr727.LanguageTags
  - Contains language tag data models, parser, builder, and lookup functionality
  - Target framework: .NET 10.0
  - C# language version: 14.0
- LanguageTagsCreate (LanguageTagsCreate/LanguageTagsCreate.csproj): CLI utility for downloading and generating language data
  - Downloads data from official sources (Library of Congress, SIL, IANA)
  - Converts the data to JSON and generates C# code files
  - Target framework: .NET 10.0
- LanguageTagsTests (LanguageTagsTests/LanguageTagsTests.csproj): xUnit test suite with comprehensive coverage
  - Uses AwesomeAssertions for test assertions
  - Target framework: .NET 10.0
- LanguageData/
  - Contains downloaded language data files
  - Contains JSON-converted data files
  - Updated weekly via GitHub Actions
- .github/workflows/
  - run-periodic-codegen-pull-request.yml: Weekly scheduled job to update language data
  - publish-release.yml: Release and NuGet publishing workflow
  - merge-bot-pull-request.yml: Automated PR merge workflow
  - build-release-task.yml, build-library-task.yml: Build tasks
  - get-version-task.yml, build-datebadge-task.yml: Version and badge generation
LanguageTag is the main public API for working with language tags.
Static Factory Methods:
- Parse(string tag): Parse a language tag string; returns null on failure
- TryParse(string tag, out LanguageTag? result): Safe parsing with an out parameter
- ParseOrDefault(string tag, LanguageTag? defaultTag = null): Parse with fallback to "und"
- ParseAndNormalize(string tag): Parse and normalize in one step
- CreateBuilder(): Create a fluent builder instance
- FromLanguage(string language): Factory for simple language tags
- FromLanguageRegion(string language, string region): Factory for language + region tags
- FromLanguageScriptRegion(string language, string script, string region): Factory for language + script + region tags
Properties:
- Language: Primary language subtag (internal set)
- ExtendedLanguage: Extended language subtag (internal set)
- Script: Script subtag (internal set)
- Region: Region subtag (internal set)
- Variants: ImmutableArray of variant subtags
- Extensions: ImmutableArray of ExtensionTag objects
- PrivateUse: PrivateUseTag object
- IsValid: Whether the tag is valid
Instance Methods:
- Validate(): Verify structural correctness
- Normalize(): Return a normalized copy of the tag (does not validate)
- ToString(): String representation
- Equals(): Equality comparison (case-insensitive)
- GetHashCode(): Hash code for collections
- Operators: ==, !=
Design Characteristics:
- Implements IEquatable<LanguageTag>
- Constructors are internal; use factory methods or the builder
- Properties use internal setters to keep the public API immutable
- Collections are exposed as ImmutableArray for thread safety
Fluent builder (returned by LanguageTag.CreateBuilder()) for constructing language tags:
Methods:
- Language(string value): Set the primary language
- ExtendedLanguage(string value): Set the extended language
- Script(string value): Set the script
- Region(string value): Set the region
- VariantAdd(string value): Add a variant
- VariantAddRange(IEnumerable<string> values): Add multiple variants
- ExtensionAdd(char prefix, IEnumerable<string> values): Add an extension with a prefix and values
- PrivateUseAdd(string value): Add a private use tag
- PrivateUseAddRange(IEnumerable<string> values): Add multiple private use tags
- Build(): Return the constructed tag (no validation)
- Normalize(): Return the normalized tag (no validation)
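The fluent pattern above can be illustrated with a small, self-contained sketch. TagBuilderSketch is a hypothetical class, not the library's builder, and it builds a plain string rather than a LanguageTag. Each setter returns this so that calls chain, and Build() joins the parts in RFC 5646 order without validating them, mirroring the "no validation" note above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical builder for illustration only; not the LanguageTags builder type
public sealed class TagBuilderSketch
{
    private string? _language;
    private string? _extendedLanguage;
    private string? _script;
    private string? _region;
    private readonly List<string> _variants = new();
    private readonly List<string> _extensions = new();
    private readonly List<string> _privateUse = new();

    // Each method records a value and returns this, so calls can be chained
    public TagBuilderSketch Language(string value) { _language = value; return this; }
    public TagBuilderSketch ExtendedLanguage(string value) { _extendedLanguage = value; return this; }
    public TagBuilderSketch Script(string value) { _script = value; return this; }
    public TagBuilderSketch Region(string value) { _region = value; return this; }
    public TagBuilderSketch VariantAdd(string value) { _variants.Add(value); return this; }

    public TagBuilderSketch ExtensionAdd(char prefix, IEnumerable<string> values)
    {
        // Store the extension as "prefix-value1-value2"
        _extensions.Add(string.Join("-", values.Prepend(prefix.ToString())));
        return this;
    }

    public TagBuilderSketch PrivateUseAdd(string value) { _privateUse.Add(value); return this; }

    // Joins the non-empty parts in RFC 5646 order; no validation is performed
    public string Build()
    {
        var parts = new List<string>();
        foreach (string? part in new[] { _language, _extendedLanguage, _script, _region })
        {
            if (!string.IsNullOrEmpty(part))
            {
                parts.Add(part);
            }
        }
        parts.AddRange(_variants);
        parts.AddRange(_extensions);
        if (_privateUse.Count > 0)
        {
            parts.Add("x-" + string.Join("-", _privateUse));
        }
        return string.Join("-", parts);
    }
}
```

For example, new TagBuilderSketch().Language("en").Region("US").Build() yields "en-US". Returning this from every setter is what makes the chained style possible; validation is deliberately left to a separate step.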
LanguageTagParser is an internal implementation detail and is not exposed in the public API; use LanguageTag.Parse() instead.
- Parses language tags according to RFC 5646 Section 2.1
- Handles grandfathered tags and converts them to their current forms
- Normalizes tag casing according to RFC conventions:
  - Language: lowercase
  - Extended language: lowercase
  - Script: Title case
  - Region: UPPERCASE
  - Variants: lowercase
  - Extensions: lowercase
  - Private use: lowercase
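RFC 5646 Section 2.1.1 states the casing convention in terms of subtag length: after the first subtag and before any singleton, two-letter subtags are uppercase, four-letter subtags are title case, and everything else is lowercase. The sketch below is a standalone illustration of that rule, not the library's parser code, and it assumes the input is already well-formed.

```csharp
// Standalone sketch of the RFC 5646 casing convention; not the library's implementation
public static class CaseSketch
{
    public static string NormalizeCase(string tag)
    {
        string[] subtags = tag.Split('-');
        bool afterSingleton = false;
        for (int i = 0; i < subtags.Length; i++)
        {
            string s = subtags[i].ToLowerInvariant();

            // The first subtag and everything after a singleton stay lowercase
            if (i > 0 && !afterSingleton)
            {
                if (s.Length == 2)
                {
                    // Two-letter subtags (regions) are uppercase
                    s = s.ToUpperInvariant();
                }
                else if (s.Length == 4)
                {
                    // Four-letter subtags (scripts) are title case; digit-led variants are unaffected
                    s = char.ToUpperInvariant(s[0]) + s[1..];
                }
            }

            // A singleton ('x' or an extension prefix) starts an all-lowercase section
            if (s.Length == 1)
            {
                afterSingleton = true;
            }
            subtags[i] = s;
        }
        return string.Join("-", subtags);
    }
}
```

For example, CaseSketch.NormalizeCase("en-ca-x-ca") returns "en-CA-x-ca": the second "ca" follows the x singleton, so it stays lowercase.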
Provides language code conversion and matching:
Properties:
- Undetermined: Constant for "und" (undetermined language)
- Overrides: User-defined (IETF, ISO) mapping pairs
Methods:
- GetIetfFromIso(string languageTag): Convert ISO to IETF format
- GetIsoFromIetf(string languageTag): Convert IETF to ISO format
- IsMatch(string prefix, string languageTag): Prefix matching for content selection
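The exact rule IsMatch applies is not described here. A common approach to prefix matching, and the one assumed in this hypothetical sketch, is RFC 4647 basic filtering: a tag matches when it equals the prefix or begins with the prefix followed by a hyphen, compared case-insensitively. The hyphen check is what keeps "en" from matching a string such as "eng".

```csharp
using System;

// Hypothetical helper modelled on RFC 4647 basic filtering; the library's IsMatch may differ
public static class PrefixMatchSketch
{
    public static bool IsPrefixMatch(string prefix, string languageTag)
    {
        if (languageTag.Equals(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }

        // Require a subtag boundary after the prefix so "en" does not match "eng"
        return languageTag.Length > prefix.Length
            && languageTag.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            && languageTag[prefix.Length] == '-';
    }
}
```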
LogOptions is a static class for configuring global logging for the entire library:
Properties:
- LoggerFactory: Gets or sets the global logger factory used to create category loggers
Methods:
- SetFactory(ILoggerFactory loggerFactory): Configure the library to use a logger factory
- TrySetFactory(ILoggerFactory loggerFactory): Set the factory only if none is configured
Logger Resolution Priority:
- LoggerFactory property (when not NullLoggerFactory)
- NullLogger.Instance (default fallback)
Important Notes:
- Loggers are created and cached at time of use by each class instance
- Changes to LoggerFactory after a logger is created do not affect existing cached loggers
- Only new logger requests use the updated configuration
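The caching note has a practical consequence: configure the factory before any library class logs. The standalone sketch below models that behavior with hypothetical stand-in types. SketchLogOptions, ISketchLogger, and the other types are not Microsoft.Extensions.Logging or LanguageTags types. In the sketch, a logger is resolved on first use, cached per instance, and falls back to a null logger when no factory is set.

```csharp
// Hypothetical stand-ins for ILogger / ILoggerFactory, used only to model the described behavior
public interface ISketchLogger { string Name { get; } }

public interface ISketchLoggerFactory { ISketchLogger CreateLogger(string category); }

public sealed record NamedLogger(string Name) : ISketchLogger;

public sealed class NamedFactory(string name) : ISketchLoggerFactory
{
    public ISketchLogger CreateLogger(string category) => new NamedLogger($"{name}:{category}");
}

public static class SketchLogOptions
{
    // Stand-in for NullLogger.Instance, the default fallback
    public static readonly ISketchLogger NullLogger = new NamedLogger("null");

    public static ISketchLoggerFactory? LoggerFactory { get; set; }

    public static void SetFactory(ISketchLoggerFactory factory) => LoggerFactory = factory;

    // Sets the factory only when none is configured yet
    public static bool TrySetFactory(ISketchLoggerFactory factory)
    {
        if (LoggerFactory is not null)
        {
            return false;
        }
        LoggerFactory = factory;
        return true;
    }

    public static ISketchLogger CreateLogger(string category) =>
        LoggerFactory?.CreateLogger(category) ?? NullLogger;
}

public sealed class Worker
{
    private ISketchLogger? _logger;

    // Resolved on first access, then cached for the lifetime of this instance
    public ISketchLogger Logger => _logger ??= SketchLogOptions.CreateLogger(nameof(Worker));
}
```

In this model, a Worker that logged before SetFactory keeps its null logger. Only instances that resolve their logger afterwards pick up the new factory.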
- ISO 639-2 language codes (3-letter bibliographic/terminologic codes)
- Public Methods:
  - Create(): Load embedded data
  - FromDataAsync(string fileName): Load from a data file
  - FromJsonAsync(string fileName): Load from JSON
  - Find(string? languageTag, bool includeDescription): Find a record by tag
- Internal Methods: SaveJsonAsync(string fileName), SaveCodeAsync(string fileName)
- Record Properties: Part2B, Part2T, Part1, RefName
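The Find(languageTag, includeDescription) signature suggests a lookup that matches any of the code fields and, optionally, the reference name. That reading is an assumption. The sketch uses a hypothetical Iso6392RecordSketch record with the documented field names and a two-entry in-memory table. German and French are used because their bibliographic, terminologic, and ISO 639-1 codes all differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record mirroring the documented ISO 639-2 fields
public sealed record Iso6392RecordSketch(string? Part2B, string? Part2T, string? Part1, string? RefName);

public static class Iso6392LookupSketch
{
    private static readonly List<Iso6392RecordSketch> Records = new()
    {
        new Iso6392RecordSketch("ger", "deu", "de", "German"),
        new Iso6392RecordSketch("fre", "fra", "fr", "French"),
    };

    // Matches any code field case-insensitively; optionally also the reference name
    public static Iso6392RecordSketch? Find(string? languageTag, bool includeDescription)
    {
        if (string.IsNullOrEmpty(languageTag))
        {
            return null;
        }

        bool Same(string? value) => string.Equals(value, languageTag, StringComparison.OrdinalIgnoreCase);

        return Records.FirstOrDefault(r =>
            Same(r.Part2B) || Same(r.Part2T) || Same(r.Part1) || (includeDescription && Same(r.RefName)));
    }
}
```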
- ISO 639-3 language codes (comprehensive language codes)
- Public Methods:
  - Create(): Load embedded data
  - FromDataAsync(string fileName): Load from a data file
  - FromJsonAsync(string fileName): Load from JSON
  - Find(string? languageTag, bool includeDescription): Find a record by tag
- Internal Methods: SaveJsonAsync(string fileName), SaveCodeAsync(string fileName)
- Record Properties: Id, Part2B, Part2T, Part1, Scope, LanguageType, RefName, Comment
- RFC 5646 / BCP 47 language subtag registry
- Public Methods:
  - Create(): Load embedded data
  - FromDataAsync(string fileName): Load from a data file
  - FromJsonAsync(string fileName): Load from JSON
  - Find(string? languageTag, bool includeDescription): Find a record by tag
- Properties: FileDate, RecordList
- Internal Methods: SaveJsonAsync(string fileName), SaveCodeAsync(string fileName)
- Record Properties: Type, Tag, SubTag, Description (ImmutableArray), Added, SuppressScript, Scope, MacroLanguage, Deprecated, Comments (ImmutableArray), Prefix (ImmutableArray), PreferredValue, TagValue
- Enums:
  - RecordType: None, Language, ExtLanguage, Script, Variant, Grandfathered, Region, Redundant
  - RecordScope: None, MacroLanguage, Collection, Special, PrivateUse
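The IANA Language Subtag Registry uses the "record-jar" format described in RFC 5646 Section 3.1.1. Records are separated by lines containing %%, and each field is written as "Field-Name: body". A field may repeat (Description, for example), and a line that begins with whitespace continues the previous field's body. How FromDataAsync reads the file is not described here, so the following is an independent sketch of the format, not the library's parser. The test input is synthetic and abbreviated, not a verbatim registry excerpt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Independent record-jar reader sketch; each record maps a field name to its values in order
public static class RecordJarSketch
{
    public static List<Dictionary<string, List<string>>> Parse(string text)
    {
        var records = new List<Dictionary<string, List<string>>>();
        var current = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        string? lastField = null;
        using var reader = new StringReader(text);
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim() == "%%")
            {
                // Record separator: close the current record and start a new one
                records.Add(current);
                current = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
                lastField = null;
                continue;
            }
            if (line.Length > 0 && char.IsWhiteSpace(line[0]) && lastField != null)
            {
                // Continuation line: unfold into the last value of the previous field
                List<string> values = current[lastField];
                values[^1] = values[^1] + " " + line.Trim();
                continue;
            }
            int colon = line.IndexOf(':');
            if (colon <= 0)
            {
                continue; // Skip blank or malformed lines
            }
            string name = line[..colon].Trim();
            string body = line[(colon + 1)..].Trim();
            if (!current.TryGetValue(name, out List<string>? list))
            {
                list = new List<string>();
                current[name] = list;
            }
            list.Add(body); // Repeated fields such as Description accumulate
            lastField = name;
        }
        if (current.Count > 0)
        {
            records.Add(current);
        }
        return records;
    }
}
```

Per RFC 5646, the first record holds only the File-Date field, which would correspond to the FileDate property listed above.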
ExtensionTag (sealed record):
- Prefix: Single-character extension prefix (char)
- Tags: ImmutableArray of extension values
- ToString(): Format as "prefix-tag1-tag2"
- Normalize(): Returns a normalized copy with sorted, lowercase tags
- Equals(): Case-insensitive equality comparison

PrivateUseTag (sealed record):
- Prefix: Constant 'x'
- Tags: ImmutableArray of private use values
- ToString(): Format as "x-tag1-tag2"
- Normalize(): Returns a normalized copy with sorted, lowercase tags
- Equals(): Case-insensitive equality comparison
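A C# detail explains why these records define their own Equals. Synthesized record equality compares members with EqualityComparer<T>.Default, and for ImmutableArray<T> that checks whether the underlying array is the same instance, not whether the elements match. A record holding an ImmutableArray therefore needs a custom Equals to compare contents. The hypothetical ExtensionTagSketch below shows one way to do this with the case-insensitive semantics described above. It also includes a Normalize that lowercases and sorts the values. This is not the library's implementation.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical record for illustration; not the library's ExtensionTag
public sealed record ExtensionTagSketch(char Prefix, ImmutableArray<string> Tags)
{
    // Lowercases the prefix and values, then sorts the values ordinally
    public ExtensionTagSketch Normalize() =>
        new(char.ToLowerInvariant(Prefix),
            Tags.Select(t => t.ToLowerInvariant()).OrderBy(t => t, StringComparer.Ordinal).ToImmutableArray());

    // Formats as "prefix-tag1-tag2"
    public override string ToString() => string.Join("-", Tags.Prepend(Prefix.ToString()));

    // Replaces the synthesized equality, which would compare ImmutableArray instances
    public bool Equals(ExtensionTagSketch? other)
    {
        if (other is null) return false;
        if (char.ToLowerInvariant(Prefix) != char.ToLowerInvariant(other.Prefix)) return false;
        if (Tags.Length != other.Tags.Length) return false;
        for (int i = 0; i < Tags.Length; i++)
        {
            if (!string.Equals(Tags[i], other.Tags[i], StringComparison.OrdinalIgnoreCase)) return false;
        }
        return true;
    }

    // Must agree with Equals, so hash case-insensitively as well
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(char.ToLowerInvariant(Prefix));
        foreach (string tag in Tags)
        {
            hash.Add(tag, StringComparer.OrdinalIgnoreCase);
        }
        return hash.ToHashCode();
    }
}
```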
Per RFC 5646, language tags follow this format:
[Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]
Examples:
- zh: Simple language tag
- zh-yue-hk: Language with extended language and region
- en-latn-gb-boont-r-extended-sequence-x-private: Full tag with all components
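The component order above can be recognized from subtag shape alone, because the RFC 5646 ABNF (Section 2.1) gives each position a distinct length and character pattern. The simplified classifier below is a sketch under that assumption, not the library's parser. It expects a tag that starts with a language subtag, skips grandfathered tags, and does not check subtags against the registry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified shape-based classifier; not the library's parser
public static class SubtagClassifierSketch
{
    public static List<(string Component, string Subtag)> Classify(string tag)
    {
        var result = new List<(string Component, string Subtag)>();
        string[] parts = tag.Split('-');
        int stage = 0; // 0 = language, 1 = extlang, 2 = script, 3 = region, 4 = variant
        int extlangCount = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            string p = parts[i];
            bool alpha = p.All(char.IsAsciiLetter);
            if (i == 0) { result.Add(("Language", p)); continue; }
            if (p.Length == 1)
            {
                if (p.Equals("x", StringComparison.OrdinalIgnoreCase))
                {
                    // Private use consumes the rest of the tag
                    result.Add(("PrivateUse", string.Join("-", parts[i..])));
                    break;
                }
                // An extension runs until the next singleton
                int end = i + 1;
                while (end < parts.Length && parts[end].Length > 1) end++;
                result.Add(("Extension", string.Join("-", parts[i..end])));
                i = end - 1;
                continue;
            }
            if (stage <= 1 && extlangCount < 3 && parts[0].Length <= 3 && p.Length == 3 && alpha)
            {
                result.Add(("ExtendedLanguage", p)); extlangCount++; stage = 1; continue;
            }
            if (stage <= 1 && p.Length == 4 && alpha) { result.Add(("Script", p)); stage = 2; continue; }
            if (stage <= 2 && ((p.Length == 2 && alpha) || (p.Length == 3 && p.All(char.IsAsciiDigit))))
            {
                result.Add(("Region", p)); stage = 3; continue;
            }
            if ((p.Length >= 5 && p.Length <= 8) || (p.Length == 4 && char.IsAsciiDigit(p[0])))
            {
                result.Add(("Variant", p)); stage = 4; continue;
            }
            result.Add(("Unknown", p));
        }
        return result;
    }
}
```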
- Language data is updated weekly via a GitHub Actions workflow
- The LanguageTagsCreate tool downloads data from:
  - ISO 639-2: Library of Congress
  - ISO 639-3: SIL International
  - RFC 5646: IANA Language Subtag Registry
- Generated C# files (*DataGen.cs) are committed to the repository
- Data files are in the LanguageData/ directory
Use static factory methods instead of public constructors:
```csharp
// Good
LanguageTag? parsed = LanguageTag.Parse("en-US"); // null if the string cannot be parsed
LanguageTag fromLanguage = LanguageTag.FromLanguage("en");

// Avoid: constructors are internal
// var tag = new LanguageTag(); // Not accessible
```
Use the fluent builder for complex tag construction:
```csharp
LanguageTag tag = LanguageTag.CreateBuilder()
    .Language("en")
    .Region("US")
    .Build();
```
- All properties are immutable after construction
- Use Normalize() to get modified copies
- Collections are exposed as ImmutableArray<T>
Always use safe parsing patterns:
```csharp
// TryParse pattern
if (LanguageTag.TryParse(input, out LanguageTag? tag))
{
    // Use tag
}

// ParseOrDefault pattern
LanguageTag fallback = LanguageTag.ParseOrDefault(input); // Falls back to "und"
```
- RFC 5646: Tags for Identifying Languages
- BCP 47: Best Current Practice for Language Tags
- ISO 639-2: 3-letter language codes
- ISO 639-3: Comprehensive language codes
- ISO 15924: Script codes
- ISO 3166-1: Country codes
- UN M.49: Geographic region codes
- IANA Language Subtag Registry: Authoritative registry of subtags
- The implemented language tag parsing and normalization logic may be incomplete or inaccurate
- Grandfathered tags are automatically converted to their preferred values during parsing
- All tag comparisons are case-insensitive per RFC 5646
- Private use tags start with 'x-' prefix
- Extensions use single-character prefixes (except 'x' which is reserved for private use)
- LanguageTagParser is internal; all parsing is done through LanguageTag static methods
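The registry lists each grandfathered tag as a whole-tag record, and some of these records carry a Preferred-Value. That makes a case-insensitive, whole-tag lookup table the natural shape for the conversion. The sketch below hard-codes a handful of well-known entries for illustration only. The library presumably draws on the full registry data, and not every grandfathered tag has a preferred value.

```csharp
using System;
using System.Collections.Generic;

// Illustrative excerpt only; a real implementation would load all grandfathered records
public static class GrandfatheredSketch
{
    private static readonly Dictionary<string, string> PreferredValues = new(StringComparer.OrdinalIgnoreCase)
    {
        ["art-lojban"] = "jbo",
        ["i-klingon"] = "tlh",
        ["i-navajo"] = "nv",
        ["no-bok"] = "nb",
        ["zh-hakka"] = "hak",
    };

    // Matches the whole tag; returns the preferred value if known, otherwise the input unchanged
    public static string Replace(string tag) =>
        PreferredValues.TryGetValue(tag, out string? preferred) ? preferred : tag;
}
```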
- LanguageTagParser is now internal (use LanguageTag.Parse() instead)
- Properties changed from IList<string> to ImmutableArray<string>: VariantList → Variants, ExtensionList → Extensions, TagList → Tags
- Data file APIs are async-only and use static creators: FromDataAsync() / FromJsonAsync()
- Logging configuration now uses ILoggerFactory only; ILogger support was removed from LogOptions
- Tag construction requires factory methods or the builder (constructors are internal)
- LanguageTag.ParseOrDefault(): Safe parsing with fallback
- LanguageTag.ParseAndNormalize(): Combined parse and normalize
- LanguageTag.IsValid: Property for validation
- LanguageTag.FromLanguage(), FromLanguageRegion(), FromLanguageScriptRegion(): Factory methods
- IEquatable<LanguageTag> implementation with operators
- LogOptions static class for global logging configuration with ILoggerFactory
- ExtensionTag and PrivateUseTag are now sealed records with Normalize() and case-insensitive Equals() methods
- Comprehensive XML documentation for all public APIs
Consider these areas for enhancement:
- Use a BNF parser or parser generator (ANTLR4, Eto.Parse, etc.) instead of hand-parsing
- Implement comprehensive subtag content validation against registry data
- Add more language lookup and validation features
- Improve error messages and diagnostics
- Follow the authoritative coding standards and tooling in CODESTYLE.md and .editorconfig
- Add tests for new public behavior and keep API documentation complete
- Use factory methods or builders for tag creation; avoid public constructors
```csharp
// Simple parsing (returns null on failure)
LanguageTag? parsed = LanguageTag.Parse("en-US");

// Safe parsing
if (LanguageTag.TryParse("en-US", out LanguageTag? tried))
{
    Console.WriteLine(tried.ToString());
}

// Parse with default; input is any caller-supplied tag string
LanguageTag withDefault = LanguageTag.ParseOrDefault(input); // "und" if invalid

// Factory methods
LanguageTag fromLanguage = LanguageTag.FromLanguage("en");
LanguageTag fromLanguageRegion = LanguageTag.FromLanguageRegion("en", "US");

// Builder
LanguageTag built = LanguageTag.CreateBuilder()
    .Language("en")
    .Region("US")
    .Build();
```
```csharp
// Parse and normalize separately
LanguageTag? tag = LanguageTag.Parse("en-latn-us");
LanguageTag? normalized = tag?.Normalize(); // "en-US"

// Parse and normalize in one step
LanguageTag? normalizedInOneStep = LanguageTag.ParseAndNormalize("en-latn-us"); // "en-US"
```
```csharp
using System.Collections.Immutable;

LanguageTag tag = LanguageTag.Parse("en-latn-gb-boont-r-extended-x-private")!;
string language = tag.Language; // "en"
string script = tag.Script; // "latn"
string region = tag.Region; // "gb"
ImmutableArray<string> variants = tag.Variants; // ["boont"]
ImmutableArray<ExtensionTag> extensions = tag.Extensions; // [{ Prefix='r', Tags=["extended"] }]
PrivateUseTag privateUse = tag.PrivateUse; // { Tags=["private"] }
```
```csharp
LanguageTag? tag1 = LanguageTag.Parse("en-US");
LanguageTag? tag2 = LanguageTag.Parse("en-us");
bool equalByOperator = tag1 == tag2; // true (case-insensitive)
bool equalByMethod = tag1!.Equals(tag2); // true
int hash = tag1.GetHashCode(); // Same as tag2.GetHashCode()
```