High-Performance SIMD Parsing | Zero Allocations | AOT/Trimming Ready | Fixed-Width Support | Fluent APIs
- RFC 4180 Quote Handling: Supports quoted fields with escaped quotes (""), commas in quotes, per spec
- Quote-Aware SIMD: Maintains SIMD performance even with quoted fields
- Automatic Delimiter Detection: Detect delimiter from CSV data (comma, semicolon, pipe, tab)
- CSV Validation: Pre-flight validation with detailed error reporting
- Zero Allocations: Stack-only parsing with ArrayPool for column metadata
- Lazy Evaluation: Columns parsed only when accessed
- Configurable RFC vs Speed: Toggle quote parsing and opt-in newlines-in-quotes; defaults favor speed
- Fluent Builder API: Configure readers with chainable methods (Csv.Read<T>())
- LINQ-Style Extensions: Where(), Select(), First(), ToList(), GroupBy(), and more
- High-Performance CSV Writer: 2-5x faster than Sep with 35-85% less memory allocation
- SIMD-Accelerated: Uses AVX2/SSE2 for quote detection and field analysis
- RFC 4180 Compliant: Proper quote escaping and field quoting
- Fluent Builder API: Configure writers with chainable methods (Csv.Write<T>())
- Multiple Output Targets: Write to strings, streams, or files
- Async Streaming: True async I/O with IAsyncEnumerable<T> support for reading and writing
- AOT/Trimming Support: Source generators for reflection-free binding ([CsvGenerateBinder])
- Line Number Tracking: Both logical row numbers and physical source line numbers for error reporting
- Progress Reporting: Track parsing progress for large files with callbacks
- Custom Type Converters: Register converters for domain-specific types
- Multi-Framework: .NET 8, 9, and 10 support
- Zero Dependencies: No external packages for core library
- Target Frameworks: .NET 8, 9, 10 (modern JIT optimizations)
- Memory Safety: No unsafe keyword - uses the safe Unsafe class and MemoryMarshal APIs for performance
- Minimal API: Simple, focused API surface
- Zero Dependencies: No external packages for core library
- RFC 4180: Quote handling, escaped quotes, delimiters in quotes; optional newlines-in-quotes (default off), no header detection
- SIMD First: Quote-aware SIMD for AVX-512, AVX2, NEON
- Allocation Notes: Char-span parsing remains allocation-free; UTF-8 parsing stays zero-allocation for invariant primitives. Culture/format-based parsing on UTF-8 columns decodes to UTF-16 and allocates by design.
// Primary API - parse from string with options
var reader = Csv.ReadFromText(csvData);
// Custom options (delimiter, quote character, max columns)
var options = new CsvReadOptions
{
Delimiter = ',', // Default
Quote = '"', // Default - RFC 4180 compliant
MaxColumnCount = 100, // Default
AllowNewlinesInsideQuotes = false, // Enable for full RFC newlines-in-quotes support (slower)
EnableQuotedFields = true // Disable for maximum speed when your data has no quotes
};
var readerWithOptions = Csv.ReadFromText(csvData, options);

foreach (var row in Csv.ReadFromText(csv))
{
// Access columns by index - no allocations
var id = row[0].Parse<int>();
var name = row[1].CharSpan; // ReadOnlySpan<char>
var price = row[2].Parse<decimal>();
}

using var fileReader = Csv.ReadFromFile("data.csv"); // streams file without loading it fully
using var stream = File.OpenRead("data.csv");
using var streamReader = Csv.ReadFromStream(stream); // leaveOpen defaults to true

Both overloads stream with pooled buffers and do not load the entire file/stream; dispose the reader (and the stream if you own it) to release resources.
var source = await Csv.ReadFromFileAsync("data.csv");
using var reader = source.CreateReader();

Async overloads buffer the full payload (required because readers are ref structs); use them when you need non-blocking file/stream reads.
using var reader = Csv.ReadFromStream(File.OpenRead("data.csv"));
while (reader.MoveNext())
{
var row = reader.Current;
var id = row[0].Parse<int>();
}

Streaming keeps a pooled buffer and does not load the entire file into memory; rows remain valid until the next MoveNext call.
await using var reader = Csv.CreateAsyncStreamReader(File.OpenRead("data.csv"));
while (await reader.MoveNextAsync())
{
var row = reader.Current;
var id = row[0].Parse<int>();
}

Async streaming uses pooled buffers and async I/O; each row stays valid until the next MoveNextAsync invocation.
Use the fluent builder API for a clean, chainable configuration:
// Read CSV records with fluent configuration
var records = Csv.Read<Person>()
.WithDelimiter(';')
.TrimFields()
.AllowMissingColumns()
.SkipRows(2) // Skip metadata rows
.FromText(csvData)
.ToList();
// Read from file with async streaming
await foreach (var person in Csv.Read<Person>()
.WithDelimiter(',')
.FromFileAsync("data.csv"))
{
Console.WriteLine($"{person.Name}: {person.Age}");
}

The builder provides a symmetric API to CsvWriterBuilder<T> for reading records.
Use the non-generic builder for low-level row-by-row parsing:
// Manual row-by-row reading with fluent configuration
using var reader = Csv.Read()
.WithDelimiter(';')
.TrimFields()
.WithCommentCharacter('#')
.FromText(csvData);
foreach (var row in reader)
{
var id = row[0].Parse<int>();
var name = row[1].ToString();
}
// Stream from file with custom options
using var fileReader = Csv.Read()
.WithMaxFieldSize(10_000)
.AllowNewlinesInQuotes()
.FromFile("data.csv");CSV record readers provide familiar LINQ-style operations for working with records:
// Materialize all records
var allPeople = Csv.Read<Person>().FromText(csv).ToList();
var peopleArray = Csv.Read<Person>().FromText(csv).ToArray();
// Query operations
var adults = Csv.Read<Person>()
.FromText(csv)
.Where(p => p.Age >= 18);
var names = Csv.Read<Person>()
.FromText(csv)
.Select(p => p.Name);
// First/Single operations
var first = Csv.Read<Person>().FromText(csv).First();
var firstAdult = Csv.Read<Person>().FromText(csv).First(p => p.Age >= 18);
var single = Csv.Read<Person>().FromText(csv).SingleOrDefault();
// Aggregation
var count = Csv.Read<Person>().FromText(csv).Count();
var adultCount = Csv.Read<Person>().FromText(csv).Count(p => p.Age >= 18);
var hasRecords = Csv.Read<Person>().FromText(csv).Any();
var allAdults = Csv.Read<Person>().FromText(csv).All(p => p.Age >= 18);
// Pagination
var page = Csv.Read<Person>().FromText(csv).Skip(10).Take(5);
// Grouping and indexing
var byCity = Csv.Read<Person>()
.FromText(csv)
.GroupBy(p => p.City);
var byId = Csv.Read<Person>()
.FromText(csv)
.ToDictionary(p => p.Id);
// Iteration
Csv.Read<Person>()
.FromText(csv)
.ForEach(p => Console.WriteLine(p.Name));

Note: Since CSV readers are ref structs, they cannot implement IEnumerable<T>. These extension methods consume the reader and return materialized results.
Parse CSV files where different rows map to different record types based on a discriminator column. This is common in banking/financial file formats (NACHA, BAI, EDI) with header/detail/trailer patterns:
// Define record types
[CsvGenerateBinder]
public class HeaderRecord
{
[CsvColumn(Name = "Type")]
public string Type { get; set; } = "";
[CsvColumn(Name = "Date")]
public DateTime Date { get; set; }
}
[CsvGenerateBinder]
public class DetailRecord
{
[CsvColumn(Name = "Type")]
public string Type { get; set; } = "";
[CsvColumn(Name = "Id")]
public int Id { get; set; }
[CsvColumn(Name = "Amount")]
public decimal Amount { get; set; }
}
[CsvGenerateBinder]
public class TrailerRecord
{
[CsvColumn(Name = "Type")]
public string Type { get; set; } = "";
[CsvColumn(Name = "Count")]
public int Count { get; set; }
}
// Parse with discriminator-based type routing
var csv = """
Type,Id,Amount,Date,Count
H,0,0.00,2024-01-15,0
D,1,100.50,,0
D,2,200.75,,0
T,0,301.25,,2
""";
foreach (var record in Csv.Read()
.WithMultiSchema()
.WithDiscriminator("Type") // By column name
.MapRecord<HeaderRecord>("H")
.MapRecord<DetailRecord>("D")
.MapRecord<TrailerRecord>("T")
.AllowMissingColumns()
.FromText(csv))
{
switch (record)
{
case HeaderRecord h:
Console.WriteLine($"Header: {h.Date}");
break;
case DetailRecord d:
Console.WriteLine($"Detail: {d.Id} = {d.Amount:C}");
break;
case TrailerRecord t:
Console.WriteLine($"Trailer: {t.Count} records");
break;
}
}

// By column index (0-based)
.WithDiscriminator(columnIndex: 0)
// By column name (resolved from header)
.WithDiscriminator("RecordType")
// Case-insensitive discriminator matching (default)
.CaseSensitiveDiscriminator(false)

// Skip rows that don't match any registered type
.OnUnmatchedRow(UnmatchedRowBehavior.Skip)
// Throw exception for unmatched rows (default)
.OnUnmatchedRow(UnmatchedRowBehavior.Throw)
// Use custom factory for unmatched rows
.MapRecord((discriminator, columns, rowNum) => new UnknownRecord
{
Type = discriminator,
RawData = string.Join(",", columns)
})

// From file
foreach (var record in Csv.Read()
.WithMultiSchema()
.WithDiscriminator("Type")
.MapRecord<HeaderRecord>("H")
.MapRecord<DetailRecord>("D")
.FromFile("transactions.csv"))
{
// Process records
}
// Async streaming
await foreach (var record in Csv.Read()
.WithMultiSchema()
.WithDiscriminator("Type")
.MapRecord<HeaderRecord>("H")
.MapRecord<DetailRecord>("D")
.FromFileAsync("transactions.csv"))
{
// Process records asynchronously
}

For maximum performance, use source-generated dispatchers instead of runtime multi-schema. The generator creates optimized switch-based dispatch that compiles to jump tables:
[CsvGenerateDispatcher(DiscriminatorIndex = 0)]
[CsvSchemaMapping("H", typeof(HeaderRecord))]
[CsvSchemaMapping("D", typeof(DetailRecord))]
[CsvSchemaMapping("T", typeof(TrailerRecord))]
public partial class BankingDispatcher { }
// Usage:
var reader = Csv.Read().FromText(csv);
if (reader.MoveNext()) { } // Skip header
int rowNumber = 1;
while (reader.MoveNext())
{
rowNumber++;
var record = BankingDispatcher.Dispatch(reader.Current, rowNumber);
switch (record)
{
case HeaderRecord h: /* ... */ break;
case DetailRecord d: /* ... */ break;
case TrailerRecord t: /* ... */ break;
}
}

Why source-generated is faster:
- Switch expression compiles to jump table (no dictionary lookup)
- Direct binder invocation (no interface dispatch)
- No boxing/unboxing overhead
- ~2.85x faster than runtime multi-schema dispatch
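To illustrate the shape of that dispatch, here is a rough sketch of what the generated routing amounts to (illustrative only - the real code is emitted by the source generator and calls generated binders rather than constructing records inline):

```csharp
// Illustrative sketch only; the generated dispatcher binds via generated binders.
// A span-based switch like this performs constant-string matching directly,
// with no dictionary lookup and no boxing of the discriminator value.
static object DispatchSketch(CsvCharSpanRow row, int rowNumber) =>
    row[0].CharSpan switch // C# 11+ can switch a ReadOnlySpan<char> against string constants
    {
        "H" => new HeaderRecord { Type = "H", Date = row[3].Parse<DateTime>() },
        "D" => new DetailRecord { Type = "D", Id = row[1].Parse<int>(), Amount = row[2].Parse<decimal>() },
        "T" => new TrailerRecord { Type = "T", Count = row[4].Parse<int>() },
        _ => throw new InvalidOperationException($"Unknown record type at row {rowNumber}")
    };
```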
Note: All mapped types must have the [CsvGenerateBinder] attribute for AOT compatibility.
HeroParser can automatically detect the delimiter character used in CSV data:
// Auto-detect delimiter
char delimiter = Csv.DetectDelimiter(csvData);
// Use detected delimiter
var records = Csv.Read<Person>()
.WithDelimiter(delimiter)
.FromText(csvData)
.ToList();

Supported delimiters: comma (,), semicolon (;), pipe (|), tab (\t)
Get confidence scores and candidate delimiter counts:
var result = Csv.DetectDelimiterWithDetails(csvData);
Console.WriteLine($"Detected: '{result.DetectedDelimiter}'");
Console.WriteLine($"Confidence: {result.Confidence}%");
Console.WriteLine($"Average count per row: {result.AverageDelimiterCount}");
if (result.Confidence < 50)
{
Console.WriteLine("Low confidence - manual verification recommended");
foreach (var candidate in result.CandidateCounts)
{
Console.WriteLine($" {candidate.Key}: {candidate.Value} occurrences");
}
}
// Use detected delimiter
var records = Csv.Read<Person>()
.WithDelimiter(result.DetectedDelimiter)
.FromText(csvData)
.ToList();

Detection Algorithm:
- Samples first N rows (default 10, configurable)
- Counts occurrences of candidate delimiters
- Selects delimiter with most consistent count across rows
- Calculates confidence based on consistency (100% = perfect consistency)
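The consistency-based selection described above can be sketched in a few lines (illustrative only, not HeroParser's actual implementation):

```csharp
using System.Linq;

// Sketch: sample the first rows, count each candidate per row, and prefer the
// delimiter whose per-row count is high and consistent.
static char DetectDelimiterSketch(string csv, int sampleRows = 10)
{
    char[] candidates = { ',', ';', '|', '\t' };
    var lines = csv.Split('\n', StringSplitOptions.RemoveEmptyEntries)
                   .Take(sampleRows)
                   .ToArray();

    char best = ',';
    double bestScore = double.MinValue;
    foreach (var candidate in candidates)
    {
        int[] counts = lines.Select(line => line.Count(c => c == candidate)).ToArray();
        if (counts.Length == 0 || counts.All(c => c == 0))
            continue; // Candidate never appears in the sample.

        // Reward frequent delimiters, penalize inconsistent per-row counts.
        double mean = counts.Average();
        double variance = counts.Select(c => (c - mean) * (c - mean)).Average();
        double score = mean - variance;
        if (score > bestScore) { bestScore = score; best = candidate; }
    }
    return best;
}
```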
Use Cases:
- User-uploaded CSV files with unknown format
- Processing CSVs from multiple sources with varying delimiters
- European CSVs (semicolon-delimited)
- Log files (pipe or tab-delimited)
Validate CSV structure and content before processing:
var options = new CsvValidationOptions
{
RequiredHeaders = new[] { "Name", "Email", "Age" },
ExpectedColumnCount = 3,
MaxRows = 10000
};
var result = Csv.Validate(csvData, options);
if (!result.IsValid)
{
Console.WriteLine($"Validation failed with {result.Errors.Count} errors:");
foreach (var error in result.Errors)
{
Console.WriteLine($" Row {error.RowNumber}: {error.Message}");
}
return;
}
// Validation passed - proceed with processing
var records = Csv.Read<Person>().FromText(csvData).ToList();

Automatic checks:
- Parse errors (malformed CSV structure)
- Empty files
- Inconsistent column counts across rows
- Row count limits (DoS protection)
Configurable checks:
- Required headers presence
- Expected column count
- Delimiter auto-detection
Validation Options:
var options = new CsvValidationOptions
{
Delimiter = null, // Auto-detect delimiter
HasHeaderRow = true, // Expect header row
RequiredHeaders = new[] { "Id", "Name" }, // Required columns
ExpectedColumnCount = 5, // Exact column count
MaxRows = 1_000_000, // Maximum rows allowed
CheckConsistentColumnCount = true, // All rows must have same column count
AllowEmptyFile = false // Reject empty files
};

Validation Result:
var result = Csv.Validate(csvData, options);
// Check overall validity
if (result.IsValid)
{
Console.WriteLine($"Valid CSV: {result.TotalRows} rows, {result.ColumnCount} columns");
Console.WriteLine($"Delimiter: '{result.Delimiter}'");
Console.WriteLine($"Headers: {string.Join(", ", result.Headers)}");
}
// Inspect errors
foreach (var error in result.Errors)
{
Console.WriteLine($"[{error.ErrorType}] Row {error.RowNumber}, Col {error.ColumnNumber}");
Console.WriteLine($" Message: {error.Message}");
if (error.Expected != null)
Console.WriteLine($" Expected: {error.Expected}, Actual: {error.Actual}");
}

Error Types:
- ParseError - CSV structure could not be parsed
- MissingHeader - Required header is missing
- ColumnCountMismatch - Column count doesn't match expected
- TooManyRows - Row count exceeds maximum
- EmptyFile - File contains no data
- InconsistentColumnCount - Rows have different column counts
- DelimiterDetectionFailed - Could not auto-detect delimiter
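For example, the error type can drive different handling paths. The sketch below matches on the error-type names listed above (the enum type itself is omitted here, so values are compared by name):

```csharp
// Sketch: route validation failures by error type before deciding whether to
// reject the input or retry with relaxed options.
var result = Csv.Validate(csvData, options);
foreach (var error in result.Errors)
{
    switch (error.ErrorType.ToString())
    {
        case "MissingHeader":
        case "ColumnCountMismatch":
            Console.WriteLine($"Schema problem at row {error.RowNumber}: {error.Message}");
            break;
        case "TooManyRows":
            Console.WriteLine("File too large - ask the user to split the upload.");
            break;
        default:
            Console.WriteLine($"[{error.ErrorType}] {error.Message}");
            break;
    }
}
```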
Use Cases:
- Pre-flight validation for ETL pipelines
- User-uploaded file validation
- API request validation
- Data quality checks before processing
- Fail-fast error detection for large files
Track parsing progress for large files:
var progress = new Progress<CsvProgress>(p =>
{
var pct = p.TotalBytes > 0 ? (p.BytesProcessed * 100.0 / p.TotalBytes) : 0;
Console.WriteLine($"Processed {p.RowsProcessed} rows ({pct:F1}%)");
});
var records = Csv.Read<Person>()
.WithProgress(progress, intervalRows: 1000)
.FromFile("large-file.csv")
.ToList();

Handle deserialization errors gracefully:
var records = Csv.Read<Person>()
.OnError(ctx =>
{
Console.WriteLine($"Error at row {ctx.Row}, column '{ctx.MemberName}': {ctx.Exception?.Message}");
return DeserializeErrorAction.Skip; // Or UseDefault, Throw
})
.FromText(csv)
.ToList();

Enforce required headers and detect duplicates:
// Require specific headers
var records = Csv.Read<Person>()
.RequireHeaders("Name", "Email", "Age")
.FromText(csv)
.ToList();
// Detect duplicate headers
var records = Csv.Read<Person>()
.DetectDuplicateHeaders()
.FromText(csv)
.ToList();
// Custom header validation
var records = Csv.Read<Person>()
.ValidateHeaders(headers =>
{
if (!headers.Contains("Id"))
throw new CsvException(CsvErrorCode.InvalidHeader, "Missing required 'Id' column");
})
.FromText(csv)
.ToList();

Register custom converters for domain-specific types:
var records = Csv.Read<Order>()
.RegisterConverter<Money>((column, culture) =>
{
var text = column.ToString();
if (Money.TryParse(text, out var money))
return money;
throw new FormatException($"Invalid money format: {text}");
})
.FromText(csv)
.ToList();

HeroParser includes a high-performance CSV writer that is 2-5x faster than Sep with significantly lower memory allocations.
// Write records to a string
var records = new[]
{
new Person { Name = "Alice", Age = 30 },
new Person { Name = "Bob", Age = 25 }
};
string csv = Csv.WriteToText(records);
// Output:
// Name,Age
// Alice,30
// Bob,25

// Write to a file
Csv.WriteToFile("output.csv", records);
// Write to a stream
using var stream = File.Create("output.csv");
Csv.WriteToStream(stream, records);
// Async writing (optimized for in-memory collections)
await Csv.WriteToFileAsync("output.csv", records);
// Async writing with IAsyncEnumerable (for streaming data sources)
await Csv.WriteToFileAsync("output.csv", GetRecordsAsync());For scenarios requiring true async I/O, use the CsvAsyncStreamWriter:
// Low-level async writer with sync fast paths
await using var writer = Csv.CreateAsyncStreamWriter(stream);
await writer.WriteRowAsync(new[] { "Alice", "30", "NYC" });
await writer.WriteRowAsync(new[] { "Bob", "25", "LA" });
await writer.FlushAsync();
// Builder API with async streaming (16-43% faster than sync at scale)
await Csv.Write<Person>()
.WithDelimiter(',')
.WithHeader()
.ToStreamAsyncStreaming(stream, records); // IEnumerable overload

The async writer uses sync fast paths when data fits in the buffer, avoiding async overhead for small writes while supporting true non-blocking I/O for large datasets.
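As a standalone illustration of that pattern (an assumed sketch, not HeroParser's internals): buffer rows in memory, return a completed ValueTask on the hot path, and only await the underlying stream when the buffer fills.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Minimal sketch of the sync-fast-path idea (not the library's implementation).
sealed class BufferedAsyncWriter(Stream stream, int flushThreshold = 16 * 1024) : IAsyncDisposable
{
    private readonly StringBuilder _buffer = new();

    public ValueTask WriteLineAsync(string line)
    {
        _buffer.AppendLine(line);
        return _buffer.Length < flushThreshold
            ? ValueTask.CompletedTask   // Fast path: completes synchronously, no state machine.
            : FlushAsync();             // Slow path: real asynchronous I/O.
    }

    public async ValueTask FlushAsync()
    {
        if (_buffer.Length == 0) return;
        byte[] bytes = Encoding.UTF8.GetBytes(_buffer.ToString());
        _buffer.Clear();
        await stream.WriteAsync(bytes);
    }

    public async ValueTask DisposeAsync() => await FlushAsync();
}
```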
var options = new CsvWriteOptions
{
Delimiter = ',', // Field delimiter (default: comma)
Quote = '"', // Quote character (default: double quote)
NewLine = "\r\n", // Line ending (default: CRLF per RFC 4180)
WriteHeader = true, // Include header row (default: true)
QuoteStyle = QuoteStyle.WhenNeeded, // Quote only when necessary
NullValue = "", // String to write for null values
Culture = CultureInfo.InvariantCulture,
DateTimeFormat = "O", // ISO 8601 format for dates
NumberFormat = "G" // General format for numbers
};
string csv = Csv.WriteToText(records, options);

// Write records with fluent configuration
var csv = Csv.Write<Person>()
.WithDelimiter(';')
.AlwaysQuote()
.WithDateTimeFormat("yyyy-MM-dd")
.WithHeader()
.ToText(records);
// Write to file with async streaming
await Csv.Write<Person>()
.WithDelimiter(',')
.WithoutHeader()
.ToFileAsync("output.csv", recordsAsync);The builder provides a symmetric API to CsvReaderBuilder<T> for writing records.
Use the non-generic builder for low-level row-by-row writing:
// Manual row-by-row writing with fluent configuration
using var writer = Csv.Write()
.WithDelimiter(';')
.AlwaysQuote()
.WithDateTimeFormat("yyyy-MM-dd")
.CreateWriter(Console.Out);
writer.WriteField("Name");
writer.WriteField("Age");
writer.EndRow();
writer.WriteField("Alice");
writer.WriteField(30);
writer.EndRow();
writer.Flush();
// Write to file with custom options
using var fileWriter = Csv.Write()
.WithNewLine("\n")
.WithCulture("de-DE")
.CreateFileWriter("output.csv");using var writer = Csv.CreateWriter(Console.Out);
// Write header
writer.WriteField("Name");
writer.WriteField("Age");
writer.EndRow();
// Write data rows
writer.WriteField("Alice");
writer.WriteField(30);
writer.EndRow();
writer.Flush();

var options = new CsvWriteOptions
{
OnSerializeError = ctx =>
{
Console.WriteLine($"Error at row {ctx.Row}, column '{ctx.MemberName}': {ctx.Exception?.Message}");
return SerializeErrorAction.WriteNull; // Or SkipRow, Throw
}
};

HeroParser includes built-in protections against common CSV security vulnerabilities.
Protect against malicious or malformed CSV files with configurable limits:
var options = new CsvReadOptions
{
MaxColumnCount = 100, // Prevent column explosion attacks
MaxRowCount = 1_000_000, // Limit total rows processed
MaxFieldSize = 10_000, // Prevent huge field allocations
MaxRowSize = 512 * 1024 // 512KB row limit for streaming
};
var reader = Csv.Read().WithOptions(options).FromFile("untrusted.csv");Recommended Limits for Untrusted Input:
MaxColumnCount: 100-1000 (based on expected schema)MaxRowCount: 1,000,000 (based on available memory)MaxFieldSize: 10,000-100,000 bytesMaxRowSize: 512KB-1MB (for streaming readers)
When exporting user data to CSV, enable injection protection to prevent formula injection attacks:
Csv.Write<T>()
.WithInjectionProtection(CsvInjectionProtection.Sanitize)
.ToFile("export.csv");Injection Protection Modes:
None(default): No protection - use for trusted data onlySanitize: Removes dangerous characters (=,@,+,-,\t,\r)EscapeWithQuote: Wraps dangerous values in quotes and escapes internal quotesEscapeWithTab: Prefixes dangerous characters with tab
Example:
var writeOptions = new CsvWriteOptions
{
InjectionProtection = CsvInjectionProtection.Sanitize
};
// Dangerous value: "=1+1" becomes "'=1+1" (prefixed with single quote)
Csv.WriteToText(records, writeOptions);

For production applications processing untrusted files:
- Validate before processing:
  var options = new CsvReadOptions { MaxColumnCount = 50, MaxRowCount = 100_000 };
  options.Validate(); // Throws if configuration is invalid
- Use streaming for large files:
  // Avoid loading entire file into memory
  await using var reader = Csv.CreateAsyncStreamReader(File.OpenRead("large.csv"));
  while (await reader.MoveNextAsync()) { var row = reader.Current; /* Process row... */ }
- Catch and handle exceptions:
  try { var records = Csv.Read<T>().FromFile("untrusted.csv").ToList(); }
  catch (CsvException ex)
  {
      Console.WriteLine($"CSV error at row {ex.Row}, col {ex.Column}: {ex.Message}");
      // Log and handle appropriately
  }
- Implement timeouts for async operations:
  using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
  await foreach (var record in Csv.Read<T>().FromFileAsync("untrusted.csv").WithCancellation(cts.Token))
  {
      // Process record...
  }
Note: HeroParser readers and writers are not thread-safe by design for performance:
- Readers: Use separate reader instances per thread
- Writers: Use separate writer instances per thread
- Options: CsvReadOptions and CsvWriteOptions are immutable and safe to share after validation
Multi-threaded Processing:
// ✅ Good: Each thread gets its own reader
Parallel.ForEach(files, file =>
{
var reader = Csv.Read<T>().FromFile(file);
// Process...
});
// ❌ Bad: Sharing reader across threads
var reader = Csv.Read<T>().FromFile("data.csv");
Parallel.ForEach(reader, record => { /* ... */ }); // NOT SAFE!

# Run all benchmarks
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --all
# Reading benchmarks
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --throughput
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --streaming
# Writing benchmarks
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --writer
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --sync-writer
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --async-writer

HeroParser uses CLMUL-based branchless quote masking (the PCLMULQDQ instruction) for efficient quote-aware SIMD parsing. Results on AMD Ryzen AI 9 HX PRO 370, .NET 10:
| Rows | Columns | Quotes | Time | Throughput |
|---|---|---|---|---|
| 10k | 25 | No | 552 ΞΌs | ~6.1 GB/s |
| 10k | 25 | Yes | 1,344 ΞΌs | ~5.1 GB/s |
| 10k | 100 | No | 1,451 ΞΌs | ~4.5 GB/s |
| 10k | 100 | Yes | 3,617 ΞΌs | ~1.9 GB/s |
| 100k | 100 | No | 14,568 ΞΌs | ~4.5 GB/s |
| 100k | 100 | Yes | 35,396 ΞΌs | ~1.9 GB/s |
Key characteristics:
- Fixed 4 KB allocation regardless of column count or file size
- Scales well with wide CSVs - performance remains consistent with 50-100+ columns
- UTF-8 optimized - use byte[] or ReadOnlySpan<byte> APIs for best performance
- Quote-aware SIMD - maintains high throughput even with quoted fields
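For the curious, the core of the CLMUL trick mentioned above can be sketched in a few lines (an illustrative standalone example, not the library's code; it also omits the carry that a real parser propagates between 64-byte blocks):

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// quoteBits has bit i set when byte i of a 64-byte block is a quote character.
// Carryless-multiplying by all-ones computes a prefix XOR: bit i of the result is 1
// exactly when an odd number of quotes appear at or before position i, i.e. the
// position lies inside a quoted field - computed without any branches.
static ulong InsideQuotesMask(ulong quoteBits)
{
    if (!Pclmulqdq.IsSupported)
        throw new PlatformNotSupportedException("Requires PCLMULQDQ (x64).");

    var quotes = Vector128.CreateScalar(quoteBits);
    var allOnes = Vector128.CreateScalar(ulong.MaxValue);
    return Pclmulqdq.CarrylessMultiply(quotes, allOnes, 0x00).GetElement(0);
}
```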
HeroParser's CSV writer is optimized for high throughput with minimal allocations:
| Scenario | Throughput | Memory |
|---|---|---|
| Sync Writing | ~2-3 GB/s | 35-85% less than alternatives |
| Async Writing | ~1.5-2 GB/s | Pooled buffers, minimal GC |
Key characteristics:
- SIMD-accelerated quote detection and field analysis
- RFC 4180 compliant proper quote escaping
- Sync fast paths in async writer avoid overhead for small writes
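As an illustration of the "quote only when necessary" analysis (a sketch using standard .NET APIs rather than the library's SIMD path), a field needs quoting when it contains the delimiter, a quote character, or a line break:

```csharp
using System.Buffers;

static class CsvFieldEncoderSketch
{
    // SearchValues gives vectorized ContainsAny on .NET 8+; the set below assumes a
    // comma delimiter and a double-quote character per RFC 4180.
    private static readonly SearchValues<char> Special = SearchValues.Create(",\"\r\n");

    public static string Encode(ReadOnlySpan<char> field) =>
        field.ContainsAny(Special)
            // Quote the field and escape embedded quotes by doubling them.
            ? "\"" + field.ToString().Replace("\"", "\"\"") + "\""
            : field.ToString(); // Fast path: no quoting needed.
}
```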
var csv = "field1,\"field2\",\"field,3\"\n" +
"aaa,\"b,bb\",ccc\n" +
"zzz,\"y\"\"yy\",xxx"; // Escaped quote
foreach (var row in Csv.ReadFromText(csv))
{
// Access raw value (includes quotes)
var raw = row[1].ToString(); // "b,bb"
// Remove surrounding quotes and unescape
var unquoted = row[1].UnquoteToString(); // b,bb
// Zero-allocation unquote (returns span)
var span = row[1].Unquote(); // ReadOnlySpan<char>
}

foreach (var row in Csv.ReadFromText(csv))
{
// Generic parsing (ISpanParsable<T>)
var value = row[0].Parse<int>();
// Optimized type-specific methods
if (row[1].TryParseDouble(out double d)) { }
if (row[2].TryParseDateTime(out DateTime dt)) { }
if (row[3].TryParseBoolean(out bool b)) { }
// Additional type parsing
if (row[4].TryParseGuid(out Guid id)) { }
if (row[5].TryParseEnum<DayOfWeek>(out var day)) { } // Case-insensitive
if (row[6].TryParseTimeZoneInfo(out TimeZoneInfo tz)) { }
}

// Columns are NOT parsed until first access
foreach (var row in Csv.ReadFromText(csv))
{
// Skip rows without parsing columns
if (ShouldSkip(row))
continue;
// Only parse columns when accessed
var value = row[0].Parse<int>(); // First access triggers parsing
}

Skip comment lines in CSV files:
var options = new CsvReadOptions
{
CommentCharacter = '#' // Lines starting with # are ignored
};
var csv = @"# This is a comment
Name,Age
Alice,30
# Another comment
Bob,25";
foreach (var row in Csv.ReadFromText(csv, options))
{
// Only data rows are processed
}

Remove leading and trailing whitespace from unquoted fields:
var options = new CsvReadOptions
{
TrimFields = true // Trim whitespace from unquoted fields
};
var csv = " Name , Age \nAlice, 30 ";
foreach (var row in Csv.ReadFromText(csv, options))
{
var name = row[0].ToString(); // "Name" (trimmed)
var age = row[1].ToString(); // "30" (trimmed)
}

Treat specific string values as null during record parsing:
var recordOptions = new CsvRecordOptions
{
NullValues = new[] { "NULL", "N/A", "NA", "" }
};
var csv = "Name,Value\nAlice,100\nBob,NULL\nCharlie,N/A";
foreach (var record in Csv.ParseRecords<MyRecord>(csv, recordOptions))
{
// record.Value will be null when the field contains "NULL" or "N/A"
}

Protect against DoS attacks with oversized fields:
var options = new CsvReadOptions
{
MaxFieldSize = 10_000 // Throw exception if any field exceeds 10KB
};
// This will throw CsvException if a field is too large
var reader = Csv.ReadFromText(csv, options);

Skip header rows or metadata before parsing:
var recordOptions = new CsvRecordOptions
{
SkipRows = 2, // Skip first 2 rows (e.g., metadata)
HasHeaderRow = true // The 3rd row is the header
};
var csv = @"File Version: 1.0
Generated: 2024-01-01
Name,Age
Alice,30
Bob,25";
foreach (var record in Csv.ParseRecords<MyRecord>(csv, recordOptions))
{
// First 2 rows are skipped, 3rd row used as header
}

Rows are ref structs and cannot escape their scope. Use Clone() or ToImmutable() to store them:
var storedRows = new List<CsvCharSpanRow>();
foreach (var row in Csv.ReadFromText(csv))
{
// ❌ WRONG: Cannot store ref struct directly
// storedRows.Add(row);
// ✅ CORRECT: Clone creates an owned copy
storedRows.Add(row.Clone());
}
// Rows can now be safely accessed after enumeration
foreach (var row in storedRows)
{
var value = row[0].ToString();
}

Track row positions and source line numbers for error reporting:
foreach (var row in Csv.ReadFromText(csv))
{
try
{
var id = row[0].Parse<int>();
}
catch (FormatException)
{
// LineNumber: 1-based logical row position (ordinal)
// SourceLineNumber: 1-based physical line in the file (handles multi-line quoted fields)
Console.WriteLine($"Invalid data at row {row.LineNumber} (source line {row.SourceLineNumber})");
}
}This distinction is important when CSV files contain multi-line quoted fields - LineNumber gives you the row index while SourceLineNumber tells you the exact line in the source file where the row starts.
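A small example of how the two values diverge (assuming newlines-in-quotes are enabled; the exact numbers depend on the input):

```csharp
var options = new CsvReadOptions { AllowNewlinesInsideQuotes = true };
var csv = "Id,Note\n1,\"spans\ntwo lines\"\n2,plain";

foreach (var row in Csv.ReadFromText(csv, options))
{
    Console.WriteLine($"row {row.LineNumber} starts at source line {row.SourceLineNumber}");
}
// The row following the multi-line quoted field starts two physical lines after the
// previous row, so its SourceLineNumber jumps by 2 while LineNumber advances by 1.
```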
HeroParser readers use ArrayPool buffers and MUST be disposed to prevent memory leaks.
// ✅ RECOMMENDED: Use 'using' statement
using (var reader = Csv.ReadFromText(csv))
{
foreach (var row in reader)
{
var value = row[0].ToString();
}
} // ArrayPool buffers automatically returned
// ✅ ALSO WORKS: foreach automatically disposes
foreach (var row in Csv.ReadFromText(csv))
{
var value = row[0].ToString();
} // Disposed after foreach completes
// ❌ AVOID: Manual iteration without disposal
var reader = Csv.ReadFromText(csv);
while (reader.MoveNext())
{
// ...
}
// MEMORY LEAK! ArrayPool buffers not returned
// ✅ FIX: Manually dispose if not using foreach
var reader = Csv.ReadFromText(csv);
try
{
while (reader.MoveNext()) { /* ... */ }
}
finally
{
reader.Dispose(); // Always dispose!
}

HeroParser includes comprehensive support for fixed-width (fixed-length) file parsing and writing, commonly used in legacy systems, mainframe exports, and financial data interchange.
// Define record type with column mappings
[FixedWidthGenerateBinder]
public class Employee
{
[FixedWidthColumn(Start = 0, Length = 10)]
public string Id { get; set; } = "";
[FixedWidthColumn(Start = 10, Length = 30)]
public string Name { get; set; } = "";
[FixedWidthColumn(Start = 40, Length = 10, Alignment = FieldAlignment.Right, PadChar = '0')]
public decimal Salary { get; set; }
}
// Read records with fluent builder
foreach (var emp in FixedWidth.Read<Employee>().FromFile("employees.dat"))
{
Console.WriteLine($"{emp.Name}: {emp.Salary:C}");
}

// Read from string
var records = FixedWidth.Read<Employee>().FromText(data).ToList();
// Read from file
var records = FixedWidth.Read<Employee>().FromFile("data.dat").ToList();
// Read from stream
var records = FixedWidth.Read<Employee>().FromStream(stream).ToList();
// Async file reading
await foreach (var emp in FixedWidth.Read<Employee>().FromFileAsync("data.dat"))
{
Console.WriteLine(emp.Name);
}

// Configure and read manually without binding to a type
foreach (var row in FixedWidth.Read()
.WithRecordLength(80)
.WithDefaultPadChar(' ')
.FromFile("legacy.dat"))
{
var id = row.GetField(0, 10).ToString();
var name = row.GetField(10, 30).ToString();
Console.WriteLine($"{id}: {name}");
}

Fixed-width fields support four alignment modes that control how padding is trimmed:
public class Transaction
{
// Left-aligned: "John " -> "John" (trims trailing spaces)
[FixedWidthColumn(Start = 0, Length = 10, Alignment = FieldAlignment.Left)]
public string Name { get; set; } = "";
// Right-aligned: "000012345" -> "12345" (trims leading zeros)
[FixedWidthColumn(Start = 10, Length = 10, Alignment = FieldAlignment.Right, PadChar = '0')]
public int Amount { get; set; }
// Center-aligned: " Data " -> "Data" (trims both sides)
[FixedWidthColumn(Start = 20, Length = 10, Alignment = FieldAlignment.Center)]
public string Code { get; set; } = "";
// None: No trimming, raw value preserved
[FixedWidthColumn(Start = 30, Length = 10, Alignment = FieldAlignment.None)]
public string RawField { get; set; } = "";
}

You can specify field bounds using either Start/Length or Start/End:
public class Record
{
// Using Length: field from position 0, 10 characters long
[FixedWidthColumn(Start = 0, Length = 10)]
public string Id { get; set; } = "";
// Using End: field from position 10 to 30 (exclusive), same as Length = 20
[FixedWidthColumn(Start = 10, End = 30)]
public string Name { get; set; } = "";
// Using End with other options
[FixedWidthColumn(Start = 30, End = 40, Alignment = FieldAlignment.Right, PadChar = '0')]
public decimal Amount { get; set; }
}

The End property specifies the exclusive ending position of the field. When both Length and End are specified, Length takes precedence.
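A worked example of those bounds (illustrative data; trimming follows each field's alignment):

```csharp
// Positions: 0-9 Id, 10-29 Name (End = 30 is exclusive), 30-39 Amount.
var line = "0000000042" +             // Id:     Start = 0,  Length = 10
           "Jane Doe".PadRight(20) +  // Name:   Start = 10, End = 30 -> 20 characters
           "0000123.45";              // Amount: Start = 30, End = 40, right-aligned, '0'-padded

var records = FixedWidth.Read<Record>().FromText(line).ToList();
// Expected binding: Id = "0000000042", Name = "Jane Doe" (padding trimmed), Amount = 123.45m.
```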
When parsing files where trailing fields may be omitted or rows vary in length, use AllowMissingColumns():
// Handle short rows gracefully - missing fields return empty values
var records = FixedWidth.Read<Employee>()
.AllowMissingColumns()
.FromFile("variable-length.dat")
.ToList();
// By default, accessing fields beyond row length throws FixedWidthException
// Use AllowMissingColumns() when:
// - Trailing fields are optional
// - Records may have variable lengths
// - Legacy files have inconsistent formatting

public class Record
{
// Parse date with exact format
[FixedWidthColumn(Start = 0, Length = 8, Format = "yyyyMMdd")]
public DateTime TransactionDate { get; set; }
// Parse time with exact format
[FixedWidthColumn(Start = 8, Length = 6, Format = "HHmmss")]
public TimeOnly TransactionTime { get; set; }
}

var records = FixedWidth.Read<Employee>()
.WithDefaultPadChar(' ') // Default padding character
.WithDefaultAlignment(FieldAlignment.Left) // Default field alignment
.WithRecordLength(80) // Fixed record length (vs line-based)
.SkipRows(2) // Skip header rows
.WithCommentCharacter('#') // Skip comment lines
.WithMaxRecords(10_000) // Limit records (DoS protection)
.WithMaxInputSize(50 * 1024 * 1024) // 50 MB max file size
.WithCulture("de-DE") // Culture for parsing
.WithNullValues("NULL", "N/A") // Values treated as null
.TrackLineNumbers() // Enable line number tracking
.OnError((ctx, ex) => // Error handling
{
Console.WriteLine($"Error at record {ctx.RecordNumber}: {ex.Message}");
return FixedWidthDeserializeErrorAction.SkipRecord;
})
.FromFile("data.dat")
.ToList();

using HeroParser.FixedWidths.Validation;
public class ValidatedRecord
{
[FixedWidthColumn(Start = 0, Length = 10)]
[FixedWidthRequired] // Field cannot be empty/whitespace
public string Id { get; set; } = "";
[FixedWidthColumn(Start = 10, Length = 20)]
[FixedWidthStringLength(MinLength = 2, MaxLength = 20)]
public string Name { get; set; } = "";
[FixedWidthColumn(Start = 30, Length = 10)]
[FixedWidthRange(Minimum = 0, Maximum = 1000000)]
public decimal Amount { get; set; }
[FixedWidthColumn(Start = 40, Length = 15)]
[FixedWidthRegex(@"^\d{3}-\d{3}-\d{4}$", ErrorMessage = "Invalid phone format")]
public string Phone { get; set; } = "";
}

// Write records to string
var text = FixedWidth.WriteToText(employees);
// Write to file
FixedWidth.WriteToFile("output.dat", employees);
// Write to stream
FixedWidth.WriteToStream(stream, employees);
// Async writing
await FixedWidth.WriteToFileAsync("output.dat", employees);
// With options
await FixedWidth.WriteToFileAsync("output.dat", employees, new FixedWidthWriteOptions
{
NewLine = "\r\n",
DefaultPadChar = ' '
});

// Write with fluent configuration
var text = FixedWidth.Write<Employee>()
.WithPadChar(' ')
.AlignLeft()
.ToText(employees);
// Write to file
FixedWidth.Write<Employee>()
.WithNewLine("\r\n")
.ToFile("output.dat", employees);using var writer = FixedWidth.Write()
.WithPadChar(' ')
.CreateFileWriter("output.dat");
// Write header
writer.WriteField("ID", 10);
writer.WriteField("NAME", 30);
writer.WriteField("AMOUNT", 10, FieldAlignment.Right);
writer.EndRow();
// Write data
writer.WriteField("001", 10);
writer.WriteField("Alice", 30);
writer.WriteField("12345", 10, FieldAlignment.Right, '0');
writer.EndRow();
writer.Flush();

// Create writer from TextWriter
using var writer = FixedWidth.CreateWriter(Console.Out);
// Create writer from Stream
using var stream = File.Create("output.dat");
using var streamWriter = FixedWidth.CreateStreamWriter(stream);

For scenarios requiring true async I/O, use the FixedWidthAsyncStreamWriter:
// Low-level async writer with sync fast paths
await using var writer = FixedWidth.CreateAsyncStreamWriter(stream);
await writer.WriteFieldAsync("Alice", 20);
await writer.WriteFieldAsync("30", 5, FieldAlignment.Right);
await writer.EndRowAsync();
await writer.FlushAsync();

The async writer uses sync fast paths when data fits in the buffer, avoiding async overhead for small writes while supporting true non-blocking I/O for large datasets.
var records = FixedWidth.Read<Order>()
.RegisterConverter<Money>((value, culture, format, out result) =>
{
if (decimal.TryParse(value, NumberStyles.Currency, culture, out var amount))
{
result = new Money(amount);
return true;
}
result = default;
return false;
})
.FromFile("orders.dat")
.ToList();

For AOT compilation and trimming support, use the [FixedWidthGenerateBinder] attribute:
using HeroParser.FixedWidths.Records.Binding;
[FixedWidthGenerateBinder]
public class Employee
{
[FixedWidthColumn(Start = 0, Length = 10)]
public string Id { get; set; } = "";
[FixedWidthColumn(Start = 10, Length = 30)]
public string Name { get; set; } = "";
}

The source generator creates compile-time binders, enabling:
- AOT compatibility - No runtime reflection
- Faster startup - Binders are pre-compiled
- Trimming-safe - Works with .NET trimming/linking
Requirements:
- .NET 8, 9, or 10 SDK
- C# 12+ language features
- Recommended: AVX-512 or AVX2 capable CPU for maximum performance
# Build library
dotnet build src/HeroParser/HeroParser.csproj
# Run tests
dotnet test tests/HeroParser.Tests/HeroParser.Tests.csproj
# Run all benchmarks
dotnet run --project benchmarks/HeroParser.Benchmarks -c Release -- --all

To enable pre-commit format checks (recommended):
# Configure git to use the project's hooks
git config core.hooksPath .githooks

This runs dotnet format --verify-no-changes before each commit. If formatting issues are found, the commit is blocked until you run dotnet format to fix them.
For AOT (Ahead-of-Time) compilation scenarios, HeroParser supports source-generated binders that avoid reflection:
using HeroParser.SeparatedValues.Records.Binding;
[CsvGenerateBinder]
public class Person
{
public string Name { get; set; } = "";
public int Age { get; set; }
public string? Email { get; set; }
}

The [CsvGenerateBinder] attribute instructs the source generator to emit a compile-time binder, enabling:
- AOT compatibility - No runtime reflection required
- Faster startup - Binders are pre-compiled
- Trimming-safe - Works with .NET trimming/linking
Note: Source generators require the HeroParser.Generators package and a compatible SDK.
HeroParser implements core RFC 4180 features:
✅ Supported:
- Quoted fields with double-quote character (")
- Escaped quotes using double-double-quotes ("")
- Delimiters (commas) within quoted fields
- Both LF (\n) and CRLF (\r\n) line endings
- Newlines inside quoted fields when AllowNewlinesInsideQuotes = true (default is false for performance)
- Empty fields and spaces preserved
- Custom delimiters and quote characters
❌ Not Supported:
- Automatic header detection - users must skip header rows manually
This provides excellent RFC 4180 compatibility for most CSV use cases (logs, exports, data interchange).
MIT
HeroParser was inspired by the excellent work in the .NET CSV parsing ecosystem:
- Sep by nietras - Pioneering SIMD-based CSV parsing techniques
- Sylvan.Data.Csv - High-performance CSV parsing patterns
- SimdUnicode - SIMD text processing techniques
Special thanks to the .NET performance community for their research and open-source contributions.
High-performance, zero-allocation, AOT-ready CSV & fixed-width parsing for .NET