Library And Metadata Pipeline

This document describes how Aria scans a local music library, stores metadata, and turns raw file tags into the fields used by the UI.

Goals

Aria treats the library as two layers:

raw tags: what is read directly from the media file
mapped fields: the normalized database-facing fields that the UI uses for browsing, sorting, grouping, and playback context

That split is intentional. Raw tags must remain available even when Aria’s normalized view changes.

Scan Overview

When Aria scans the library, it does this for each configured root:

Recursively discover supported audio files
Read tag data and audio properties from each file
Preserve raw tags
Apply user-configurable field mappings
Apply catalog fallback rules for the catalog field when needed
Resolve album art
Persist the resulting snapshot to SQLite

The scan result is stored in the LibrarySnapshot, which includes:

library roots
scan status and file counts
field mappings
catalog rules
tag inventory
scanned tracks

Supported Audio File Discovery

The scanner currently includes files with these extensions:

flac
mp3
m4a
aac
mp4
ogg
opus
wav
aiff
aif

Discovery is recursive and follows symbolic links.
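A minimal sketch of the discovery step, using only the Rust standard library. The names `SUPPORTED_EXTENSIONS`, `is_supported_audio`, and `discover_audio_files` are illustrative and not taken from Aria's code. Case-insensitive extension matching and the lack of link-cycle detection are assumptions of this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions listed in this document; the constant name is illustrative.
const SUPPORTED_EXTENSIONS: [&str; 10] = [
    "flac", "mp3", "m4a", "aac", "mp4", "ogg", "opus", "wav", "aiff", "aif",
];

/// Returns true when the path ends in a supported extension.
/// Comparing case-insensitively is an assumption of this sketch.
fn is_supported_audio(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| {
            let ext = ext.to_ascii_lowercase();
            SUPPORTED_EXTENSIONS.contains(&ext.as_str())
        })
        .unwrap_or(false)
}

/// Recursively collects supported files under `root`.
/// `fs::metadata` follows symbolic links, so linked directories are entered.
/// There is no cycle detection here; a real scanner would need some.
fn discover_audio_files(root: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        let meta = match fs::metadata(&path) {
            Ok(meta) => meta,
            Err(_) => continue, // broken link or unreadable entry: skip it
        };
        if meta.is_dir() {
            discover_audio_files(&path, found)?;
        } else if meta.is_file() && is_supported_audio(&path) {
            found.push(path);
        }
    }
    Ok(())
}
```

A caller would run `discover_audio_files` once per configured root and pass the collected paths on to tag reading.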

Track Model

Each scanned track stores:

id: currently the full file path
path
file_name
album_art_path
audio: format, duration, sample rate, bit depth, channels
raw_tags: tag -> [values]
mapped_fields: field -> [values]
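The track record could be modeled roughly as follows. This is a sketch: the struct names, the exact field names inside `audio`, the numeric types, and the use of `Option` for values that may be unknown are assumptions, not Aria's actual definitions.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Audio properties listed above. Names, units, and `Option` are assumed.
#[derive(Debug, Clone, PartialEq)]
struct AudioProperties {
    format: String,
    duration_seconds: f64,
    sample_rate_hz: Option<u32>,
    bit_depth: Option<u8>,
    channels: Option<u8>,
}

/// One scanned track with both layers: raw tags and mapped fields.
#[derive(Debug, Clone, PartialEq)]
struct ScannedTrack {
    id: String,
    path: String,
    file_name: String,
    album_art_path: Option<String>,
    audio: AudioProperties,
    raw_tags: BTreeMap<String, Vec<String>>,      // tag -> [values]
    mapped_fields: BTreeMap<String, Vec<String>>, // field -> [values]
}

impl ScannedTrack {
    /// Builds an empty record; `id` mirrors the full path, as described above.
    fn from_path(path: &str, audio: AudioProperties) -> Self {
        let file_name = Path::new(path)
            .file_name()
            .map(|name| name.to_string_lossy().into_owned())
            .unwrap_or_default();
        ScannedTrack {
            id: path.to_string(),
            path: path.to_string(),
            file_name,
            album_art_path: None,
            audio,
            raw_tags: BTreeMap::new(),
            mapped_fields: BTreeMap::new(),
        }
    }
}
```

Keeping `raw_tags` and `mapped_fields` as separate maps is what lets mappings change later without re-reading the file.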

Raw Tag Extraction

Aria reads tags with Lofty, then supplements that with format-specific raw-tag recovery where needed.

Base extraction

The base pass:

iterates all Lofty tags on the file
resolves each tag key to a normalized uppercase tag name
converts tag values to strings
splits multi-value text when Aria sees common separators
deduplicates values while preserving order

Multi-value splitting

Aria currently splits text values on:

;
/
|
the NUL character

This is intentionally conservative. The goal is to preserve multi-value credits without over-splitting ordinary text.
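The splitting and order-preserving deduplication can be sketched like this. The function name is hypothetical, and trimming whitespace around each part and dropping empty parts are assumptions that this document does not confirm.

```rust
/// Separators named in this document.
const SEPARATORS: [char; 4] = [';', '/', '|', '\0'];

/// Splits one raw text value into distinct parts, keeping first-seen order.
fn split_multi_value(raw: &str) -> Vec<String> {
    let mut values: Vec<String> = Vec::new();
    for part in raw.split(|c: char| SEPARATORS.contains(&c)) {
        let part = part.trim(); // assumption: surrounding whitespace is dropped
        let already_seen = values.iter().any(|v| v.as_str() == part);
        if !part.is_empty() && !already_seen {
            values.push(part.to_string());
        }
    }
    values
}
```

For example, `split_multi_value("Anne; Bob / Anne|Carol")` yields `Anne`, `Bob`, `Carol`, with the repeated `Anne` removed.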

Format-specific raw tags

Some formats, especially Vorbis-comment based formats, can contain useful custom keys that do not survive a purely generic tag abstraction.

To preserve those, Aria explicitly merges raw Vorbis-comment data for:

FLAC
Ogg Vorbis
Opus
Speex

That is why tags like ENSEMBLE can still appear in raw_tags even if the generic tag path would otherwise lose them.

Tag Inventory

The scan also builds a tag inventory. For each observed raw tag, Aria stores:

tag name
number of tracks where it occurred
up to three example values

This inventory is useful for field-mapping and diagnostics, even though the main Settings pane no longer shows it inline.
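One way to build such an inventory from the per-track raw tag maps is sketched below. `TagInventoryEntry` and `build_tag_inventory` are hypothetical names, and keeping only distinct example values is an assumption.

```rust
use std::collections::BTreeMap;

/// Inventory data for one raw tag name.
#[derive(Debug, Default, PartialEq)]
struct TagInventoryEntry {
    track_count: usize,
    examples: Vec<String>,
}

const MAX_EXAMPLES: usize = 3;

/// Counts how many tracks carry each tag and keeps up to three example values.
fn build_tag_inventory(
    tracks: &[BTreeMap<String, Vec<String>>],
) -> BTreeMap<String, TagInventoryEntry> {
    let mut inventory: BTreeMap<String, TagInventoryEntry> = BTreeMap::new();
    for raw_tags in tracks {
        for (tag, values) in raw_tags {
            let entry = inventory.entry(tag.clone()).or_default();
            entry.track_count += 1; // a tag key occurs at most once per track map
            for value in values {
                if entry.examples.len() >= MAX_EXAMPLES {
                    break;
                }
                if !entry.examples.contains(value) {
                    entry.examples.push(value.clone());
                }
            }
        }
    }
    inventory
}
```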

Database Field Mappings

Field mappings define how Aria builds normalized fields from raw tags.

Each mapping has:

key: internal field name
label: UI label
tag_priorities: a priority-ordered list of raw tags

Resolution rule

For a given field:

Aria checks source tags in order
the first non-empty source tag wins
all values from that winning tag are kept
duplicate values are removed while preserving order

Aria does not merge across multiple source tags for a single field. Priority is strict.
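The resolution rule above can be sketched as a single function. `resolve_field` is a hypothetical name, and treating whitespace-only values as empty is an assumption.

```rust
use std::collections::BTreeMap;

/// Returns the values of the first source tag that has any non-empty value.
fn resolve_field(
    raw_tags: &BTreeMap<String, Vec<String>>,
    tag_priorities: &[&str],
) -> Vec<String> {
    for tag in tag_priorities {
        if let Some(values) = raw_tags.get(*tag) {
            let mut kept: Vec<String> = Vec::new();
            for value in values {
                // Drop blanks and duplicates while preserving order.
                if !value.trim().is_empty() && !kept.contains(value) {
                    kept.push(value.clone());
                }
            }
            if !kept.is_empty() {
                return kept; // strict priority: later tags are never merged in
            }
        }
    }
    Vec::new()
}
```

With priorities `ENSEMBLE`, `ORCHESTRA`, `ALBUMARTIST`, a track that has both `ORCHESTRA` and `ALBUMARTIST` resolves to the `ORCHESTRA` values only.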

Default fields

The default field list is:

| Field key | Default source tags |
| --- | --- |
| `album` | `ALBUM` |
| `title` | `TITLE` |
| `catalog` | `CATALOGNUMBER`, `CATALOG` |
| `composer` | `COMPOSER` |
| `genre` | `GENRE` |
| `conductor` | `CONDUCTOR` |
| `ensemble` | `ENSEMBLE`, `ORCHESTRA`, `ALBUMARTIST` |
| `soloist` | `PERFORMER`, `ARTIST`, `ALBUMARTIST` |
| `year` | `DATE`, `YEAR` |
| `disk_number` | `DISCNUMBER` |
| `track_number` | `TRACKNUMBER` |

Users can edit these in Settings -> Database fields.

Fields may be empty. They may also contain multiple values.

Catalog Extraction

The catalog field is special.

Dedicated tags first

Aria first tries to resolve catalog from the configured field mapping, which defaults to:

CATALOGNUMBER
CATALOG

If that succeeds, catalog fallback parsing is not used.

Fallback rules

If catalog is still empty after normal field mapping, Aria runs user-configurable catalog rules.

Each catalog rule now has:

label: the catalog abbreviation Aria should search for, such as BWV, WAB, K, or Op
composers: optional composer hints
enabled: whether the rule is active

Users can edit these in Settings -> Catalog rules.

Current built-in examples

Aria ships with built-in rules for common classical catalogs, including examples such as:

BWV
WAB
K
KV
D
RV
HWV
TWV
BuxWV
Hob.
S.
WoO
Op

Shared source-tag priority

All catalog rules use the same source-tag priority.

Aria checks these tags in order:

TITLE
WORK
ALBUM

In practice, Aria:

tries TITLE first for all catalog labels
only falls back to WORK if TITLE produced no catalog matches
only falls back to ALBUM if neither TITLE nor WORK produced matches

This avoids leaking album-level range catalogs into track-level results when the track title already contains the specific catalog number.
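The tag-level fallback can be sketched independently of the parser itself. Here `extract` stands in for "run every applicable catalog label against this text". It is a placeholder supplied by the caller, and `catalog_from_tags` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Source tags in the order described above.
const CATALOG_SOURCE_TAGS: [&str; 3] = ["TITLE", "WORK", "ALBUM"];

/// Returns the matches from the first source tag that produces any.
fn catalog_from_tags<F>(raw_tags: &BTreeMap<String, Vec<String>>, extract: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    for tag in CATALOG_SOURCE_TAGS.iter() {
        let mut found: Vec<String> = Vec::new();
        if let Some(values) = raw_tags.get(*tag) {
            for value in values {
                found.extend(extract(value.as_str()));
            }
        }
        if !found.is_empty() {
            return found; // lower-priority tags are not consulted
        }
    }
    Vec::new()
}
```

Because the loop returns on the first tag with matches, an album-level catalog in ALBUM is only reached when neither TITLE nor WORK yields anything.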

Composer-aware matching

If a rule has composer hints, Aria only applies that catalog label when one of these raw tags matches those hints:

COMPOSER
WORKCOMPOSER
COMPOSERSORT

This is how labels like WAB remain Bruckner-specific.
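A sketch of the composer gate follows. This document does not say how a hint is compared with a tag value, so the case-insensitive substring test used here is an assumption, as are the names `CatalogRule` and `rule_applies`.

```rust
use std::collections::BTreeMap;

/// Rule shape described in "Fallback rules"; the struct itself is illustrative.
struct CatalogRule {
    label: String,
    composers: Vec<String>,
    enabled: bool,
}

const COMPOSER_TAGS: [&str; 3] = ["COMPOSER", "WORKCOMPOSER", "COMPOSERSORT"];

/// Decides whether a rule may be applied to a track with these raw tags.
fn rule_applies(rule: &CatalogRule, raw_tags: &BTreeMap<String, Vec<String>>) -> bool {
    if !rule.enabled {
        return false;
    }
    if rule.composers.is_empty() {
        return true; // no hints: the label applies to any composer
    }
    COMPOSER_TAGS.iter().any(|tag| {
        raw_tags.get(*tag).map_or(false, |values| {
            values.iter().any(|value| {
                let value = value.to_lowercase();
                // Assumption: a hint matches when it occurs anywhere in the value.
                rule.composers
                    .iter()
                    .any(|hint| value.contains(&hint.to_lowercase()))
            })
        })
    })
}
```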

Shared catalog parser

Aria no longer stores or edits per-rule regex patterns.

Instead, all labels share the same extraction logic:

split colon-separated title/work/album text into segments
search segments from right to left
look for the configured label plus a catalog number
keep the first segment that yields matches

The shared parser also supports sectioned forms like Hob. XVI:52 and Hob. IIIb:2, where the label is followed by a Roman-numeral section, an optional a or b, and then the item number.

The only per-rule difference is the catalog label itself, plus optional composer hints. The built-in Op rule is the catch-all fallback.

Colon-segment preference

When a title contains multiple colon-separated segments, Aria searches the segments from right to left and keeps the first segment that yields catalog matches.

Example:

Das Wohltemperierte Klavier: Book 1, BWV 846-869: Präludium Es-Dur, BWV 852

Aria prefers BWV 852 from the final segment instead of the collection-level range earlier in the string.

Range suppression

Aria ignores catalog matches that are immediately followed by a dash and more digits, such as:

BWV 846-869

That prevents the start of a catalog range from being treated as a single-track catalog number.
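The colon-segment preference and range suppression can be illustrated with a simplified, dependency-free parser. It is not Aria's parser: it matches labels case-sensitively, accepts only plain digits after the label, normalizes output to "label number", treats an en dash like a hyphen, and leaves out sectioned forms such as Hob. XVI:52. All function names are hypothetical.

```rust
/// Finds `label` followed by a number inside one segment, skipping range starts.
fn find_in_segment(segment: &str, label: &str) -> Vec<String> {
    let mut found = Vec::new();
    if label.is_empty() {
        return found;
    }
    let mut search_from = 0;
    while let Some(offset) = segment[search_from..].find(label) {
        let start = search_from + offset;
        let after_label = start + label.len();
        search_from = after_label;

        // The label must start a word, so "WV" inside "BWV" does not count.
        let starts_word = segment[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !c.is_alphanumeric());
        if !starts_word {
            continue;
        }
        // Allow dots and spaces between label and number, as in "Op. 27".
        let rest = segment[after_label..].trim_start_matches(|c: char| c == '.' || c == ' ');
        let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
        if digits.is_empty() {
            continue;
        }
        // Range suppression: a dash plus more digits means "846-869", so skip it.
        let mut tail = rest[digits.len()..].chars();
        let dash_next = matches!(tail.next(), Some('-') | Some('\u{2013}'));
        let digit_after_dash = tail.next().map_or(false, |c| c.is_ascii_digit());
        if dash_next && digit_after_dash {
            continue;
        }
        found.push(format!("{} {}", label, digits));
    }
    found
}

/// Searches colon-separated segments right to left; the first hit wins.
fn extract_catalog(text: &str, label: &str) -> Vec<String> {
    for segment in text.rsplit(':') {
        let found = find_in_segment(segment, label);
        if !found.is_empty() {
            return found;
        }
    }
    Vec::new()
}
```

Applied to the Wohltemperierte Klavier title above with the label `BWV`, this returns only `BWV 852`: the final segment matches first, and the `846-869` range in the middle segment would be skipped even if it were reached.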

Album Art Resolution

Album art is resolved in this order:

Embedded FLAC cover art
Sidecar image files

Embedded FLAC art

For FLAC files, Aria:

reads embedded pictures
prefers CoverFront, then other picture types in a stable priority order
writes the extracted image into a local app-data cache
reuses the cached image on later scans if it already exists and is non-empty
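The cache-reuse check in the last two steps can be sketched as follows. How Aria names cache files is not covered here, so the caller supplies `cache_path`. Both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// A cached image is reusable when it exists as a file and is non-empty.
fn cached_art_is_usable(cache_path: &Path) -> bool {
    fs::metadata(cache_path)
        .map(|meta| meta.is_file() && meta.len() > 0)
        .unwrap_or(false)
}

/// Writes the extracted image unless a usable cached copy already exists.
/// Returns Ok(true) when a new file was written, Ok(false) when it was reused.
fn ensure_cached_art(cache_path: &Path, image_bytes: &[u8]) -> io::Result<bool> {
    if cached_art_is_usable(cache_path) {
        return Ok(false);
    }
    if let Some(parent) = cache_path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(cache_path, image_bytes)?;
    Ok(true)
}
```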

Sidecar fallback

If no embedded FLAC art is available, Aria looks for sidecar files named:

cover.jpg
folder.jpg
front.jpg
cover.png
folder.png
front.png

It searches:

the track directory
the parent directory as a fallback when the track is inside a disc-like folder such as Disc 1 or CD1

Zero-byte sidecar files are ignored.
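A sketch of the sidecar lookup is shown below. The disc-folder test (a `disc`, `disk`, or `cd` prefix followed by digits) and the search order (every name in the track directory first, then the parent) are assumptions. This document only gives `Disc 1` and `CD1` as examples.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Sidecar names listed in this document.
const SIDECAR_NAMES: [&str; 6] = [
    "cover.jpg", "folder.jpg", "front.jpg", "cover.png", "folder.png", "front.png",
];

/// Assumed rule: "disc", "disk", or "cd", optional spaces, then only digits.
fn is_disc_like_folder(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    ["disc", "disk", "cd"].iter().any(|prefix| {
        lower.strip_prefix(*prefix).map_or(false, |rest| {
            let rest = rest.trim_start();
            !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit())
        })
    })
}

/// Looks for a non-empty sidecar image next to the track, then one level up
/// when the track sits in a disc-like folder.
fn find_sidecar(track_path: &Path) -> Option<PathBuf> {
    let track_dir = track_path.parent()?;
    let mut dirs = vec![track_dir.to_path_buf()];
    let dir_name = track_dir.file_name().and_then(|n| n.to_str()).unwrap_or("");
    if is_disc_like_folder(dir_name) {
        if let Some(parent) = track_dir.parent() {
            dirs.push(parent.to_path_buf());
        }
    }
    for dir in &dirs {
        for name in SIDECAR_NAMES.iter() {
            let candidate = dir.join(*name);
            let usable = fs::metadata(&candidate)
                .map(|meta| meta.is_file() && meta.len() > 0) // zero-byte files are ignored
                .unwrap_or(false);
            if usable {
                return Some(candidate);
            }
        }
    }
    None
}
```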

Current limitation

Embedded art extraction is currently FLAC-first. Other formats mainly rely on sidecar images for now.

Persistence In SQLite

Aria persists the library state into SQLite tables that include:

library_state
library_roots
field_mappings
catalog_rules
tag_inventory
scanned_tracks

Important details:

raw_tags are stored as JSON
mapped_fields are stored as JSON
audio properties are stored as JSON
library settings and playback state are also persisted elsewhere in the same database

On Windows, the default database path is:

%LOCALAPPDATA%\Aria\aria.sqlite3

Settings And Re-scan Behavior

Adding a new library directory from Settings starts a scan automatically
Re-saving field mappings remaps existing scanned tracks from stored raw tags
Re-saving catalog rules also remaps existing scanned tracks from stored raw tags
A full rescan is still needed when the source files themselves changed or when album-art lookup behavior needs to be refreshed

Raw Tags In The UI

The Tracks tab has a Show all tags action. That dialog reads raw tags directly from the selected file on demand, rather than showing only the normalized database fields.

This is useful when debugging:

missing fields
unexpected catalog results
multi-value role mapping
rare custom tags

Practical Debugging Rules

If a field looks wrong, inspect in this order:

raw file tags
current field mapping
current catalog rules, if the field is catalog
whether the track needs a rescan

If scan behavior changes, update this document so it stays aligned with the code.
