aubio-ledfx Optimization and Modernization Roadmap

Document Version: 1.0
Created: 2025-11-14
Project: aubio-ledfx maintained fork
Purpose: Comprehensive optimization and modernization strategy

Executive Summary

This document presents the five highest-priority optimization and modernization work items for aubio-ledfx, a maintained fork providing Python 3.10-3.14 support with pre-built wheels. The project has successfully migrated from the waf build system to Meson and integrated vcpkg for cross-platform dependency management.

Current State:

✅ Meson build system with vcpkg dependencies
✅ CI/CD with cibuildwheel for multi-platform wheels (Linux x64/ARM64, macOS Intel/Apple Silicon, Windows AMD64)
✅ Security hardening implemented (4 critical vulnerabilities fixed)
✅ Sanitizer testing infrastructure (AddressSanitizer + UndefinedBehaviorSanitizer)
✅ All 45 C tests passing

Key Metrics:

68 C source files, 56 header files (~17K lines)
80 Python files including 31 test files
~1,000 lines of Meson build configuration
5 Python versions supported (3.10-3.14)
5 platform/architecture combinations in CI

Priority 1: CI/CD Build Performance Optimization

Priority Level: CRITICAL

Estimated ROI: HIGH - Reduces developer iteration time and CI costs (target: total CI time down from 70-90 to 50-65 minutes; see Success Metrics)
Effort: 3-5 days
Impact: All contributors, every PR, every release

Problem Statement

The current CI/CD pipeline using cibuildwheel builds wheels for 5 platform/architecture combinations, with each build taking 15-25 minutes.

Current State - Already Optimized:

✅ macOS and Windows use actions/cache@v4 to cache vcpkg_installed/ directory
✅ Caching keys based on vcpkg.json and triplet files (smart invalidation)
✅ before-all runs once per job (not per Python version), dependencies built once
✅ Efficient matrix strategy for parallel builds

Current Build Times (estimated):

Linux x64: ~18 minutes (vcpkg: ~8 min on cache miss, wheel build: ~10 min)
Linux ARM64: ~22 minutes (vcpkg: ~12 min on cache miss, wheel build: ~10 min)
macOS x64: ~20 minutes (vcpkg: ~3-4 min on cache hit, wheel build: ~10 min)
macOS ARM64: ~18 minutes (vcpkg: ~2-3 min on cache hit, wheel build: ~10 min)
Windows AMD64: ~15 minutes (vcpkg: ~2-3 min on cache hit, wheel build: ~8 min)

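Total CI time per PR: ~70-90 minutes of aggregate runner time across all platforms (with cache hits); the jobs run in parallel, so wall-clock time is closer to that of the slowest job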

Remaining Pain Points:

Linux builds: vcpkg dependencies rebuild in manylinux Docker containers (no persistent cache across runs)
Limited opportunities: macOS/Windows already well-optimized with caching
Potential improvements: ccache/sccache for C/C++ compilation, workflow organization

Investigation Required

1.1 vcpkg Binary Caching Analysis

Current State (Already Implemented):

✅ macOS builds: actions/cache@v4 caches vcpkg_installed/ directory
✅ Windows builds: actions/cache@v4 caches vcpkg_installed/ directory
✅ Cache keys use hashFiles('vcpkg.json', 'vcpkg-triplets/*.cmake') for smart invalidation
❌ Linux builds: Dependencies are rebuilt in Docker on every run (the GitHub Actions cache does not persist inside the manylinux containers)

Note on vcpkg Binary Caching: vcpkg's x-gha binary source provider was deprecated in June 2024. The current implementation instead applies actions/cache directly to the vcpkg_installed directory, which works well on GitHub Actions.

Remaining Optimization Opportunities:

Linux Docker caching: Explore Docker layer caching or bind mounts to persist vcpkg builds
Cache analysis: Measure actual cache hit rates on macOS/Windows
Alternative approaches:
- Pre-built dependency Docker images for Linux
- vcpkg's newer binary caching features (files, nuget providers)

Questions to Answer:

What's the actual cache hit rate on macOS/Windows in production?
Can we use Docker BuildKit caching for Linux builds?
Would pre-built dependency containers be worth the maintenance overhead?
What's the cache size and is it within GitHub's 10GB limit?

Investigation Steps:

# 1. Measure vcpkg build artifacts size
du -sh vcpkg_installed/x64-osx/
du -sh vcpkg_installed/arm64-osx/
du -sh vcpkg_installed/x64-windows-release/

# 2. Check cache hit rates in CI logs
# Look for "Cache restored from key:" messages in recent workflow runs

# 3. Test Docker BuildKit caching (Linux)
# Add --cache-from and --cache-to flags to docker build
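Step 2 can be scripted once the logs are downloaded (for example with gh run view <run-id> --log > run.txt). The sketch below counts the hit message quoted above against the miss message that actions/cache prints ("Cache not found for input keys:"); treat the exact wording of both messages as an assumption to double-check against real logs.

```python
import re

# Messages printed by actions/cache (and actions/cache/restore) on a hit / miss
HIT = re.compile(r"Cache restored from key:")
MISS = re.compile(r"Cache not found for input keys:")


def cache_hit_rate(log_texts):
    """Return (hits, misses, rate) over an iterable of raw log strings.

    rate is None when no cache restore step appears in any of the logs.
    """
    hits = misses = 0
    for text in log_texts:
        hits += len(HIT.findall(text))
        misses += len(MISS.findall(text))
    total = hits + misses
    return hits, misses, (hits / total if total else None)
```

Usage: cache_hit_rate(Path(p).read_text() for p in glob("logs/*.txt")), run separately per platform so macOS and Windows rates can be compared.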

1.2 Compiler Caching (ccache/sccache)

Goal: Cache C/C++ compilation artifacts across CI runs

Options:

ccache: Traditional, well-tested, local cache + GitHub Actions cache
sccache: Rust-based, supports cloud backends (S3, GCS, GitHub Actions cache)
Buildcache: Modern alternative with good Docker support

Investigation Steps:

# Example sccache integration in CI:
- name: Setup sccache
  uses: mozilla-actions/sccache-action@v0.0.4

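- name: Configure environment
  run: |
    # Use the GitHub Actions cache as sccache's storage backend
    echo "SCCACHE_GHA_ENABLED=true" >> $GITHUB_ENV
    echo "CC=sccache gcc" >> $GITHUB_ENV
    echo "CXX=sccache g++" >> $GITHUB_ENV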

Expected Impact: 30-50% faster C library compilation on cache hit

1.3 Dependency Pre-building Strategy

Concept: Build vcpkg dependencies once, cache, reuse across all wheel builds

Approach A: Separate Dependency Build Job

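jobs:
  build-dependencies:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          - { os: ubuntu-latest, triplet: x64-linux-pic }
          - { os: ubuntu-24.04-arm, triplet: arm64-linux-pic }
          # ... etc
    steps:
      - uses: actions/checkout@v4
      - name: Build vcpkg dependencies
        run: vcpkg install --triplet=${{ matrix.triplet }} --overlay-triplets=vcpkg-triplets
      - name: Cache vcpkg_installed
        uses: actions/cache/save@v4
        with:
          path: vcpkg_installed
          key: vcpkg-${{ matrix.triplet }}-${{ hashFiles('vcpkg.json', 'vcpkg-triplets/*.cmake') }}

  build-wheels:
    needs: build-dependencies
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          - { os: ubuntu-latest, triplet: x64-linux-pic }
          - { os: ubuntu-24.04-arm, triplet: arm64-linux-pic }
          # ... etc
    steps:
      - uses: actions/checkout@v4
      - name: Restore vcpkg cache
        uses: actions/cache/restore@v4
        with:
          path: vcpkg_installed
          key: vcpkg-${{ matrix.triplet }}-${{ hashFiles('vcpkg.json', 'vcpkg-triplets/*.cmake') }}
          fail-on-cache-miss: true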

Approach B: Use GitHub Container Registry for Pre-built Dependencies

Build Docker images with vcpkg dependencies pre-installed
Push to ghcr.io/ledfx/aubio-builder:x64-linux-pic (container image names on GHCR must be lowercase)
Use in cibuildwheel Linux builds

1.4 CI Workflow Optimization

Current Issues:

227 lines of YAML with duplication
Before-all scripts are repetitive across platforms
No job parallelization optimization
No conditional job skipping (e.g., skip builds if only docs changed)

Optimization Opportunities:

Use reusable workflows for common setup patterns
Matrix strategy improvements - reduce duplication
Path filters - skip unnecessary builds
Composite actions - extract common steps

Example Path Filter:

on:
  pull_request:
    paths-ignore:
      - 'doc/**'
      - '**.md'
      - '**.rst'
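To check which PRs the filter above would skip, a rough local approximation is to match each changed file against the ignore patterns: the workflow is skipped only when every changed file is ignored. This sketch uses Python's fnmatch, whose * also matches /, which lines up with GitHub's behaviour for these three patterns but is not GitHub's full glob syntax (negated ! patterns, for instance, are not handled).

```python
from fnmatch import fnmatch

# Same patterns as the paths-ignore example above
IGNORED = ["doc/**", "**.md", "**.rst"]


def needs_build(changed_files, ignored=IGNORED):
    """Return True if at least one changed file is NOT matched by an ignore pattern."""
    return any(
        not any(fnmatch(path, pattern) for pattern in ignored)
        for path in changed_files
    )
```

The list of changed files can come from git diff --name-only origin/main...HEAD.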

Implementation Guide

Step 1: Analyze Current Cache Performance

Before making changes, measure current state:

# 1. Check recent CI workflow runs for cache hit rates
# Look in GitHub Actions logs for messages like:
# "Cache restored from key: vcpkg-installed-macos-x64-..."

# 2. Measure vcpkg_installed directory sizes
du -sh vcpkg_installed/*/

# 3. Compare build times with/without cache
# Run a workflow with cleared cache vs. warm cache

Why: Understand baseline performance before optimization
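For the size measurements (and the 10 GB question in 1.1), a portable alternative to du that also runs on the Windows runner is sketched below. Sizes are on-disk and uncompressed; actions/cache compresses its archives, so the stored size should be smaller. The limit constant treats "10 GB" as 10 GiB, which is an assumption. vcpkg_installed also contains a vcpkg/ metadata directory, which shows up as its own entry.

```python
import os
from pathlib import Path

GITHUB_CACHE_LIMIT_BYTES = 10 * 1024**3  # assumed: GitHub's 10 GB per-repository limit


def dir_size_bytes(root):
    """Total size of regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def triplet_sizes(installed_dir="vcpkg_installed"):
    """Map each subdirectory (e.g. x64-osx) to its size in bytes."""
    root = Path(installed_dir)
    return {p.name: dir_size_bytes(p) for p in sorted(root.iterdir()) if p.is_dir()}
```

Usage: sum(triplet_sizes().values()) / GITHUB_CACHE_LIMIT_BYTES gives the fraction of the limit a full set of caches would use before compression.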

Step 2: Add ccache Integration (If Needed)

For Linux builds:

[tool.cibuildwheel.linux.environment]
CC = "ccache /opt/rh/gcc-toolset-14/root/usr/bin/gcc"
CXX = "ccache /opt/rh/gcc-toolset-14/root/usr/bin/g++"
CCACHE_DIR = "/tmp/ccache"

[tool.cibuildwheel.linux]
before-all = """
    yum install -y ccache
    # ... existing vcpkg setup
"""

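Note: ccache complements, rather than replaces, the existing actions/cache strategy for vcpkg_installed (see the note on the deprecated x-gha provider in section 1.1). Inside the manylinux container, /tmp/ccache only lives as long as the job, so it mainly helps if the C library is recompiled for each Python version within one job; hits across CI runs would require persisting the directory (see Step 3).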


Step 3: Optimize Linux Docker Builds (Advanced)

Current Challenge: Docker containers don't persist GitHub Actions cache

Potential Solutions:

Option A: Docker BuildKit Caching

# In .github/workflows/build.yml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

# Use BuildKit cache mounts in cibuildwheel
# (requires custom Docker image configuration)

Option B: Pre-built Dependency Container

# Create custom manylinux image with vcpkg dependencies pre-installed
FROM quay.io/pypa/manylinux_2_28_x86_64
RUN yum install -y git zip unzip tar curl make nasm
RUN git clone https://github.com/microsoft/vcpkg.git /opt/vcpkg
RUN cd /opt/vcpkg && ./bootstrap-vcpkg.sh -disableMetrics
# COPY of a directory copies its contents, so give the triplets their own target dir
COPY vcpkg.json /tmp/aubio-build/
COPY vcpkg-triplets/ /tmp/aubio-build/vcpkg-triplets/
WORKDIR /tmp/aubio-build
RUN /opt/vcpkg/vcpkg install --triplet=x64-linux-pic --overlay-triplets=vcpkg-triplets

Trade-off: Maintenance overhead vs. build speed improvement

Step 4: Add Path Filters

on:
  pull_request:
    paths:
      - 'src/**'
      - 'python/**'
      - 'tests/**'
      - 'meson.build'
      - 'meson_options.txt'
      - 'pyproject.toml'
      - 'vcpkg.json'
      - '.github/workflows/build.yml'
  push:
    branches: [main, develop]

Success Metrics

Current Baseline (With Existing Caching):

Total CI time: 70-90 minutes (with cache hits on macOS/Windows)
macOS vcpkg: ~2-4 minutes (cache hit)
Windows vcpkg: ~2-3 minutes (cache hit)
Linux vcpkg: ~8-12 minutes (rebuild each time)
Cache hit rate: ~70-80% (macOS/Windows; estimated, to be confirmed by the measurements in Step 1)

After Additional Optimization (Target):

Total CI time: 50-65 minutes (~28% improvement)
Linux vcpkg: ~4-6 minutes (with Docker caching, if implemented)
C library compilation: ~5-7 minutes (with ccache, 30% faster)
Cache hit rate: >85% (all platforms where applicable)

Note: The existing caching infrastructure is already quite effective. Further optimizations have diminishing returns and should be evaluated based on actual measured bottlenecks.
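The improvement implied by the baseline (70-90 minutes) and target (50-65 minutes) totals can be checked directly by pairing the low ends and the high ends of the two ranges:

```python
def pct_reduction(before_min, after_min):
    """Percentage reduction from a baseline time to a target time."""
    return 100.0 * (before_min - after_min) / before_min


# Pair like-for-like ends of the baseline and target ranges
low_end = pct_reduction(70, 50)   # about 28.6%
high_end = pct_reduction(90, 65)  # about 27.8%
```

Both pairings land at roughly 28%, which is the figure to compare against once the target is measured.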

References

vcpkg Binary Caching: https://vcpkg.io/en/docs/users/binarycaching.html
GitHub Actions Cache: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
cibuildwheel caching: https://cibuildwheel.readthedocs.io/en/stable/faq/#caching
sccache: https://github.com/mozilla/sccache

Risks and Mitigations

Risk 1: Cache size exceeds GitHub's 10GB limit per repo

Mitigation: Monitor cache size (GitHub automatically evicts caches not accessed in 7 days, and evicts older entries once the repository limit is exceeded); use vcpkg's NuGet backend if needed

Risk 2: Cache invalidation issues (stale dependencies)

Mitigation: Cache key includes hashFiles('vcpkg.json'), auto-invalidates on dependency changes

Risk 3: Binary cache corruption

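Mitigation: vcpkg verifies the SHA512 hashes of downloaded sources, but a restored vcpkg_installed directory is not checksummed; if corruption is suspected, delete the cache entry (e.g., gh cache delete <key>) so the next run rebuilds from source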
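Risk 2's mitigation rests on the cache key changing whenever one of its inputs changes. The sketch below mimics the idea behind hashFiles() (a SHA-256 per file, then a SHA-256 over those digests) to show that behaviour locally; it is not guaranteed to reproduce GitHub's exact hash string, and cache_key is a hypothetical helper name. One input such a key does not capture is a compiler or runner-image update, which is why some workflows also include the runner OS or compiler version in the key.

```python
import hashlib
from pathlib import Path


def cache_key(triplet, files):
    """hashFiles()-style key: hash each file, then hash the per-file digests.

    Files are sorted so the key does not depend on argument order.
    """
    combined = hashlib.sha256()
    for path in sorted(Path(f) for f in files):
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return f"vcpkg-{triplet}-{combined.hexdigest()}"
```

Editing vcpkg.json or any triplet file yields a new key (a cache miss and a rebuild), while unrelated source changes leave the key, and therefore the cached dependencies, untouched.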

Priority 2: Python Code Generation Modernization

Priority Level: HIGH

Estimated ROI: HIGH - Reduces maintenance burden, improves type safety
Effort: 4-6 days
Impact: Python package quality, developer experience

Problem Statement

The Python bindings use a custom code generation system (python/lib/gen_external.py, gen_code.py) that parses src/aubio.h and generates C extension code. While functional, this approach has several issues:

Current System:

352 lines of custom parser in gen_external.py
642 lines of code generation logic in gen_code.py
Generates 10+ C files (gen-onset.c, gen-pitch.c, etc.) at build time
Manual maintenance of object lists and templates
No type hints in generated Python bindings
Fragile parsing logic dependent on header file format

Pain Points:

Maintenance burden: Any change to C API requires updating generator
No IDE support: Generated Python code lacks type hints
Build complexity: Code generation adds build-time dependency
Limited extensibility: Hard to add new object types or methods
Modern typing: No use of typing features such as Protocol or TypedDict (available since Python 3.8)
No docstrings: Generated code has minimal documentation

Investigation Required

2.1 Modern Python Binding Alternatives

Option A: Migrate to pybind11

Pros:
- Modern C++11 binding framework
- Automatic type conversion, docstrings
- Full Python 3.x support with type hints
- Excellent NumPy integration
- Active maintenance and community
Cons:
- Requires C++11 (aubio is C99)
- Significant migration effort
- All bindings need rewriting

Option B: Migrate to nanobind

Pros:
- Modern, lightweight (from the author of pybind11, with near-identical syntax)
- Better performance and smaller binaries
- Excellent type hint support
Cons:
- Newer project (less mature)
- Requires C++17
- Migration effort

Option C: Use CFFI

Pros:
- Pure Python, no C++ required
- Excellent for C libraries
- ABI (runtime, dlopen-based) and API (compiled extension) modes
- Good NumPy integration
Cons:
- Less ergonomic than pybind11
- Manual type definitions
- Performance overhead in ABI mode (mitigated by API mode)

Option D: Improve Current Generator

Pros:
- No migration required
- Incremental improvements
- Keep current build system
Cons:
- Maintenance burden remains
- Limited by custom parser approach

2.2 Type Hints and Stub Generation

Current State: No .pyi stub files, no runtime type hints

Options:

Generate stubs with stubgen (mypy tool)
```
stubgen -p aubio -o stubs/
```
Use pybind11-stubgen (if migrating to pybind11)
Manual stub creation for high-value APIs

Benefits:

IDE autocomplete and type checking
Better documentation
mypy/pyright support

2.3 NumPy 2.0 Optimization

Current Issue: aubio uses older NumPy C API

Opportunities:

Use NumPy 2.0 C API for better performance
Leverage new array protocols
Consider using nanobind's NumPy integration (automatic array wrapping)

Implementation Guide

Step 1: Generate Type Stubs

Create script: scripts/generate_stubs.py

#!/usr/bin/env python3
"""Generate type stubs for aubio package."""
import subprocess
import sys

def main():
    # Use mypy stubgen to create stubs
    subprocess.run([
        sys.executable, "-m", "mypy.stubgen",
        "-p", "aubio",
        "-o", "python/aubio-stubs"
    ], check=True)
    
    # Post-process stubs to add missing information
    # (e.g., NumPy array types)

if __name__ == "__main__":
    main()

Add to pyproject.toml:

[project]
...
[project.optional-dependencies]
stubs = ["mypy"]

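# meson-python has no include list of its own: the wheel contains whatever
# meson.build installs, so install the .pyi files (and a py.typed marker)
# from python/meson.build, e.g. with py.install_sources()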

Step 2: Enhance Generator with Docstrings

Modify gen_code.py:

def generate_docstring(self, obj_name: str, method_name: str) -> str:
    """Extract docstring from Doxygen comments in header."""
    # Parse src/aubio.h for /** ... */ comments
    # Convert to Python docstring format
    return f'''"""
    {obj_name}.{method_name}
    
    [Generated from C API documentation]
    """'''

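The Doxygen parsing left as a comment in generate_docstring could start from something like the following standalone sketch. It assumes each documented prototype is immediately preceded by a /** ... */ block using \param and \return, as in aubio's per-module headers (e.g. src/pitch/pitch.h, included by src/aubio.h); extract_docstrings and doxygen_to_docstring are hypothetical names, not part of gen_code.py.

```python
import re

# A /** ... */ block (which may not contain another "*/") followed by a prototype
_DOC_THEN_DECL = re.compile(
    r"/\*\*(?P<doc>(?:(?!\*/).)*)\*/\s*"
    r"(?P<decl>[^;{}]*?\b(?P<name>\w+)\s*\([^;{}]*\))\s*;",
    re.DOTALL,
)


def doxygen_to_docstring(raw):
    """Turn the body of a /** ... */ comment into plain docstring text."""
    lines = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("*"):
            line = line[1:].strip()
        # Render the common Doxygen commands (written with '\' or '@') plainly
        line = re.sub(r"^[\\@]param(?:\[\w+\])?\s+(\w+)\s*", r"\1: ", line)
        line = re.sub(r"^[\\@]returns?\s*", "Returns: ", line)
        lines.append(line)
    # Drop leading and trailing blank lines
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)


def extract_docstrings(header_text):
    """Map function name -> docstring for every documented prototype."""
    return {
        m.group("name"): doxygen_to_docstring(m.group("doc"))
        for m in _DOC_THEN_DECL.finditer(header_text)
    }
```

Embedding the result in the generated C code still requires escaping it as a C string literal (quotes, backslashes, newlines).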
Step 3: pybind11 Proof of Concept

Create: python/ext/pybind11_poc.cpp

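#include <memory>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include "aubio.h"

namespace py = pybind11;

// fvec_t objects come from new_fvec() (C allocation), so the holder must
// release them with del_fvec() instead of C++ delete.
struct FvecDeleter {
    void operator()(fvec_t* fv) const { del_fvec(fv); }
};
using FvecPtr = std::unique_ptr<fvec_t, FvecDeleter>;

PYBIND11_MODULE(_aubio_pybind11, m) {
    // Proof of concept: fvec_t wrapper
    py::class_<fvec_t, FvecPtr>(m, "fvec")
        .def(py::init([](uint_t length) {
            fvec_t* fv = new_fvec(length);
            if (fv == nullptr) throw py::value_error("invalid fvec length");
            return FvecPtr(fv);
        }), "Create new float vector")
        .def_property_readonly("length",
            [](const fvec_t& fv) { return fv.length; })
        .def("__getitem__",
            [](const fvec_t& fv, uint_t i) {
                if (i >= fv.length) throw py::index_error();
                return fv.data[i];
            })
        .def("__setitem__",
            [](fvec_t& fv, uint_t i, smpl_t val) {
                if (i >= fv.length) throw py::index_error();
                fv.data[i] = val;
            });
    // Add more bindings...
}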

Integrate with Meson:

# python/meson.build
if get_option('use_pybind11')
  # The extension is C++, so enable the C++ compiler for this (C99) project
  add_languages('cpp', native: false)
  pybind11_dep = dependency('pybind11')
  py.extension_module('_aubio_pybind11',
    'ext/pybind11_poc.cpp',
    dependencies: [aubio_dep, pybind11_dep, numpy_dep],
    install: true
  )
endif

Success Metrics

Before:

No type hints or stubs
IDE support: Poor
Code generation time: ~5-10 seconds
Maintenance effort: HIGH (custom parser)
Type safety: None

After (Stubs Only):

Full .pyi stubs for all APIs
IDE support: Good
mypy/pyright compatibility: Yes
Maintenance effort: MEDIUM (stubs need updating)

After (pybind11 Migration):

Native type hints
IDE support: Excellent
Code generation time: 0 (pure C++)
Maintenance effort: LOW (automatic from C++ declarations)
Type safety: Full
Performance: Same or better

References

pybind11: https://pybind11.readthedocs.io/
nanobind: https://nanobind.readthedocs.io/
CFFI: https://cffi.readthedocs.io/
NumPy 2.0 migration: https://numpy.org/devdocs/numpy_2_0_migration_guide.html
PEP 561 (Stub files): https://peps.python.org/pep-0561/

Priority 3: Test Infrastructure Enhancement

Priority Level: HIGH

Estimated ROI: HIGH - Prevents regressions, improves code quality
Effort: 5-7 days
Impact: Code reliability, contributor confidence

Problem Statement

While the project has good test coverage (45 C tests, 31 Python test files), there are significant gaps in its testing infrastructure:

Current State:

✅ 45/45 C unit tests passing
✅ Sanitizer testing (ASAN, UBSAN) via GitHub Actions
✅ Python test suite with pytest
❌ No performance/benchmark tests
❌ No fuzz testing (security concern for audio processing)
❌ Limited boundary condition testing
❌ No integration tests for real audio files
❌ Test failures ignored in CI (|| true: tests run but are allowed to fail)

Pain Points:

CI ignores test failures: Tests run but failures don't block PRs
No regression detection: Can't detect performance regressions
Limited edge case coverage: Boundary conditions not systematically tested
No fuzz testing: Audio processing is vulnerable to malformed input
Platform-specific issues: Tests don't cover all platform code paths
Manual testing required: Audio quality verification is manual

Investigation Required

3.1 Test Failure Root Cause Analysis

Current Issue: Tests run with || true in CI, masking failures

Investigation:

# Run tests locally on each platform
meson setup builddir -Dtests=true
meson test -C builddir --print-errorlogs

# Identify which tests fail and why:
# - Missing test data files?
# - Platform-specific issues?
# - Actual bugs?
# - Timing/flakiness?

Questions:

Which specific tests are failing?
Are failures consistent or flaky?
Are they platform-specific?
Do we have all required test data files?
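To answer the consistent-versus-flaky question, results from several runs can be compared. This sketch reads Meson's builddir/meson-logs/testlog.json, assuming its one-JSON-object-per-line layout with name and result fields; logs from repeated runs (e.g. meson test --repeat 5) or from different platforms can simply be concatenated.

```python
import json
from collections import defaultdict

# Meson result strings treated as "not a failure"
PASSING = {"OK", "SKIP", "EXPECTEDFAIL"}


def classify(testlog_lines):
    """Classify tests as "pass", "fail" (failed every run) or "flaky" (mixed)."""
    outcomes = defaultdict(set)
    for line in testlog_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        outcomes[entry["name"]].add(entry["result"] in PASSING)
    return {
        name: "pass" if seen == {True} else "fail" if seen == {False} else "flaky"
        for name, seen in outcomes.items()
    }
```

Tests classified as "fail" point to real bugs or missing data files; "flaky" ones point to timing or platform issues.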

3.2 Benchmark Infrastructure Options

Goal: Detect performance regressions in audio processing

Option A: Custom Benchmark Suite

Create benchmarks/ directory
Benchmark key operations (FFT, onset detection, pitch tracking)
Store baseline results, compare on PR

Option B: Google Benchmark

Industry-standard C++ benchmarking framework
Statistical analysis, outlier detection
JSON output for tracking over time

Example:

// benchmarks/bench_fft.cpp (Google Benchmark is a C++ library)
#include <benchmark/benchmark.h>
#include "aubio.h"

static void BM_FFT_512(benchmark::State& state) {
  aubio_fft_t* fft = new_aubio_fft(512);
  fvec_t* input = new_fvec(512);
  cvec_t* output = new_cvec(512);

  for (auto _ : state) {
    aubio_fft_do(fft, input, output);
  }

  del_aubio_fft(fft);
  del_fvec(input);
  del_cvec(output);
}
BENCHMARK(BM_FFT_512);

// Provide main() unless the target links against benchmark_main instead
BENCHMARK_MAIN();
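For the "store baseline results, compare on PR" part of Option A, the JSON report that Google Benchmark writes (e.g. with --benchmark_out=results.json --benchmark_out_format=json) can be compared against a stored baseline; Google Benchmark also ships a tools/compare.py script for this. A minimal sketch follows; the 10% threshold is an arbitrary assumption, and shared CI runners are noisy enough that a generous threshold (or repetitions) is advisable.

```python
_TO_NS = {"ns": 1.0, "us": 1e3, "ms": 1e6, "s": 1e9}


def cpu_times_ns(report):
    """Map benchmark name -> cpu_time in nanoseconds from a parsed JSON report."""
    times = {}
    for b in report.get("benchmarks", []):
        if b.get("run_type", "iteration") == "aggregate":
            continue  # skip mean/median/stddev rows added by --benchmark_repetitions
        times[b["name"]] = b["cpu_time"] * _TO_NS[b.get("time_unit", "ns")]
    return times


def regressions(baseline, current, threshold=0.10):
    """Return {name: relative_slowdown} for benchmarks slower than the threshold."""
    base, cur = cpu_times_ns(baseline), cpu_times_ns(current)
    out = {}
    for name in base.keys() & cur.keys():
        slowdown = cur[name] / base[name] - 1.0
        if slowdown > threshold:
            out[name] = slowdown
    return out
```

Both arguments are the dicts returned by json.load() on the two report files; a non-empty result can fail the CI step.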

3.3 Fuzz Testing Strategy

Why Fuzz Testing for Audio?

Malformed audio files can trigger buffer overflows
Unexpected sample rates, bit depths, channel counts
Real-world security concern (see CVE-2018-14523, CVE-2018-19800)

Option A: libFuzzer (LLVM)

Compile-time instrumentation
Fast, efficient
Good GitHub Actions integration

Option B: AFL++ (American Fuzzy Lop)

Well-tested, widely used
Slower but thorough
Can find deep bugs

Option C: OSS-Fuzz Integration

Google's continuous fuzzing service
Free for open source
Automatic bug reporting

Harness Example:

// fuzz/fuzz_onset.c
#include <stdint.h>
#include <stddef.h>
#include "aubio.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 16) return 0;

  // Fixed analysis parameters; the fuzz input only drives the samples
  const uint_t win_s = 512;
  const uint_t hop_s = 256;

  // Create onset detector
  aubio_onset_t* o = new_aubio_onset("default", win_s, hop_s, 44100);
  if (!o) return 0;

  fvec_t* in = new_fvec(hop_s);
  fvec_t* out = new_fvec(1);

  // Feed the whole input through the detector, one hop at a time,
  // mapping each byte to a sample in [-1, 1)
  for (size_t pos = 0; pos < size; pos += hop_s) {
    fvec_zeros(in);
    for (uint_t i = 0; i < hop_s && pos + i < size; i++) {
      in->data[i] = (smpl_t)data[pos + i] / 128.0 - 1.0;
    }
    // Run onset detection (should not crash)
    aubio_onset_do(o, in, out);
  }

  del_aubio_onset(o);
  del_fvec(in);
  del_fvec(out);
  return 0;
}

3.4 Integration Test Suite

Goal: Test real-world audio processing workflows

Test Scenarios:

Load audio file → detect onsets → verify count
Load audio file → extract pitch → verify frequency range
Load audio file → detect tempo → verify BPM
Process audio with different sample rates (8kHz, 44.1kHz, 96kHz)
Handle edge cases (empty files, very short files, very long files)

Test Data:

Create minimal test audio files (synthetic)
Use known reference files with expected outputs
Store in tests/data/ directory
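Synthetic test files can be generated with the standard library alone, which keeps tests/data/ small and reproducible. A minimal sketch writing mono 16-bit PCM WAV files; write_sine_wav and the file names used with it are hypothetical.

```python
import math
import struct
import wave


def write_sine_wav(path, freq=440.0, duration=1.0, samplerate=44100, amplitude=0.5):
    """Write a mono 16-bit PCM sine wave to path and return the frame count."""
    n_frames = int(round(samplerate * duration))
    frames = bytearray()
    for n in range(n_frames):
        sample = amplitude * math.sin(2.0 * math.pi * freq * n / samplerate)
        frames += struct.pack("<h", int(round(sample * 32767)))  # little-endian int16
    with wave.open(str(path), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(samplerate)
        w.writeframes(bytes(frames))
    return n_frames
```

The same helper covers the sample-rate scenarios (samplerate=8000, 44100, 96000) and the edge cases (duration=0.0 for an empty file, a few milliseconds for a very short one).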

Implementation Guide

Step 1: Enable Test Failures in CI

Modify: the cibuildwheel test-command (in pyproject.toml, or CIBW_TEST_COMMAND in .github/workflows/build.yml)

# BEFORE:
test-command = "... && pytest {project}/python/tests || true"

# AFTER:
test-command = "... && pytest {project}/python/tests"

# Add a separate job to .github/workflows/build.yml for better visibility:
test-c-library:
  name: Test C library
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - name: Setup and build
      run: |
        pip install meson ninja numpy
        meson setup builddir -Dtests=true
        meson compile -C builddir
    - name: Run tests
      run: meson test -C builddir --print-errorlogs

Step 2: Add Google Benchmark

Add to vcpkg.json:

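{
  "dependencies": [
    ...
  ],
  "features": {
    "benchmarks": {
      "description": "Google Benchmark for performance tests (development only)",
      "dependencies": [
        { "name": "benchmark", "platform": "!windows" }
      ]
    }
  }
}

JSON does not allow comments, so the optional, development-only dependency is expressed as a manifest feature, enabled with: vcpkg install --x-feature=benchmarks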

Create: benchmarks/meson.build

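benchmark_dep = dependency('benchmark', required: false)

# Google Benchmark is C++, so the benchmark sources need the C++ compiler
if benchmark_dep.found() and add_languages('cpp', required: false, native: false)
  bench_fft = executable('bench_fft',
    'bench_fft.cpp',
    dependencies: [aubio_dep, benchmark_dep],
  )

  benchmark('FFT Performance', bench_fft)
endif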

Step 3: Set up libFuzzer

Create: .github/workflows/fuzz.yml

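name: Fuzz Testing
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly (Sunday 00:00 UTC)
  workflow_dispatch:

jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Build with fuzzing
        run: |
          pip install meson ninja
          # Instrument all code without linking libFuzzer's main();
          # fuzz/meson.build adds -fsanitize=fuzzer to the fuzz targets only
          export CC=clang
          export CFLAGS="-fsanitize=fuzzer-no-link -g"
          meson setup builddir -Dfuzzing=true -Db_sanitize=address
          meson compile -C builddir

      - name: Run fuzzers (5 minutes each)
        run: |
          mkdir -p fuzz/corpus
          for fuzzer in builddir/fuzz/fuzz_*; do
            # Skip Meson's *.p object directories
            [ -f "$fuzzer" ] && [ -x "$fuzzer" ] || continue
            # libFuzzer exits non-zero on a crash, which fails this step
            "$fuzzer" -max_total_time=300 fuzz/corpus/
          done

      - name: Upload crashes
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: fuzz-crashes
          path: crash-*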

Create: fuzz/meson.build

if get_option('fuzzing')
  fuzzer_flags = ['-fsanitize=fuzzer,address']
  
  fuzz_onset = executable('fuzz_onset',
    'fuzz_onset.c',
    c_args: fuzzer_flags,
    link_args: fuzzer_flags,
    dependencies: aubio_dep
  )
  
  # Add more fuzz targets...
endif

Step 4: Integration Tests

Create: python/tests/test_integration.py

"""Integration tests with real audio processing workflows."""
import numpy as np
import pytest
import aubio


def generate_sine_wave(freq=440, duration=1.0, sr=44100):
    """Generate a synthetic float32 sine wave for testing."""
    t = np.arange(int(sr * duration)) / sr
    return np.sin(2 * np.pi * freq * t).astype(np.float32)


def frames(signal, hop_size):
    """Split signal into hop_size frames, dropping the incomplete last frame."""
    n_frames = len(signal) // hop_size
    return signal[: n_frames * hop_size].reshape(n_frames, hop_size)


def test_onset_detection_workflow():
    """Test complete onset detection workflow."""
    # Generate audio with sharp attack
    signal = generate_sine_wave(440, 1.0)
    signal[:100] *= np.linspace(0, 1, 100)  # Add attack

    # Detect onsets (44100 samples is not a multiple of 256, hence frames())
    onset = aubio.onset("default", 512, 256, 44100)
    onsets = []

    for frame in frames(signal, 256):
        fvec = aubio.fvec(256)
        fvec[:] = frame
        if onset(fvec):
            onsets.append(onset.get_last())

    # Verify at least one onset detected at start
    assert len(onsets) >= 1
    assert onsets[0] < 1000  # Within first 1000 samples


def test_pitch_detection_workflow():
    """Test complete pitch detection workflow."""
    signal = generate_sine_wave(440, 1.0)

    pitch_o = aubio.pitch("default", 2048, 512, 44100)
    pitch_o.set_unit("Hz")

    pitches = []
    for frame in frames(signal, 512):
        fvec = aubio.fvec(512)
        fvec[:] = frame
        detected = pitch_o(fvec)[0]
        if detected > 0:
            pitches.append(detected)

    # Verify the detected pitch is close to 440 Hz; the median is robust to
    # the first frames, where the analysis buffer is still partly empty
    assert pitches, "no pitch detected"
    median_pitch = np.median(pitches)
    assert 430 < median_pitch < 450, f"Expected ~440 Hz, got {median_pitch}"

Success Metrics

Before:

C tests: 45/45 passing (but CI ignores failures)
Python tests: Run with || true
Performance tracking: None
Fuzz testing: None
Integration tests: None
CI enforcement: Weak

After:

C tests: All passing, CI blocks on failure
Python tests: All passing, CI blocks on failure
Performance tracking: Automated benchmarks on every PR
Fuzz testing: Weekly fuzzing in CI, plus continuous fuzzing via OSS-Fuzz
Integration tests: 10+ real-world scenarios
CI enforcement: Strong (tests are required checks)
Coverage: >80% (measured with gcov/lcov)

References

Google Benchmark: https://github.com/google/benchmark
libFuzzer: https://llvm.org/docs/LibFuzzer.html
OSS-Fuzz: https://google.github.io/oss-fuzz/
pytest best practices: https://docs.pytest.org/en/stable/goodpractices.html

Priority 4: Code Quality and Static Analysis

Priority Level: MEDIUM

Estimated ROI: MEDIUM - Prevents bugs, improves maintainability
Effort: 3-4 days
Impact: Code quality, security, maintainability

Problem Statement

The codebase has basic security hardening but lacks comprehensive static analysis and code quality tools:

Current State:

✅ Security compiler flags (-fstack-protector-strong, -D_FORTIFY_SOURCE=2)
✅ CodeQL scanning enabled
✅ Sanitizer testing (ASAN, UBSAN)
❌ No clang-tidy integration
❌ No cppcheck or other static analyzers
❌ No code coverage tracking
❌ No complexity metrics
❌ Limited compiler warning coverage

Pain Points:

No code coverage metrics: Can't track test coverage improvements
Manual code review burden: No automated checks for common patterns
Inconsistent code style: No formatter or style checker
Missing best practices: No linting for security patterns
Technical debt invisible: No complexity or maintainability metrics

Investigation Required

4.1 Static Analysis Tool Selection

Option A: Clang-Tidy

Part of LLVM toolchain
C/C++ focused
Configurable checks
Good Meson integration

Checks to Enable:

clang-analyzer-* - Core static analysis
bugprone-* - Bug-prone patterns
cppcoreguidelines-* - C++ Core Guidelines (mostly C++-specific, so of limited use for C99 code)
readability-* - Code readability
performance-* - Performance issues
cert-* - CERT secure coding rules

Option B: Cppcheck

Focused on C/C++
Zero false-positive goal
Lightweight
Good for CI

Option C: PVS-Studio

Commercial (free for open source)
Very thorough
Low false-positive rate
Requires registration

Recommendation: Start with Clang-Tidy + Cppcheck (both free, complementary)

4.2 Code Coverage Strategy

Tools:

gcov/lcov - Traditional, well-supported
llvm-cov - Modern, better integration with Clang
Codecov.io - Online dashboard, PR comments
Coveralls - Alternative to Codecov

Integration Points:

Compile with --coverage flags
Run test suite
Generate coverage reports
Upload to Codecov
Enforce minimum coverage in CI

Target Coverage:

Overall: >80%
New code: >90%
Critical paths (audio I/O, DSP): >95%
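Meson's coverage-xml target (via gcovr) writes a Cobertura-format coverage.xml whose line-rate attributes are enough to enforce the overall and critical-path targets, locally or in CI. A sketch, assuming gcovr's per-file class elements carry a filename attribute; the critical path prefixes are examples, and check_coverage is a hypothetical helper. The "new code" target is better left to Codecov's patch coverage check.

```python
import xml.etree.ElementTree as ET


def line_rates(cobertura_xml):
    """Return (overall_line_rate, {filename: line_rate}) from Cobertura XML text."""
    root = ET.fromstring(cobertura_xml)
    per_file = {
        cls.get("filename"): float(cls.get("line-rate"))
        for cls in root.iter("class")
    }
    return float(root.get("line-rate")), per_file


def check_coverage(cobertura_xml, overall_min=0.80, critical_prefixes=(), critical_min=0.95):
    """Return human-readable failures; an empty list means all targets are met."""
    overall, per_file = line_rates(cobertura_xml)
    failures = []
    if overall < overall_min:
        failures.append(f"overall {overall:.1%} < {overall_min:.0%}")
    for name, rate in sorted(per_file.items()):
        if name.startswith(tuple(critical_prefixes)) and rate < critical_min:
            failures.append(f"{name} {rate:.1%} < {critical_min:.0%}")
    return failures
```

Usage: check_coverage(Path("builddir/meson-logs/coverage.xml").read_text(), critical_prefixes=("src/io/", "src/spectral/")).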

4.3 Code Formatting and Style

Options:

clang-format: Industry standard for C/C++
uncrustify: Highly configurable
astyle: Simple, legacy

Current Issue: No consistent style enforcement

Solution:

Add .clang-format configuration
Add pre-commit hook
Add CI check
Run once to reformat codebase (separate PR)

4.4 Complexity Metrics

Metrics to Track:

Cyclomatic complexity (functions >15 are suspect)
Function length (>100 lines is concerning)
Nesting depth (>4 levels is hard to understand)
Maintainability index

Tools:

lizard - Multi-language code complexity analyzer
sloccount - Line count and effort estimation
SonarQube - Comprehensive code quality platform
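lizard is the practical choice for these metrics. To make concrete what the cyclomatic complexity and nesting thresholds above count, here is a dependency-free approximation for a single C function; it is a regex count, not a parser, so macros and unusual formatting can skew it, and the function names are hypothetical.

```python
import re

# Decision points counted by the usual McCabe approximation for C code
_DECISIONS = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def _strip_comments_and_strings(code):
    """Remove comments and blank out string/char literals so they are not counted."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", " ", code)
    code = re.sub(r'"(?:\\.|[^"\\])*"', '""', code)
    return re.sub(r"'(?:\\.|[^'\\])*'", "''", code)


def approx_cyclomatic_complexity(function_source):
    """1 + number of decision points (approximation of McCabe complexity)."""
    return 1 + len(_DECISIONS.findall(_strip_comments_and_strings(function_source)))


def max_nesting_depth(function_source):
    """Deepest brace nesting inside the function body (the body itself is depth 0)."""
    depth = deepest = 0
    for ch in _strip_comments_and_strings(function_source):
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return max(deepest - 1, 0)
```

Against the thresholds above, a function would be flagged when its complexity exceeds 15 or its nesting depth exceeds 4.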

Implementation Guide

Step 1: Clang-Tidy Configuration

Create: .clang-tidy

---
Checks: >
  clang-analyzer-*,
  bugprone-*,
  cert-*,
  readability-*,
  performance-*,
  -readability-magic-numbers,
  -cert-err33-c,
  -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling

WarningsAsErrors: ''  # Start permissive, tighten later

CheckOptions:
  - key: readability-identifier-naming.FunctionCase
    value: lower_case
  - key: readability-identifier-naming.VariableCase
    value: lower_case
  - key: readability-identifier-naming.ConstantCase
    value: UPPER_CASE

Add to Meson:

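# meson.build
# Note: recent Meson versions also generate a built-in clang-tidy target
# when a .clang-tidy file exists in the source root
clang_tidy = find_program('clang-tidy', required: false)
if clang_tidy.found() and get_option('clang_tidy')
  run_target('clang-tidy',
    command: [clang_tidy, '-p', meson.project_build_root()] + aubio_sources
  )
endif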

Add CI Job:

static-analysis:
  name: Static Analysis (clang-tidy)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - name: Install tools
      run: |
        sudo apt-get update
        sudo apt-get install -y clang-tidy
        pip install meson ninja numpy
    - name: Setup build
      run: meson setup builddir -Dclang_tidy=true
    - name: Run clang-tidy
      run: |
        ninja -C builddir clang-tidy 2>&1 | tee clang-tidy.log
        # Fail if errors found (not warnings)
        ! grep "error:" clang-tidy.log
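While WarningsAsErrors is left empty, the clang-tidy.log produced above is mostly useful as a to-do list. A sketch that tallies diagnostics per check, assuming clang-tidy's usual "file:line:col: warning: message [check-name]" output format, helps decide which checks to fix first or disable:

```python
import re
from collections import Counter

# file:line:col: level: message [check-name(s)]
_DIAG = re.compile(
    r"^(?P<file>[^\s:][^:\n]*):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<level>warning|error): .*\[(?P<checks>[^\]\n]+)\][ \t]*$",
    re.MULTILINE,
)


def summarize(log_text):
    """Count unique clang-tidy diagnostics per (level, check name)."""
    counts = Counter()
    seen = set()
    for m in _DIAG.finditer(log_text):
        key = (m["file"], m["line"], m["col"], m["checks"])
        if key in seen:
            continue  # diagnostics in shared headers repeat once per translation unit
        seen.add(key)
        # "cert-err33-c,-warnings-as-errors" -> "cert-err33-c"
        counts[(m["level"], m["checks"].split(",")[0])] += 1
    return counts
```

summarize(Path("clang-tidy.log").read_text()).most_common(10) lists the noisiest checks first.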

Step 2: Code Coverage with Codecov

No meson.build change is needed: Meson's built-in b_coverage option already adds --coverage to the compile and link flags when the project is configured with -Db_coverage=true, as in the CI job below.

Add CI Job:

coverage:
  name: Code Coverage
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - name: Install dependencies
      run: |
        # Meson's coverage-xml target uses gcovr
        pip install meson ninja numpy gcovr
    - name: Build with coverage
      run: |
        meson setup builddir -Db_coverage=true -Dtests=true
        meson compile -C builddir
    - name: Run tests
      run: meson test -C builddir
    - name: Generate coverage
      run: ninja -C builddir coverage-xml
    - name: Upload to Codecov
      uses: codecov/codecov-action@v4
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
        files: builddir/meson-logs/coverage.xml
        fail_ci_if_error: true

Add badge to README:

[![Coverage](https://codecov.io/gh/LedFx/aubio-ledfx/branch/main/graph/badge.svg)](https://codecov.io/gh/LedFx/aubio-ledfx)

Step 3: Clang-Format

Create: .clang-format

---
BasedOnStyle: LLVM
IndentWidth: 2
ColumnLimit: 80
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: Never
BreakBeforeBraces: Linux
IndentCaseLabels: false
PointerAlignment: Left
SpaceAfterCStyleCast: true

Add pre-commit hook:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v18.1.0
    hooks:
      - id: clang-format
        types_or: [c, c++]

Add CI check:

format-check:
  name: Code Formatting
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Check formatting
      uses: jidicula/clang-format-action@v4.11.0
      with:
        clang-format-version: '18'
        check-path: 'src'

Success Metrics

Before:

Static analysis: CodeQL only
Code coverage: Unknown
Code formatting: Inconsistent
CI checks: Basic
Technical debt: Unknown

After:

Static analysis: Clang-tidy + Cppcheck + CodeQL
Code coverage: >80%, tracked via Codecov
Code formatting: Consistent, enforced by CI
CI checks: Comprehensive (build, test, lint, format, coverage)
Technical debt: Visible and tracked

References

Clang-Tidy: https://clang.llvm.org/extra/clang-tidy/
Codecov: https://docs.codecov.com/docs
clang-format: https://clang.llvm.org/docs/ClangFormat.html

Priority 5: Documentation and Developer Experience

Priority Level: MEDIUM

Estimated ROI: MEDIUM - Improves contributor onboarding
Effort: 3-5 days
Impact: Contributor experience, project sustainability

Problem Statement

The project has good README and basic documentation, but lacks comprehensive developer guides and modern documentation infrastructure:

Current State:

✅ Good README with build instructions
✅ Sphinx documentation for Python API
✅ Doxygen documentation for C API
❌ No contributor guide
❌ No architecture documentation
❌ No debugging guide
❌ Build documentation scattered
❌ No API design rationale

Pain Points:

Onboarding difficulty: New contributors struggle to understand the codebase
Scattered documentation: Build, testing, and vcpkg information is spread across multiple places
No debugging guide: Build and test issues are hard to troubleshoot
API documentation incomplete: Many functions lack detailed docs
No examples: Few code examples exist for common tasks

Investigation Required

5.1 Documentation Structure Analysis

Current Documentation:

README.md - 215 lines, build instructions
doc/ - 40+ RST files for Sphinx
python/README.md - Python-specific info
Various *.md files - Security, implementation plans

Gaps:

No CONTRIBUTING.md
No ARCHITECTURE.md
No DEBUGGING.md
No RELEASE.md
No examples directory with tutorials

5.2 API Documentation Quality

Sample Analysis of Current State:

// Example from src/pitch/pitch.h
aubio_pitch_t * new_aubio_pitch (const char_t * method,
    uint_t buf_size, uint_t hop_size, uint_t samplerate);

// Has basic comment, but missing:
// - List of available methods
// - Valid ranges for parameters
// - Return value details (NULL on error?)
// - Example usage
// - Performance characteristics

Improvement Areas:

Add parameter validation documentation
Document error conditions
Add usage examples
Cross-reference related functions

5.3 Interactive Examples

Goal: Lower barrier to entry with runnable examples

Options:

Jupyter notebooks - Interactive Python examples
Example programs - C examples in examples/
Online playground - Web-based demo (ambitious)

Priority Examples to Create:

Basic onset detection
Pitch tracking
Tempo detection
Audio file processing workflow
Real-time audio processing

Implementation Guide

Step 1: Create CONTRIBUTING.md

Create: CONTRIBUTING.md

# Contributing to aubio-ledfx

Thank you for your interest in contributing to aubio-ledfx!

## Quick Start

1. **Fork and clone:**
   ```bash
   git clone https://github.com/YOUR_USERNAME/aubio-ledfx.git
   cd aubio-ledfx
   ```

2. **Set up the development environment:**
   ```bash
   # Install dependencies
   pip install meson ninja numpy pytest

   # Configure and build
   meson setup builddir -Dtests=true -Dexamples=true
   meson compile -C builddir

   # Run tests
   meson test -C builddir
   ```

3. **Make changes and test:**
   ```bash
   # Edit code
   vim src/...

   # Rebuild
   meson compile -C builddir

   # Test
   meson test -C builddir

   # Run a specific test
   ./builddir/tests/test-onset
   ```

4. **Submit a PR:**
   - Create a feature branch
   - Make atomic commits
   - Write tests
   - Update documentation
   - Submit the PR with a description

## Code Style

- **C code:** Follow the existing style (enforced by clang-format)
- **Python code:** PEP 8 (black formatter)
- **Commit messages:** Conventional Commits format

## Testing Requirements

- All new code must have tests
- C tests go in `tests/src/`
- Python tests go in `python/tests/`
- Run the sanitizers in a separate build directory, since `meson setup` refuses to reconfigure an existing one: `meson setup builddir-asan -Db_sanitize=address,undefined`

## Documentation

- Update the relevant `.rst` files in `doc/`
- Add docstrings to Python code
- Add Doxygen comments to C functions

## Review Process

1. Automated checks must pass (CI, tests, linting)
2. Code review by a maintainer
3. Merge when approved

## Getting Help

- Open an issue for questions
- Check the existing documentation in `doc/`
- See `DEBUGGING.md` for troubleshooting


Step 2: Create ARCHITECTURE.md

Create: ARCHITECTURE.md

# aubio-ledfx Architecture

## Overview

aubio-ledfx is a C library with Python bindings for audio analysis.

## Project Structure

```
aubio-ledfx/
├── src/                  # C library source
│   ├── aubio.h           # Main public API header
│   ├── aubio_priv.h      # Private/internal header
│   ├── mathutils.c       # Math utilities
│   ├── fvec.c            # Float vector operations
│   ├── cvec.c            # Complex vector operations
│   ├── spectral/         # Spectral analysis
│   ├── pitch/            # Pitch detection algorithms
│   ├── tempo/            # Tempo and beat tracking
│   ├── onset/            # Onset detection
│   └── io/               # Audio I/O (file, device)
├── python/               # Python bindings
│   ├── ext/              # C extension module
│   ├── lib/              # Pure Python code
│   └── tests/            # Python test suite
├── tests/                # C test suite
├── examples/             # Example programs
└── doc/                  # Documentation (Sphinx + Doxygen)
```

## Core Concepts

### Data Types

- **fvec_t:** Float vector (real-valued signals)
- **cvec_t:** Complex vector (frequency domain)
- **fmat_t:** Float matrix (multi-channel)

### Processing Pipeline

```
Audio File → Source → fvec_t → Analysis → Results
                        ↓
                     FFT/PVOC → cvec_t → Spectral Analysis
```


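### Object-Oriented C Pattern

```c
// Creation: method, buf_size (window), hop_size, samplerate
aubio_onset_t* o = new_aubio_onset("default", 512, 256, 44100);
fvec_t* input_fvec = new_fvec(256);  // hop_size samples per call
fvec_t* output_fvec = new_fvec(1);   // onset result (length 1)

// Processing (called once per hop of input)
aubio_onset_do(o, input_fvec, output_fvec);

// Destruction (every new_* has a matching del_*)
del_fvec(output_fvec);
del_fvec(input_fvec);
del_aubio_onset(o);
```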

## Build System

- **Meson:** Build configuration
- **vcpkg:** Dependency management
- **meson-python:** Python package build

See `doc/meson_reference.rst` for details.

## FFT Backends (Priority Order)

1. fftw3f (recommended)
2. Accelerate (macOS)
3. Intel IPP (optional)
4. ooura (fallback, always available)

## Thread Safety

aubio is NOT thread-safe. Each thread must have its own objects.

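## Memory Management

- All `new_*` functions allocate and return a pointer, or NULL on failure
- All `del_*` functions free the object; do not pass NULL unless the specific function is known to check for it
- No garbage collection: every `new_*` call must be paired with the matching `del_*` call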
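Outside the ARCHITECTURE.md text itself, the `512, 256` arguments in its creation example are worth illustrating. They are `buf_size`, the analysis window, and `hop_size`, the number of new samples supplied per `_do` call. The pure-Python sketch below models that relationship: each call pushes `hop_size` fresh samples into a sliding window of `buf_size` samples. It is a conceptual model only, not aubio's implementation, and the class name `SlidingWindow` is hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Conceptual model of buf_size/hop_size framing (not aubio code)."""

    def __init__(self, buf_size, hop_size):
        if hop_size <= 0 or hop_size > buf_size:
            raise ValueError("need 0 < hop_size <= buf_size")
        self.hop_size = hop_size
        # The window starts filled with zeros; maxlen keeps it at buf_size
        self.window = deque([0.0] * buf_size, maxlen=buf_size)

    def push(self, hop):
        """Append exactly hop_size new samples and return the current window."""
        if len(hop) != self.hop_size:
            raise ValueError(f"expected {self.hop_size} samples, got {len(hop)}")
        self.window.extend(hop)  # the oldest samples fall off the left end
        return list(self.window)
```

With `buf_size=4, hop_size=2`, pushing `[1.0, 2.0]` yields `[0.0, 0.0, 1.0, 2.0]`, and pushing `[3.0, 4.0]` next yields `[1.0, 2.0, 3.0, 4.0]`. Consecutive windows therefore overlap by `buf_size - hop_size` samples.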


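Step 3: API Documentation Template

Create a documentation audit script:

#!/usr/bin/env python3
"""Audit API documentation completeness in the C headers under src/."""

import re
import sys
from pathlib import Path

# A Doxygen block (/** ... */) directly followed by a declaration "type name (".
# The comment body may not contain "*/", so adjacent blocks are never merged.
DOC_THEN_DECL = re.compile(r"/\*\*((?:[^*]|\*(?!/))*)\*/[\w\s*]*?\b(\w+)\s*\(")


def has_tag(comment_block, tag):
    """aubio headers use the \\param style; accept @param as well."""
    return f"\\{tag}" in comment_block or f"@{tag}" in comment_block


def check_function_doc(function_name, comment_block):
    """Return a list of documentation gaps for one function."""
    issues = []

    # Check for parameter documentation
    if not has_tag(comment_block, "param"):
        issues.append("Missing parameter documentation")

    # Check for return documentation
    if not has_tag(comment_block, "return"):
        issues.append("Missing return documentation")

    # Constructors (new_*) should carry a usage example
    if not has_tag(comment_block, "example") and function_name.startswith("new_"):
        issues.append("Constructor missing usage example")

    return issues


def audit(src_dir="src"):
    """Print every gap found in the headers under src_dir; return the count."""
    count = 0
    for header in sorted(Path(src_dir).rglob("*.h")):
        text = header.read_text(encoding="utf-8", errors="replace")
        for match in DOC_THEN_DECL.finditer(text):
            comment, name = match.group(1), match.group(2)
            for issue in check_function_doc(name, comment):
                print(f"{header}: {name}: {issue}")
                count += 1
    return count


if __name__ == "__main__":
    # Exit non-zero when gaps are found so the script can gate CI
    sys.exit(1 if audit(*sys.argv[1:2]) else 0)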

Step 4: Jupyter Notebook Examples

Create: examples/notebooks/01_onset_detection.ipynb

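{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Onset Detection with aubio\n",
    "\n",
    "This notebook demonstrates onset detection using aubio-ledfx."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import aubio\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Generate test signal: 1 s of silence with a short noise burst every 0.25 s\n",
    "samplerate = 44100\n",
    "duration = 1.0\n",
    "audio = np.zeros(int(samplerate * duration), dtype=np.float32)\n",
    "rng = np.random.default_rng(0)\n",
    "for start in range(0, len(audio), samplerate // 4):\n",
    "    audio[start:start + 512] = rng.uniform(-0.5, 0.5, 512)\n",
    "\n",
    "# Create onset detector\n",
    "win_s = 512\n",
    "hop_s = 256\n",
    "onset = aubio.onset(\"default\", win_s, hop_s, samplerate)\n",
    "\n",
    "# Process audio hop_s samples at a time (aubio expects float32 frames)\n",
    "onset_times = []\n",
    "for i in range(0, len(audio) - hop_s + 1, hop_s):\n",
    "    if onset(audio[i:i + hop_s]):\n",
    "        onset_times.append(onset.get_last_s())\n",
    "\n",
    "print(\"Detected onsets (s):\", onset_times)\n",
    "\n",
    "# Plot the waveform with the detected onsets\n",
    "t = np.arange(len(audio)) / samplerate\n",
    "plt.plot(t, audio)\n",
    "for x in onset_times:\n",
    "    plt.axvline(x, color=\"r\")\n",
    "plt.xlabel(\"Time (s)\")\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}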

Success Metrics

Before:

Contributor onboarding: 2-3 days
Documentation coverage: 40%
Code examples: Minimal
API reference: Incomplete

After:

Contributor onboarding: <4 hours
Documentation coverage: >80%
Code examples: 10+ working examples
API reference: Complete with examples
Jupyter notebooks: 5+ tutorials
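The roadmap quotes documentation coverage (about 40% today, above 80% as the target) without defining how it is measured. One possible, hypothetical definition is the share of function prototypes in the headers that sit directly below a Doxygen `/** ... */` block. The sketch below computes that share with regular expressions. It is a heuristic; for example, it ignores prototypes that share a line with their comment. `doc_coverage` is an illustrative name.

```python
import re

DOC_BLOCK = re.compile(r"/\*\*.*?\*/", re.DOTALL)           # /** ... */
MARKER = "/**DOC*/"                                          # stand-in for a doc block
OTHER_BLOCK = re.compile(r"/\*(?!\*DOC\*/).*?\*/", re.DOTALL)  # ordinary /* ... */
LINE_COMMENT = re.compile(r"//[^\n]*")
# "return-type name (args);" starting a line; typedefs are excluded
PROTOTYPE = re.compile(
    r"^[ \t]*(?!typedef\b)\w+[\w\s*]*?\b(\w+)\s*\([^;{}]*\)\s*;", re.MULTILINE
)


def doc_coverage(header_text):
    """Return (documented, total) prototype counts for one header's text."""
    # Collapse doc blocks to a marker and drop other comments, so text inside
    # comments is never mistaken for a declaration.
    text = DOC_BLOCK.sub(MARKER, header_text)
    text = OTHER_BLOCK.sub("", text)
    text = LINE_COMMENT.sub("", text)

    documented = total = 0
    for match in PROTOTYPE.finditer(text):
        total += 1
        if text[: match.start()].rstrip().endswith(MARKER):
            documented += 1
    return documented, total
```

Summing both counts over all headers under `src/` gives a project-wide percentage that can be reported alongside the KPIs below.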

Implementation Roadmap

Month 1: Quick Wins

Week 1: CI/CD optimization (Priority 1, Phase 1)
Week 2: Test infrastructure fixes (Priority 3, Phase 1)
Week 3: Static analysis integration (Priority 4, Phase 1)
Week 4: Documentation essentials (Priority 5, Phase 1)

Month 2: Deep Work

Week 5-6: CI/CD dependency optimization (Priority 1, Phase 2)
Week 7: Test benchmarking and fuzzing (Priority 3, Phases 2-3)
Week 8: Python bindings analysis (Priority 2, Phases 1-3)

Month 3: Polish and Iterate

Week 9-10: Python binding improvements or migration (Priority 2, Phase 4)
Week 11: Code coverage and quality metrics (Priority 4, Phases 2-3)
Week 12: Documentation and examples (Priority 5, Phases 2-3)

Ongoing

Monitor CI performance metrics
Track code coverage trends
Review static analysis findings
Update documentation as code evolves

Success Criteria

Key Performance Indicators (KPIs)

Metric	Baseline	Target	Timeframe
CI build time	90-120 min	35-50 min	Month 1
Test pass rate	Unknown (ignored)	100%	Month 1
Code coverage	Unknown	>80%	Month 2
Static analysis issues	Unknown	<10 high	Month 1
Documentation coverage	~40%	>80%	Month 3
Contributor onboarding	2-3 days	<4 hours	Month 3
Fuzz testing	None	Continuous	Month 2
Performance benchmarks	None	Tracked	Month 2

Quality Gates

Before merging PRs:

✅ All tests pass
✅ Code coverage ≥ baseline (no regression)
✅ Static analysis passes
✅ Code formatted with clang-format
✅ Documentation updated
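The "no regression" coverage gate can be a small script that runs after the test job. The sketch below assumes two things: the coverage report is Cobertura-style XML, whose root `<coverage>` element carries an overall `line-rate` attribute (gcovr writes this format), and the baseline rate is stored somewhere the CI job can read. The function names and the tolerance value are hypothetical. Once Codecov is set up, its project status check can enforce a similar rule without a custom script.

```python
import xml.etree.ElementTree as ET


def line_rate(cobertura_xml):
    """Return the overall line coverage (0.0 to 1.0) from a Cobertura report."""
    root = ET.fromstring(cobertura_xml)  # accepts str or bytes
    return float(root.get("line-rate"))


def coverage_gate(current_xml, baseline_rate, tolerance=0.001):
    """Pass if coverage did not drop below the baseline (minus a small
    tolerance that absorbs rounding in the report)."""
    current = line_rate(current_xml)
    return current + tolerance >= baseline_rate, current
```

In CI, the job reads the report as bytes, calls `coverage_gate`, and exits non-zero when the first element of the result is `False`.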

Risk Assessment

High Risk Items

Python binding migration (if pursued)
- Risk: Breaking changes for users
- Mitigation: Gradual migration, maintain compatibility layer
Test failures in CI
- Risk: Unknown failures when removing || true
- Mitigation: Fix all tests locally first
Performance regression from optimization
- Risk: CI optimizations slow down actual builds
- Mitigation: Benchmark at each step
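"Benchmark at each step" can be automated by comparing two Google Benchmark runs saved with `--benchmark_out=<file> --benchmark_out_format=json`. The sketch below reads the `benchmarks` list of each report, normalizes `real_time` to nanoseconds using `time_unit`, and flags anything slower than a chosen threshold. The 10% threshold and the function names are assumptions. Google Benchmark's own `tools/compare.py` offers a more thorough statistical comparison.

```python
import json

# Google Benchmark reports times in the unit named by "time_unit"
UNIT_TO_NS = {"ns": 1.0, "us": 1e3, "ms": 1e6, "s": 1e9}


def load_times(report_json):
    """Map benchmark name -> real_time in ns, skipping aggregate rows
    (mean/median/stddev); with repetitions, the last run of a name wins."""
    report = json.loads(report_json)
    times = {}
    for bench in report.get("benchmarks", []):
        if bench.get("run_type") == "aggregate":
            continue
        times[bench["name"]] = bench["real_time"] * UNIT_TO_NS[bench["time_unit"]]
    return times


def find_regressions(baseline_json, current_json, threshold=0.10):
    """Return {name: relative slowdown} for benchmarks present in both
    reports that got more than `threshold` slower."""
    base, cur = load_times(baseline_json), load_times(current_json)
    regressions = {}
    for name in base.keys() & cur.keys():
        slowdown = cur[name] / base[name] - 1.0
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions
```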

Medium Risk Items

vcpkg cache size
- Risk: Exceeding GitHub's 10GB cache limit
- Mitigation: Monitor, implement LRU eviction
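GitHub already evicts the least recently accessed caches once a repository reaches its limit. A proactive sweep can still keep headroom, so a fresh vcpkg cache is never the entry that gets pushed out. The sketch below assumes each entry is a dict carrying the `key`, `size_in_bytes`, and `last_accessed_at` fields that the GitHub Actions cache REST API reports. It selects keys in least-recently-used order until the total fits under a chosen budget. `select_evictions` and the budget value are hypothetical. Deletion itself could then be done with `gh cache delete <key>`.

```python
from datetime import datetime


def select_evictions(caches, limit_bytes):
    """Return cache keys to delete, least recently used first, until the
    remaining total size is at most limit_bytes."""

    def last_used(entry):
        # fromisoformat() before Python 3.11 rejects a trailing "Z"
        return datetime.fromisoformat(entry["last_accessed_at"].replace("Z", "+00:00"))

    total = sum(entry["size_in_bytes"] for entry in caches)
    to_delete = []
    for entry in sorted(caches, key=last_used):
        if total <= limit_bytes:
            break
        to_delete.append(entry["key"])
        total -= entry["size_in_bytes"]
    return to_delete
```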
Fuzz testing resource usage
- Risk: Fuzzing consumes too much CI time/resources
- Mitigation: Run on schedule, not every PR

Conclusion

These 5 priorities represent the highest-value optimizations for aubio-ledfx:

CI/CD Performance - Immediate developer productivity impact
Python Modernization - Long-term maintainability and user experience
Test Infrastructure - Code quality and confidence
Static Analysis - Proactive bug prevention
Documentation - Contributor growth and project sustainability

Next Steps:

Review this roadmap with maintainers
Prioritize based on team capacity
Create GitHub issues for each priority
Begin implementation in priority order
Track progress with KPIs

Estimated Total Effort: 18-27 days (spread over 3 months)
Estimated ROI: 3-5x in reduced maintenance burden and faster iteration

Document Maintenance:

Review quarterly
Update based on progress
Add new priorities as they emerge
Archive completed items

Last Updated: 2025-11-14
