Skip to content

Latest commit

 

History

History
1743 lines (1375 loc) · 48.2 KB

File metadata and controls

1743 lines (1375 loc) · 48.2 KB

aubio-ledfx Optimization and Modernization Roadmap

Document Version: 1.0
Created: 2025-11-14
Project: aubio-ledfx maintained fork
Purpose: Comprehensive optimization and modernization strategy


Executive Summary

This document presents the top 5 highest priority optimization and modernization work items for aubio-ledfx, a maintained fork providing Python 3.8-3.13 support with pre-built wheels. The project has successfully migrated from waf to Meson build system and integrated vcpkg for cross-platform dependency management.

Current State:

  • ✅ Meson build system with vcpkg dependencies
  • ✅ CI/CD with cibuildwheel for multi-platform wheels (Linux x64/ARM64, macOS Intel/Apple Silicon, Windows AMD64)
  • ✅ Security hardening implemented (4 critical vulnerabilities fixed)
  • ✅ Sanitizer testing infrastructure (AddressSanitizer + UndefinedBehaviorSanitizer)
  • ✅ All 45 C tests passing

Key Metrics:

  • 68 C source files, 56 header files (~17K lines)
  • 80 Python files including 31 test files
  • ~1,000 lines of Meson build configuration
  • 5 Python versions supported (3.10-3.14)
  • 5 platform/architecture combinations in CI

Priority 1: CI/CD Build Performance Optimization

Priority Level: CRITICAL

Estimated ROI: HIGH - Reduces developer iteration time and CI costs by 40-60%
Effort: 3-5 days
Impact: All contributors, every PR, every release

Problem Statement

The current CI/CD pipeline using cibuildwheel builds wheels for 5 platform/architecture combinations, with each build taking 15-25 minutes.

Current State - Already Optimized:

  • ✅ macOS and Windows use actions/cache@v4 to cache vcpkg_installed/ directory
  • ✅ Caching keys based on vcpkg.json and triplet files (smart invalidation)
  • before-all runs once per job (not per Python version), dependencies built once
  • ✅ Efficient matrix strategy for parallel builds

Current Build Times (estimated):

  • Linux x64: ~18 minutes (vcpkg: ~8 min on cache miss, wheel build: ~10 min)
  • Linux ARM64: ~22 minutes (vcpkg: ~12 min on cache miss, wheel build: ~10 min)
  • macOS x64: ~20 minutes (vcpkg: ~3-4 min on cache hit, wheel build: ~10 min)
  • macOS ARM64: ~18 minutes (vcpkg: ~2-3 min on cache hit, wheel build: ~10 min)
  • Windows AMD64: ~15 minutes (vcpkg: ~2-3 min on cache hit, wheel build: ~8 min)

Total CI time per PR: ~70-90 minutes for all platforms (with cache hits)

Remaining Pain Points:

  1. Linux builds: vcpkg dependencies rebuild in manylinux Docker containers (no persistent cache across runs)
  2. Limited opportunities: macOS/Windows already well-optimized with caching
  3. Potential improvements: ccache/sccache for C/C++ compilation, workflow organization

Investigation Required

1.1 vcpkg Binary Caching Analysis

Current State (Already Implemented):

  • ✅ macOS builds: actions/cache@v4 caches vcpkg_installed/ directory
  • ✅ Windows builds: actions/cache@v4 caches vcpkg_installed/ directory
  • ✅ Cache keys use hashFiles('vcpkg.json', 'vcpkg-triplets/*.cmake') for smart invalidation
  • ✅ Linux builds: Dependencies rebuild in Docker (GitHub Actions cache doesn't persist in containers)

Note on vcpkg Binary Caching: The old x-gha binary source provider was deprecated in June 2024. The current implementation uses direct actions/cache for the vcpkg_installed directory, which is the recommended approach for GitHub Actions.

Remaining Optimization Opportunities:

  1. Linux Docker caching: Explore Docker layer caching or bind mounts to persist vcpkg builds
  2. Cache analysis: Measure actual cache hit rates on macOS/Windows
  3. Alternative approaches:
    • Pre-built dependency Docker images for Linux
    • vcpkg's newer binary caching features (files, nuget providers)

Questions to Answer:

  • What's the actual cache hit rate on macOS/Windows in production?
  • Can we use Docker BuildKit caching for Linux builds?
  • Would pre-built dependency containers be worth the maintenance overhead?
  • What's the cache size and is it within GitHub's 10GB limit?

Investigation Steps:

# 1. Measure vcpkg build artifacts size
du -sh vcpkg_installed/x64-osx/
du -sh vcpkg_installed/arm64-osx/
du -sh vcpkg_installed/x64-windows-release/

# 2. Check cache hit rates in CI logs
# Look for "Cache restored from key:" messages in recent workflow runs

# 3. Test Docker BuildKit caching (Linux)
# Add --cache-from and --cache-to flags to docker build

1.2 Compiler Caching (ccache/sccache)

Goal: Cache C/C++ compilation artifacts across CI runs

Options:

  • ccache: Traditional, well-tested, local cache + GitHub Actions cache
  • sccache: Rust-based, supports cloud backends (S3, GCS, GitHub Actions cache)
  • Buildcache: Modern alternative with good Docker support

Investigation Steps:

# Example sccache integration in CI:
- name: Setup sccache
  uses: mozilla-actions/sccache-action@v0.0.4

- name: Configure environment
  run: |
    echo "CC=sccache gcc" >> $GITHUB_ENV
    echo "CXX=sccache g++" >> $GITHUB_ENV

Expected Impact: 30-50% faster C library compilation on cache hit

1.3 Dependency Pre-building Strategy

Concept: Build vcpkg dependencies once, cache, reuse across all wheel builds

Approach A: Separate Dependency Build Job

jobs:
  build-dependencies:
    strategy:
      matrix:
        include:
          - os: ubuntu-latest, triplet: x64-linux-pic
          - os: ubuntu-24.04-arm, triplet: arm64-linux-pic
          # ... etc
    steps:
      - name: Build and cache vcpkg dependencies
        run: vcpkg install --triplet=${{ matrix.triplet }}
      - name: Cache vcpkg_installed
        uses: actions/cache/save@v4
        with:
          path: vcpkg_installed
          key: vcpkg-${{ matrix.triplet }}-${{ hashFiles('vcpkg.json') }}
  
  build-wheels:
    needs: build-dependencies
    steps:
      - name: Restore vcpkg cache
        uses: actions/cache/restore@v4

Approach B: Use GitHub Container Registry for Pre-built Dependencies

  • Build Docker images with vcpkg dependencies pre-installed
  • Push to ghcr.io/LedFx/aubio-builder:x64-linux-pic
  • Use in cibuildwheel Linux builds

1.4 CI Workflow Optimization

Current Issues:

  • 227 lines of YAML with duplication
  • Before-all scripts are repetitive across platforms
  • No job parallelization optimization
  • No conditional job skipping (e.g., skip builds if only docs changed)

Optimization Opportunities:

  1. Use reusable workflows for common setup patterns
  2. Matrix strategy improvements - reduce duplication
  3. Path filters - skip unnecessary builds
  4. Composite actions - extract common steps

Example Path Filter:

on:
  pull_request:
    paths-ignore:
      - 'doc/**'
      - '**.md'
      - '**.rst'

Recommended Solution Strategy

Note: CI/CD is already well-optimized with caching for macOS and Windows. The following focuses on incremental improvements.

Phase 1: Analysis and Measurement (1 day)

  1. Measure actual cache hit rates on macOS/Windows
  2. Profile build times to identify true bottlenecks
  3. Analyze cache size and effectiveness
  4. Determine if further optimization is worthwhile

Expected Impact: Better understanding of actual performance

Phase 2: Linux Docker Optimization (2-3 days, if worthwhile)

  1. Explore Docker BuildKit caching for vcpkg builds
  2. Consider pre-built dependency Docker images
  3. Test bind mounts or volume caching strategies
  4. Measure performance improvements

Expected Impact: 20-30% faster Linux builds (if successful)

Phase 3: Compiler Caching (1-2 days)

  1. Add ccache/sccache for C library compilation
  2. Integrate with GitHub Actions cache
  3. Measure compilation time improvements

Expected Impact: 15-25% faster C compilation on cache hit

Phase 4: Workflow Organization (1 day)

  1. Extract reusable workflows
  2. Optimize path filters
  3. Improve matrix strategy

Expected Impact: Better maintainability

Implementation Guide

Step 1: Analyze Current Cache Performance

Before making changes, measure current state:

# 1. Check recent CI workflow runs for cache hit rates
# Look in GitHub Actions logs for messages like:
# "Cache restored from key: vcpkg-installed-macos-x64-..."

# 2. Measure vcpkg_installed directory sizes
du -sh vcpkg_installed/*/

# 3. Compare build times with/without cache
# Run a workflow with cleared cache vs. warm cache

Why: Understand baseline performance before optimization

Step 2: Add ccache Integration (If Needed)

For Linux builds:

[tool.cibuildwheel.linux.environment]
CC = "ccache /opt/rh/gcc-toolset-14/root/usr/bin/gcc"
CXX = "ccache /opt/rh/gcc-toolset-14/root/usr/bin/g++"
CCACHE_DIR = "/tmp/ccache"

[tool.cibuildwheel.linux]
before-all = """
    yum install -y ccache && \
    # ... existing vcpkg setup
"""

Note: The current caching strategy using actions/cache for vcpkg_installed is already the recommended approach. The deprecated x-gha binary source provider (removed June 2024) has been superseded by direct directory caching. ccache-linux-${{ matrix.arch }}-


#### Step 3: Optimize Linux Docker Builds (Advanced)

**Current Challenge:** Docker containers don't persist GitHub Actions cache

**Potential Solutions:**

**Option A: Docker BuildKit Caching**
```yaml
# In .github/workflows/build.yml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

# Use BuildKit cache mounts in cibuildwheel
# (requires custom Docker image configuration)

Option B: Pre-built Dependency Container

# Create custom manylinux image with vcpkg dependencies pre-installed
FROM quay.io/pypa/manylinux_2_28_x86_64
RUN yum install -y git zip unzip tar curl make nasm
RUN git clone https://github.com/microsoft/vcpkg.git /opt/vcpkg
RUN cd /opt/vcpkg && ./bootstrap-vcpkg.sh
COPY vcpkg.json vcpkg-triplets/ /tmp/aubio-build/
RUN cd /tmp/aubio-build && /opt/vcpkg/vcpkg install --triplet=x64-linux-pic

Trade-off: Maintenance overhead vs. build speed improvement

Step 4: Add Path Filters

on:
  pull_request:
    paths:
      - 'src/**'
      - 'python/**'
      - 'tests/**'
      - 'meson.build'
      - 'meson_options.txt'
      - 'pyproject.toml'
      - 'vcpkg.json'
      - '.github/workflows/build.yml'
  push:
    branches: [main, develop]

Success Metrics

Current Baseline (With Existing Caching):

  • Total CI time: 70-90 minutes (with cache hits on macOS/Windows)
  • macOS vcpkg: ~2-4 minutes (cache hit)
  • Windows vcpkg: ~2-3 minutes (cache hit)
  • Linux vcpkg: ~8-12 minutes (rebuild each time)
  • Cache hit rate: ~70-80% (macOS/Windows)

After Additional Optimization (Target):

  • Total CI time: 50-65 minutes (20-25% improvement)
  • Linux vcpkg: ~4-6 minutes (with Docker caching, if implemented)
  • C library compilation: ~5-7 minutes (with ccache, 30% faster)
  • Cache hit rate: >85% (all platforms where applicable)

Note: The existing caching infrastructure is already quite effective. Further optimizations have diminishing returns and should be evaluated based on actual measured bottlenecks.

References

Risks and Mitigations

Risk 1: Cache size exceeds GitHub's 10GB limit per repo

  • Mitigation: Monitor cache size, implement LRU eviction strategy, use vcpkg's NuGet backend if needed

Risk 2: Cache invalidation issues (stale dependencies)

  • Mitigation: Cache key includes hashFiles('vcpkg.json'), auto-invalidates on dependency changes

Risk 3: Binary cache corruption

  • Mitigation: vcpkg verifies checksums, fallback to source build on failure

Priority 2: Python Code Generation Modernization

Priority Level: HIGH

Estimated ROI: HIGH - Reduces maintenance burden, improves type safety
Effort: 4-6 days
Impact: Python package quality, developer experience

Problem Statement

The Python bindings use a custom code generation system (python/lib/gen_external.py, gen_code.py) that parses src/aubio.h and generates C extension code. While functional, this approach has several issues:

Current System:

  • 352 lines of custom parser in gen_external.py
  • 642 lines of code generation logic in gen_code.py
  • Generates 10+ C files (gen-onset.c, gen-pitch.c, etc.) at build time
  • Manual maintenance of object lists and templates
  • No type hints in generated Python bindings
  • Fragile parsing logic dependent on header file format

Pain Points:

  1. Maintenance burden: Any change to C API requires updating generator
  2. No IDE support: Generated Python code lacks type hints
  3. Build complexity: Code generation adds build-time dependency
  4. Limited extensibility: Hard to add new object types or methods
  5. Python 3.12+ compatibility: No TypedDict, Protocol support
  6. No docstrings: Generated code has minimal documentation

Investigation Required

2.1 Modern Python Binding Alternatives

Option A: Migrate to pybind11

  • Pros:
    • Modern C++11 binding framework
    • Automatic type conversion, docstrings
    • Full Python 3.x support with type hints
    • Excellent NumPy integration
    • Active maintenance and community
  • Cons:
    • Requires C++11 (aubio is C99)
    • Significant migration effort
    • All bindings need rewriting

Option B: Migrate to nanobind

  • Pros:
    • Modern, lightweight (successor to pybind11)
    • Better performance and smaller binaries
    • Excellent type hint support
  • Cons:
    • Newer project (less mature)
    • Similar C++ requirement
    • Migration effort

Option C: Use CFFI

  • Pros:
    • Pure Python, no C++ required
    • Excellent for C libraries
    • Runtime and build-time modes
    • Good NumPy integration
  • Cons:
    • Less ergonomic than pybind11
    • Manual type definitions
    • Performance overhead (mitigated by ABI mode)

Option D: Improve Current Generator

  • Pros:
    • No migration required
    • Incremental improvements
    • Keep current build system
  • Cons:
    • Maintenance burden remains
    • Limited by custom parser approach

2.2 Type Hints and Stub Generation

Current State: No .pyi stub files, no runtime type hints

Options:

  1. Generate stubs with stubgen (mypy tool)
    stubgen -p aubio -o stubs/
  2. Use pybind11-stubgen (if migrating to pybind11)
  3. Manual stub creation for high-value APIs

Benefits:

  • IDE autocomplete and type checking
  • Better documentation
  • mypy/pyright support

2.3 NumPy 2.0 Optimization

Current Issue: aubio uses older NumPy C API

Opportunities:

  • Use NumPy 2.0 C API for better performance
  • Leverage new array protocols
  • Consider using nanobind's NumPy integration (automatic array wrapping)

Recommended Solution Strategy

Phase 1: Add Type Hints (1-2 days)

  1. Generate .pyi stub files for current bindings
  2. Add to package distribution
  3. Test with mypy/pyright

Phase 2: Improve Code Generator (2-3 days)

  1. Add docstring generation from doxygen comments
  2. Improve error messages in generated code
  3. Add type hint generation to .pyi files
  4. Better handling of edge cases

Phase 3: Evaluate Migration Path (1 day)

  1. Create proof-of-concept with pybind11 for 2-3 classes
  2. Measure performance impact
  3. Assess migration effort
  4. Make go/no-go decision

Phase 4: Migration (if approved, 8-12 days)

  1. Set up pybind11 build integration with Meson
  2. Migrate core types (fvec, cvec, etc.)
  3. Migrate processing objects (onset, pitch, tempo, etc.)
  4. Add comprehensive tests
  5. Update documentation

Implementation Guide

Step 1: Generate Type Stubs

Create script: scripts/generate_stubs.py

#!/usr/bin/env python3
"""Generate type stubs for aubio package."""
import subprocess
import sys

def main():
    # Use mypy stubgen to create stubs
    subprocess.run([
        sys.executable, "-m", "mypy.stubgen",
        "-p", "aubio",
        "-o", "python/aubio-stubs"
    ], check=True)
    
    # Post-process stubs to add missing information
    # (e.g., NumPy array types)

if __name__ == "__main__":
    main()

Add to pyproject.toml:

[project]
...
[project.optional-dependencies]
stubs = ["mypy"]

[tool.meson-python]
# Include stub files in wheel

Step 2: Enhance Generator with Docstrings

Modify gen_code.py:

def generate_docstring(self, obj_name: str, method_name: str) -> str:
    """Extract docstring from Doxygen comments in header."""
    # Parse src/aubio.h for /** ... */ comments
    # Convert to Python docstring format
    return f'''"""
    {obj_name}.{method_name}
    
    [Generated from C API documentation]
    """'''

Step 3: pybind11 Proof of Concept

Create: python/ext/pybind11_poc.cpp

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include "aubio.h"

namespace py = pybind11;

PYBIND11_MODULE(_aubio_pybind11, m) {
    // Proof of concept: fvec_t wrapper
    py::class_<fvec_t>(m, "fvec")
        .def(py::init([](size_t length) {
            return new_fvec(length);
        }), "Create new float vector")
        .def_property_readonly("length", 
            [](fvec_t* fv) { return fv->length; })
        .def("__getitem__", 
            [](fvec_t* fv, size_t i) {
                if (i >= fv->length) throw py::index_error();
                return fv->data[i];
            })
        .def("__setitem__",
            [](fvec_t* fv, size_t i, smpl_t val) {
                if (i >= fv->length) throw py::index_error();
                fv->data[i] = val;
            });
    // Add more bindings...
}

Integrate with Meson:

# python/meson.build
if get_option('use_pybind11')
  pybind11_dep = dependency('pybind11')
  py.extension_module('_aubio_pybind11',
    'ext/pybind11_poc.cpp',
    dependencies: [aubio_dep, pybind11_dep, numpy_dep],
    install: true
  )
endif

Success Metrics

Before:

  • No type hints or stubs
  • IDE support: Poor
  • Code generation time: ~5-10 seconds
  • Maintenance effort: HIGH (custom parser)
  • Type safety: None

After (Stubs Only):

  • Full .pyi stubs for all APIs
  • IDE support: Good
  • mypy/pyright compatibility: Yes
  • Maintenance effort: MEDIUM (stubs need updating)

After (pybind11 Migration):

  • Native type hints
  • IDE support: Excellent
  • Code generation time: 0 (pure C++)
  • Maintenance effort: LOW (automatic from C++ declarations)
  • Type safety: Full
  • Performance: Same or better

References


Priority 3: Test Infrastructure Enhancement

Priority Level: HIGH

Estimated ROI: HIGH - Prevents regressions, improves code quality
Effort: 5-7 days
Impact: Code reliability, contributor confidence

Problem Statement

While the project has good test coverage (53 C tests, 31 Python tests), there are significant gaps in testing infrastructure:

Current State:

  • ✅ 45/45 C unit tests passing
  • ✅ Sanitizer testing (ASAN, UBSAN) via GitHub Actions
  • ✅ Python test suite with pytest
  • ❌ No performance/benchmark tests
  • ❌ No fuzz testing (security concern for audio processing)
  • ❌ Limited boundary condition testing
  • ❌ No integration tests for real audio files
  • ❌ Test suite disabled in CI (|| true - tests allowed to fail)

Pain Points:

  1. CI ignores test failures: Tests run but failures don't block PRs
  2. No regression detection: Can't detect performance regressions
  3. Limited edge case coverage: Boundary conditions not systematically tested
  4. No fuzz testing: Audio processing is vulnerable to malformed input
  5. Platform-specific issues: Tests don't cover all platform code paths
  6. Manual testing required: Audio quality verification is manual

Investigation Required

3.1 Test Failure Root Cause Analysis

Current Issue: Tests run with || true in CI, masking failures

Investigation:

# Run tests locally on each platform
meson setup builddir -Dtests=true
meson test -C builddir --print-errorlogs

# Identify which tests fail and why:
# - Missing test data files?
# - Platform-specific issues?
# - Actual bugs?
# - Timing/flakiness?

Questions:

  • Which specific tests are failing?
  • Are failures consistent or flaky?
  • Are they platform-specific?
  • Do we have all required test data files?

3.2 Benchmark Infrastructure Options

Goal: Detect performance regressions in audio processing

Option A: Custom Benchmark Suite

  • Create benchmarks/ directory
  • Benchmark key operations (FFT, onset detection, pitch tracking)
  • Store baseline results, compare on PR

Option B: Google Benchmark

  • Industry-standard C++ benchmarking framework
  • Statistical analysis, outlier detection
  • JSON output for tracking over time

Example:

// benchmarks/bench_fft.c
#include <benchmark/benchmark.h>
#include "aubio.h"

static void BM_FFT_512(benchmark::State& state) {
  aubio_fft_t* fft = new_aubio_fft(512);
  fvec_t* input = new_fvec(512);
  cvec_t* output = new_cvec(512);
  
  for (auto _ : state) {
    aubio_fft_do(fft, input, output);
  }
  
  del_aubio_fft(fft);
  del_fvec(input);
  del_cvec(output);
}
BENCHMARK(BM_FFT_512);

3.3 Fuzz Testing Strategy

Why Fuzz Testing for Audio?

  • Malformed audio files can trigger buffer overflows
  • Unexpected sample rates, bit depths, channel counts
  • Real-world security concern (see CVE-2018-14523, CVE-2018-19800)

Option A: libFuzzer (LLVM)

  • Compile-time instrumentation
  • Fast, efficient
  • Good GitHub Actions integration

Option B: AFL++ (American Fuzzy Lop)

  • Well-tested, widely used
  • Slower but thorough
  • Can find deep bugs

Option C: OSS-Fuzz Integration

  • Google's continuous fuzzing service
  • Free for open source
  • Automatic bug reporting

Harness Example:

// fuzz/fuzz_onset.c
#include <stdint.h>
#include <stddef.h>
#include "aubio.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 16) return 0;
  
  // Extract parameters from fuzz input
  uint_t win_s = 512;
  uint_t hop_s = 256;
  
  // Create onset detector
  aubio_onset_t* o = new_aubio_onset("default", win_s, hop_s, 44100);
  if (!o) return 0;
  
  fvec_t* in = new_fvec(hop_s);
  fvec_t* out = new_fvec(1);
  
  // Fill input with fuzz data
  for (uint_t i = 0; i < hop_s && i < size; i++) {
    in->data[i] = (smpl_t)data[i] / 128.0 - 1.0;
  }
  
  // Run onset detection (should not crash)
  aubio_onset_do(o, in, out);
  
  del_aubio_onset(o);
  del_fvec(in);
  del_fvec(out);
  return 0;
}

3.4 Integration Test Suite

Goal: Test real-world audio processing workflows

Test Scenarios:

  1. Load audio file → detect onsets → verify count
  2. Load audio file → extract pitch → verify frequency range
  3. Load audio file → detect tempo → verify BPM
  4. Process audio with different sample rates (8kHz, 44.1kHz, 96kHz)
  5. Handle edge cases (empty files, very short files, very long files)

Test Data:

  • Create minimal test audio files (synthetic)
  • Use known reference files with expected outputs
  • Store in tests/data/ directory

Recommended Solution Strategy

Phase 1: Fix Existing Tests (2-3 days)

  1. Investigate all test failures
  2. Fix root causes (missing data, platform issues, bugs)
  3. Remove || true from CI test commands
  4. Make tests a required check for PR merging

Phase 2: Add Benchmark Infrastructure (1-2 days)

  1. Integrate Google Benchmark
  2. Create benchmarks for critical operations
  3. Add CI job to run benchmarks
  4. Set up performance tracking (store results as artifacts)

Phase 3: Implement Fuzz Testing (2 days)

  1. Set up libFuzzer harnesses
  2. Create fuzz targets for all input parsers
  3. Run locally for 24-48 hours initially
  4. Add lightweight fuzz testing to CI (5 minute runs)

Phase 4: Integration Tests (1-2 days)

  1. Create synthetic test audio files
  2. Write high-level test scenarios
  3. Add to pytest suite
  4. Document expected behaviors

Implementation Guide

Step 1: Enable Test Failures in CI

Modify: .github/workflows/build.yml

# BEFORE:
test-command = "... && pytest {project}/python/tests || true"

# AFTER:
test-command = "... && pytest {project}/python/tests"

# Add separate test job for better visibility:
test-c-library:
  name: Test C library
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Setup and build
      run: |
        pip install meson ninja numpy
        meson setup builddir -Dtests=true
        meson compile -C builddir
    - name: Run tests
      run: meson test -C builddir --print-errorlogs

Step 2: Add Google Benchmark

Add to vcpkg.json:

{
  "dependencies": [
    ...
    {
      "name": "benchmark",
      "platform": "!windows"  // Optional: only for development
    }
  ]
}

Create: benchmarks/meson.build

benchmark_dep = dependency('benchmark', required: false)

if benchmark_dep.found()
  bench_fft = executable('bench_fft',
    'bench_fft.c',
    dependencies: [aubio_dep, benchmark_dep],
  )
  
  benchmark('FFT Performance', bench_fft)
endif

Step 3: Set up libFuzzer

Create: .github/workflows/fuzz.yml

name: Fuzz Testing
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Build with fuzzing
        run: |
          export CC=clang
          export CFLAGS="-fsanitize=fuzzer,address -g"
          meson setup builddir
          meson compile -C builddir
      
      - name: Run fuzzers (5 minutes each)
        run: |
          for fuzzer in builddir/fuzz/fuzz_*; do
            timeout 300 $fuzzer fuzz/corpus/ || true
          done
      
      - name: Upload crashes
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: fuzz-crashes
          path: crash-*

Create: fuzz/meson.build

if get_option('fuzzing')
  fuzzer_flags = ['-fsanitize=fuzzer,address']
  
  fuzz_onset = executable('fuzz_onset',
    'fuzz_onset.c',
    c_args: fuzzer_flags,
    link_args: fuzzer_flags,
    dependencies: aubio_dep
  )
  
  # Add more fuzz targets...
endif

Step 4: Integration Tests

Create: python/tests/test_integration.py

"""Integration tests with real audio processing workflows."""
import pytest
import aubio
import numpy as np

def generate_sine_wave(freq=440, duration=1.0, sr=44100):
    """Generate synthetic sine wave for testing."""
    t = np.linspace(0, duration, int(sr * duration))
    return np.sin(2 * np.pi * freq * t).astype(np.float32)

def test_onset_detection_workflow():
    """Test complete onset detection workflow."""
    # Generate audio with sharp attack
    signal = generate_sine_wave(440, 1.0)
    signal[:100] *= np.linspace(0, 1, 100)  # Add attack
    
    # Detect onsets
    onset = aubio.onset("default", 512, 256, 44100)
    onsets = []
    
    for frame in signal.reshape(-1, 256):
        fvec = aubio.fvec(256)
        fvec[:] = frame
        if onset(fvec):
            onsets.append(onset.get_last())
    
    # Verify at least one onset detected at start
    assert len(onsets) >= 1
    assert onsets[0] < 1000  # Within first 1000 samples

def test_pitch_detection_workflow():
    """Test complete pitch detection workflow."""
    signal = generate_sine_wave(440, 1.0)
    
    pitch_o = aubio.pitch("default", 2048, 512, 44100)
    pitch_o.set_unit("Hz")
    
    pitches = []
    for frame in signal.reshape(-1, 512):
        fvec = aubio.fvec(512)
        fvec[:] = frame
        detected = pitch_o(fvec)[0]
        if detected > 0:
            pitches.append(detected)
    
    # Verify average detected pitch is close to 440 Hz
    avg_pitch = np.mean(pitches)
    assert 430 < avg_pitch < 450, f"Expected ~440 Hz, got {avg_pitch}"

Success Metrics

Before:

  • C tests: 45/45 passing (but CI ignores failures)
  • Python tests: Run with || true
  • Performance tracking: None
  • Fuzz testing: None
  • Integration tests: None
  • CI enforcement: Weak

After:

  • C tests: All passing, CI blocks on failure
  • Python tests: All passing, CI blocks on failure
  • Performance tracking: Automated benchmarks on every PR
  • Fuzz testing: Continuous fuzzing in CI, OSS-Fuzz integration
  • Integration tests: 10+ real-world scenarios
  • CI enforcement: Strong (tests are required checks)
  • Coverage: >80% (measured with gcov/lcov)

References


Priority 4: Code Quality and Static Analysis

Priority Level: MEDIUM

Estimated ROI: MEDIUM - Prevents bugs, improves maintainability
Effort: 3-4 days
Impact: Code quality, security, maintainability

Problem Statement

The codebase has basic security hardening but lacks comprehensive static analysis and code quality tools:

Current State:

  • ✅ Security compiler flags (-fstack-protector-strong, -D_FORTIFY_SOURCE=2)
  • ✅ CodeQL scanning enabled
  • ✅ Sanitizer testing (ASAN, UBSAN)
  • ❌ No clang-tidy integration
  • ❌ No cppcheck or other static analyzers
  • ❌ No code coverage tracking
  • ❌ No complexity metrics
  • ❌ Limited compiler warning coverage

Pain Points:

  1. No code coverage metrics: Can't track test coverage improvements
  2. Manual code review burden: No automated checks for common patterns
  3. Inconsistent code style: No formatter or style checker
  4. Missing best practices: No linting for security patterns
  5. Technical debt invisible: No complexity or maintainability metrics

Investigation Required

4.1 Static Analysis Tool Selection

Option A: Clang-Tidy

  • Part of LLVM toolchain
  • C/C++ focused
  • Configurable checks
  • Good Meson integration

Checks to Enable:

  • clang-analyzer-* - Core static analysis
  • bugprone-* - Bug-prone patterns
  • cppcoreguidelines-* - C++ Core Guidelines
  • readability-* - Code readability
  • performance-* - Performance issues
  • cert-* - CERT secure coding rules

Option B: Cppcheck

  • Focused on C/C++
  • Zero false-positive goal
  • Lightweight
  • Good for CI

Option C: PVS-Studio

  • Commercial (free for open source)
  • Very thorough
  • Low false-positive rate
  • Requires registration

Recommendation: Start with Clang-Tidy + Cppcheck (both free, complementary)

4.2 Code Coverage Strategy

Tools:

  • gcov/lcov - Traditional, well-supported
  • llvm-cov - Modern, better integration with Clang
  • Codecov.io - Online dashboard, PR comments
  • Coveralls - Alternative to Codecov

Integration Points:

  1. Compile with --coverage flags
  2. Run test suite
  3. Generate coverage reports
  4. Upload to Codecov
  5. Enforce minimum coverage in CI

Target Coverage:

  • Overall: >80%
  • New code: >90%
  • Critical paths (audio I/O, DSP): >95%

4.3 Code Formatting and Style

Options:

  • clang-format: Industry standard for C/C++
  • uncrustify: Highly configurable
  • astyle: Simple, legacy

Current Issue: No consistent style enforcement

Solution:

  1. Add .clang-format configuration
  2. Add pre-commit hook
  3. Add CI check
  4. Run once to reformat codebase (separate PR)

4.4 Complexity Metrics

Metrics to Track:

  • Cyclomatic complexity (functions >15 are suspect)
  • Function length (>100 lines is concerning)
  • Nesting depth (>4 levels is hard to understand)
  • Maintainability index

Tools:

  • lizard - Multi-language code complexity analyzer
  • sloccount - Line count and effort estimation
  • SonarQube - Comprehensive code quality platform

Recommended Solution Strategy

Phase 1: Static Analysis Integration (2 days)

  1. Add clang-tidy configuration
  2. Run on codebase, fix critical issues
  3. Add to CI (start with warnings only)
  4. Gradually increase strictness

Phase 2: Code Coverage (1 day)

  1. Enable coverage compilation in Meson
  2. Integrate with Codecov.io
  3. Add coverage badge to README
  4. Set baseline, track improvements

Phase 3: Code Formatting (1 day)

  1. Add clang-format configuration
  2. Format codebase (one-time)
  3. Add pre-commit hook
  4. Add CI enforcement

Implementation Guide

Step 1: Clang-Tidy Configuration

Create: .clang-tidy

---
Checks: >
  clang-analyzer-*,
  bugprone-*,
  cert-*,
  readability-*,
  performance-*,
  -readability-magic-numbers,
  -cert-err33-c,
  -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling

WarningsAsErrors: ''  # Start permissive, tighten later

CheckOptions:
  - key: readability-identifier-naming.FunctionCase
    value: lower_case
  - key: readability-identifier-naming.VariableCase
    value: lower_case
  - key: readability-identifier-naming.ConstantCase
    value: UPPER_CASE

Add to Meson:

# meson.build
clang_tidy = find_program('clang-tidy', required: false)
if clang_tidy.found() and get_option('clang_tidy')
  run_target('clang-tidy',
    command: [clang_tidy, '-p', meson.build_root()] + aubio_sources
  )
endif

Add CI Job:

static-analysis:
  name: Static Analysis (clang-tidy)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install tools
      run: sudo apt-get install -y clang-tidy
    - name: Setup build
      run: meson setup builddir
    - name: Run clang-tidy
      run: |
        ninja -C builddir clang-tidy 2>&1 | tee clang-tidy.log
        # Fail if errors found (not warnings)
        ! grep "error:" clang-tidy.log

Step 2: Code Coverage with Codecov

Modify: meson.build

if get_option('b_coverage')
  add_project_arguments('-fprofile-arcs', '-ftest-coverage', language: 'c')
  add_project_link_arguments('-lgcov', language: 'c')
endif

Add CI Job:

coverage:
  name: Code Coverage
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install dependencies
      run: |
        sudo apt-get install -y lcov
        pip install meson ninja numpy
    - name: Build with coverage
      run: |
        meson setup builddir -Db_coverage=true -Dtests=true
        meson compile -C builddir
    - name: Run tests
      run: meson test -C builddir
    - name: Generate coverage
      run: ninja -C builddir coverage-html
    - name: Upload to Codecov
      uses: codecov/codecov-action@v4
      with:
        files: builddir/meson-logs/coverage.xml
        fail_ci_if_error: true

Add badge to README:

[![Coverage](https://codecov.io/gh/LedFx/aubio-ledfx/branch/main/graph/badge.svg)](https://codecov.io/gh/LedFx/aubio-ledfx)

Step 3: Clang-Format

Create: .clang-format

---
BasedOnStyle: LLVM
IndentWidth: 2
ColumnLimit: 80
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: Never
BreakBeforeBraces: Linux
IndentCaseLabels: false
PointerAlignment: Left
SpaceAfterCStyleCast: true

Add pre-commit hook:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v18.1.0
    hooks:
      - id: clang-format
        types_or: [c, c++]

Add CI check:

format-check:
  name: Code Formatting
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Check formatting
      uses: jidicula/clang-format-action@v4.11.0
      with:
        clang-format-version: '18'
        check-path: 'src'

Success Metrics

Before:

  • Static analysis: CodeQL only
  • Code coverage: Unknown
  • Code formatting: Inconsistent
  • CI checks: Basic
  • Technical debt: Unknown

After:

  • Static analysis: Clang-tidy + Cppcheck + CodeQL
  • Code coverage: >80%, tracked via Codecov
  • Code formatting: Consistent, enforced by CI
  • CI checks: Comprehensive (build, test, lint, format, coverage)
  • Technical debt: Visible and tracked

References


Priority 5: Documentation and Developer Experience

Priority Level: MEDIUM

Estimated ROI: MEDIUM - Improves contributor onboarding
Effort: 3-5 days
Impact: Contributor experience, project sustainability

Problem Statement

The project has good README and basic documentation, but lacks comprehensive developer guides and modern documentation infrastructure:

Current State:

  • ✅ Good README with build instructions
  • ✅ Sphinx documentation for Python API
  • ✅ Doxygen documentation for C API
  • ❌ No contributor guide
  • ❌ No architecture documentation
  • ❌ No debugging guide
  • ❌ Build documentation scattered
  • ❌ No API design rationale

Pain Points:

  1. Onboarding difficulty: New contributors struggle to understand codebase
  2. Scattered documentation: Build, testing, vcpkg info in multiple places
  3. No debugging guide: Hard to troubleshoot build/test issues
  4. API documentation incomplete: Many functions lack detailed docs
  5. No examples: Limited code examples for common tasks

Investigation Required

5.1 Documentation Structure Analysis

Current Documentation:

  • README.md - 215 lines, build instructions
  • doc/ - 40+ RST files for Sphinx
  • python/README.md - Python-specific info
  • Various *.md files - Security, implementation plans

Gaps:

  1. No CONTRIBUTING.md
  2. No ARCHITECTURE.md
  3. No DEBUGGING.md
  4. No RELEASE.md
  5. No examples directory with tutorials

5.2 API Documentation Quality

Sample Analysis of Current State:

// Example from src/pitch/pitch.h
aubio_pitch_t * new_aubio_pitch (const char_t * method,
    uint_t buf_size, uint_t hop_size, uint_t samplerate);

// Has basic comment, but missing:
// - List of available methods
// - Valid ranges for parameters
// - Return value details (NULL on error?)
// - Example usage
// - Performance characteristics

Improvement Areas:

  • Add parameter validation documentation
  • Document error conditions
  • Add usage examples
  • Cross-reference related functions

5.3 Interactive Examples

Goal: Lower barrier to entry with runnable examples

Options:

  1. Jupyter notebooks - Interactive Python examples
  2. Example programs - C examples in examples/
  3. Online playground - Web-based demo (ambitious)

Priority Examples to Create:

  • Basic onset detection
  • Pitch tracking
  • Tempo detection
  • Audio file processing workflow
  • Real-time audio processing

Recommended Solution Strategy

Phase 1: Essential Contributor Docs (2 days)

  1. Create CONTRIBUTING.md
  2. Create ARCHITECTURE.md
  3. Update README with better quick start
  4. Add DEBUGGING.md for troubleshooting

Phase 2: API Documentation Enhancement (1-2 days)

  1. Audit all public APIs
  2. Add missing parameter documentation
  3. Add error condition documentation
  4. Create API reference guide

Phase 3: Examples and Tutorials (1-2 days)

  1. Create Jupyter notebooks for Python
  2. Expand C examples
  3. Add to documentation
  4. Test examples in CI

Implementation Guide

Step 1: Create CONTRIBUTING.md

Create: CONTRIBUTING.md

# Contributing to aubio-ledfx

Thank you for your interest in contributing to aubio-ledfx!

## Quick Start

1. **Fork and clone:**
   ```bash
   git clone https://github.com/YOUR_USERNAME/aubio-ledfx.git
   cd aubio-ledfx
  1. Set up development environment:

    # Install dependencies
    pip install meson ninja numpy pytest
    
    # Configure and build
    meson setup builddir -Dtests=true -Dexamples=true
    meson compile -C builddir
    
    # Run tests
    meson test -C builddir
  2. Make changes and test:

    # Edit code
    vim src/...
    
    # Rebuild
    meson compile -C builddir
    
    # Test
    meson test -C builddir
    
    # Run specific test
    ./builddir/tests/test-onset
  3. Submit PR:

    • Create feature branch
    • Make atomic commits
    • Write tests
    • Update documentation
    • Submit PR with description

Code Style

  • C code: Follow existing style (clang-format enforced)
  • Python code: PEP 8 (black formatter)
  • Commit messages: Conventional Commits format

Testing Requirements

  • All new code must have tests
  • C tests in tests/src/
  • Python tests in python/tests/
  • Run sanitizers: meson setup builddir -Db_sanitize=address,undefined

Documentation

  • Update relevant .rst files in doc/
  • Add docstrings to Python code
  • Add Doxygen comments to C functions

Review Process

  1. Automated checks must pass (CI, tests, linting)
  2. Code review by maintainer
  3. Merge when approved

Getting Help

  • Open an issue for questions
  • Check existing documentation in doc/
  • See DEBUGGING.md for troubleshooting

#### Step 2: Create ARCHITECTURE.md

**Create:** `ARCHITECTURE.md`
```markdown
# aubio-ledfx Architecture

## Overview

aubio-ledfx is a C library with Python bindings for audio analysis.

## Project Structure

aubio-ledfx/ ├── src/ # C library source │ ├── aubio.h # Main public API header │ ├── aubio_priv.h # Private/internal header │ ├── mathutils.c # Math utilities │ ├── fvec.c # Float vector operations │ ├── cvec.c # Complex vector operations │ ├── spectral/ # Spectral analysis │ ├── pitch/ # Pitch detection algorithms │ ├── tempo/ # Tempo and beat tracking │ ├── onset/ # Onset detection │ └── io/ # Audio I/O (file, device) ├── python/ # Python bindings │ ├── ext/ # C extension module │ ├── lib/ # Pure Python code │ └── tests/ # Python test suite ├── tests/ # C test suite ├── examples/ # Example programs └── doc/ # Documentation (Sphinx + Doxygen)


## Core Concepts

### Data Types

- **fvec_t:** Float vector (real-valued signals)
- **cvec_t:** Complex vector (frequency domain)
- **fmat_t:** Float matrix (multi-channel)

### Processing Pipeline

Audio File → Source → fvec_t → Analysis → Results ↓ FFT/PVOC → cvec_t → Spectral Analysis


### Object-Oriented C Pattern

```c
// Creation
aubio_onset_t* o = new_aubio_onset("default", 512, 256, 44100);

// Processing
aubio_onset_do(o, input_fvec, output_fvec);

// Destruction
del_aubio_onset(o);

Build System

  • Meson: Build configuration
  • vcpkg: Dependency management
  • meson-python: Python package build

See doc/meson_reference.rst for details.

FFT Backends (Priority Order)

  1. fftw3f (recommended)
  2. Accelerate (macOS)
  3. Intel IPP (optional)
  4. ooura (fallback, always available)

Thread Safety

aubio is NOT thread-safe. Each thread must have its own objects.

Memory Management

  • All new_* functions allocate, return pointer or NULL on failure
  • All del_* functions free, safe to call with NULL
  • No garbage collection, manual management required

#### Step 3: API Documentation Template

**Create documentation script:**
```python
#!/usr/bin/env python3
"""Audit API documentation completeness."""

import re
from pathlib import Path

def check_function_doc(filepath, function_name, comment_block):
    """Check if function has complete documentation."""
    issues = []
    
    # Check for parameter documentation
    if '@param' not in comment_block:
        issues.append("Missing @param documentation")
    
    # Check for return documentation
    if '@return' not in comment_block:
        issues.append("Missing @return documentation")
    
    # Check for example
    if '@example' not in comment_block and 'new_' in function_name:
        issues.append("Constructor missing usage example")
    
    return issues

# Run on all headers...

Step 4: Jupyter Notebook Examples

Create: examples/notebooks/01_onset_detection.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Onset Detection with aubio\n",
    "\n",
    "This notebook demonstrates onset detection using aubio-ledfx."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import aubio\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Generate test signal\n",
    "samplerate = 44100\n",
    "duration = 1.0\n",
    "\n",
    "# Create onset detector\n",
    "win_s = 512\n",
    "hop_s = 256\n",
    "onset = aubio.onset(\"default\", win_s, hop_s, samplerate)\n",
    "\n",
    "# Process audio...\n",
    "# (full example)"
   ]
  }
 ]
}

Success Metrics

Before:

  • Contributor onboarding: 2-3 days
  • Documentation coverage: 40%
  • Code examples: Minimal
  • API reference: Incomplete

After:

  • Contributor onboarding: <4 hours
  • Documentation coverage: >80%
  • Code examples: 10+ working examples
  • API reference: Complete with examples
  • Jupyter notebooks: 5+ tutorials

Implementation Roadmap

Month 1: Quick Wins

  • Week 1: CI/CD optimization (Priority 1, Phase 1)
  • Week 2: Test infrastructure fixes (Priority 3, Phase 1)
  • Week 3: Static analysis integration (Priority 4, Phase 1)
  • Week 4: Documentation essentials (Priority 5, Phase 1)

Month 2: Deep Work

  • Week 5-6: CI/CD dependency optimization (Priority 1, Phase 2)
  • Week 7: Test benchmarking and fuzzing (Priority 3, Phases 2-3)
  • Week 8: Python bindings analysis (Priority 2, Phases 1-3)

Month 3: Polish and Iterate

  • Week 9-10: Python binding improvements or migration (Priority 2, Phase 4)
  • Week 11: Code coverage and quality metrics (Priority 4, Phases 2-3)
  • Week 12: Documentation and examples (Priority 5, Phases 2-3)

Ongoing

  • Monitor CI performance metrics
  • Track code coverage trends
  • Review static analysis findings
  • Update documentation as code evolves

Success Criteria

Key Performance Indicators (KPIs)

Metric Baseline Target Timeframe
CI build time 90-120 min 35-50 min Month 1
Test pass rate Unknown (ignored) 100% Month 1
Code coverage Unknown >80% Month 2
Static analysis issues Unknown <10 high Month 1
Documentation coverage ~40% >80% Month 3
Contributor onboarding 2-3 days <4 hours Month 3
Fuzz testing None Continuous Month 2
Performance benchmarks None Tracked Month 2

Quality Gates

Before merging PRs:

  • ✅ All tests pass
  • ✅ Code coverage ≥ baseline (no regression)
  • ✅ Static analysis passes
  • ✅ Code formatted with clang-format
  • ✅ Documentation updated

Risk Assessment

High Risk Items

  1. Python binding migration (if pursued)

    • Risk: Breaking changes for users
    • Mitigation: Gradual migration, maintain compatibility layer
  2. Test failures in CI

    • Risk: Unknown failures when removing || true
    • Mitigation: Fix all tests locally first
  3. Performance regression from optimization

    • Risk: CI optimizations slow down actual builds
    • Mitigation: Benchmark at each step

Medium Risk Items

  1. vcpkg cache size

    • Risk: Exceeding GitHub's 10GB cache limit
    • Mitigation: Monitor, implement LRU eviction
  2. Fuzz testing resource usage

    • Risk: Fuzzing consumes too much CI time/resources
    • Mitigation: Run on schedule, not every PR

Conclusion

These 5 priorities represent the highest-value optimizations for aubio-ledfx:

  1. CI/CD Performance - Immediate developer productivity impact
  2. Python Modernization - Long-term maintainability and user experience
  3. Test Infrastructure - Code quality and confidence
  4. Static Analysis - Proactive bug prevention
  5. Documentation - Contributor growth and project sustainability

Next Steps:

  1. Review this roadmap with maintainers
  2. Prioritize based on team capacity
  3. Create GitHub issues for each priority
  4. Begin implementation in priority order
  5. Track progress with KPIs

Estimated Total Effort: 18-27 days (spread over 3 months) Estimated ROI: 3-5x in reduced maintenance burden and faster iteration


Document Maintenance:

  • Review quarterly
  • Update based on progress
  • Add new priorities as they emerge
  • Archive completed items

Last Updated: 2025-11-14