Document Version: 1.0
Created: 2025-11-14
Project: aubio-ledfx maintained fork
Purpose: Comprehensive optimization and modernization strategy
This document presents the top 5 highest priority optimization and modernization work items for aubio-ledfx, a maintained fork providing Python 3.8-3.13 support with pre-built wheels. The project has successfully migrated from waf to Meson build system and integrated vcpkg for cross-platform dependency management.
Current State:
- ✅ Meson build system with vcpkg dependencies
- ✅ CI/CD with cibuildwheel for multi-platform wheels (Linux x64/ARM64, macOS Intel/Apple Silicon, Windows AMD64)
- ✅ Security hardening implemented (4 critical vulnerabilities fixed)
- ✅ Sanitizer testing infrastructure (AddressSanitizer + UndefinedBehaviorSanitizer)
- ✅ All 45 C tests passing
Key Metrics:
- 68 C source files, 56 header files (~17K lines)
- 80 Python files including 31 test files
- ~1,000 lines of Meson build configuration
- 5 Python versions supported (3.10-3.14)
- 5 platform/architecture combinations in CI
Estimated ROI: HIGH - Reduces developer iteration time and CI costs by 40-60%
Effort: 3-5 days
Impact: All contributors, every PR, every release
The current CI/CD pipeline using cibuildwheel builds wheels for 5 platform/architecture combinations, with each build taking 15-25 minutes.
Current State - Already Optimized:
- ✅ macOS and Windows use `actions/cache@v4` to cache the `vcpkg_installed/` directory
- ✅ Caching keys based on `vcpkg.json` and triplet files (smart invalidation)
- ✅ `before-all` runs once per job (not per Python version), so dependencies are built only once
- ✅ Efficient matrix strategy for parallel builds
Current Build Times (estimated):
- Linux x64: ~18 minutes (vcpkg: ~8 min on cache miss, wheel build: ~10 min)
- Linux ARM64: ~22 minutes (vcpkg: ~12 min on cache miss, wheel build: ~10 min)
- macOS x64: ~20 minutes (vcpkg: ~3-4 min on cache hit, wheel build: ~10 min)
- macOS ARM64: ~18 minutes (vcpkg: ~2-3 min on cache hit, wheel build: ~10 min)
- Windows AMD64: ~15 minutes (vcpkg: ~2-3 min on cache hit, wheel build: ~8 min)
Total CI time per PR: ~70-90 minutes for all platforms (with cache hits)
Remaining Pain Points:
- Linux builds: vcpkg dependencies rebuild in manylinux Docker containers (no persistent cache across runs)
- Limited opportunities: macOS/Windows already well-optimized with caching
- Potential improvements: ccache/sccache for C/C++ compilation, workflow organization
Current State (Already Implemented):
- ✅ macOS builds: `actions/cache@v4` caches the `vcpkg_installed/` directory
- ✅ Windows builds: `actions/cache@v4` caches the `vcpkg_installed/` directory
- ✅ Cache keys use `hashFiles('vcpkg.json', 'vcpkg-triplets/*.cmake')` for smart invalidation
- ✅ Linux builds: dependencies rebuild in Docker (GitHub Actions cache doesn't persist in containers)
Note on vcpkg Binary Caching:
The old x-gha binary source provider was deprecated in June 2024. The current implementation uses direct actions/cache for the vcpkg_installed directory, which is the recommended approach for GitHub Actions.
Remaining Optimization Opportunities:
- Linux Docker caching: Explore Docker layer caching or bind mounts to persist vcpkg builds
- Cache analysis: Measure actual cache hit rates on macOS/Windows
- Alternative approaches:
- Pre-built dependency Docker images for Linux
- vcpkg's newer binary caching features (files, nuget providers)
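As a sketch of the "files" provider mentioned above, vcpkg can be pointed at a local archive directory through the `VCPKG_BINARY_SOURCES` environment variable; the cache path here is illustrative:

```shell
# Illustrative: use a local directory as a vcpkg binary cache.
# "clear;" drops the default sources; "files,<dir>,readwrite" adds ours.
CACHE_DIR="${HOME}/.cache/vcpkg-archives"
mkdir -p "${CACHE_DIR}"
export VCPKG_BINARY_SOURCES="clear;files,${CACHE_DIR},readwrite"
echo "${VCPKG_BINARY_SOURCES}"
```

With this set, subsequent `vcpkg install` runs restore prebuilt ports from the archive directory instead of rebuilding them.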
Questions to Answer:
- What's the actual cache hit rate on macOS/Windows in production?
- Can we use Docker BuildKit caching for Linux builds?
- Would pre-built dependency containers be worth the maintenance overhead?
- What's the cache size and is it within GitHub's 10GB limit?
Investigation Steps:
# 1. Measure vcpkg build artifacts size
du -sh vcpkg_installed/x64-osx/
du -sh vcpkg_installed/arm64-osx/
du -sh vcpkg_installed/x64-windows-release/
# 2. Check cache hit rates in CI logs
# Look for "Cache restored from key:" messages in recent workflow runs
# 3. Test Docker BuildKit caching (Linux)
# Add --cache-from and --cache-to flags to docker build

Goal: Cache C/C++ compilation artifacts across CI runs
Options:
- ccache: Traditional, well-tested, local cache + GitHub Actions cache
- sccache: Rust-based, supports cloud backends (S3, GCS, GitHub Actions cache)
- Buildcache: Modern alternative with good Docker support
Investigation Steps:
# Example sccache integration in CI:
- name: Setup sccache
uses: mozilla-actions/sccache-action@v0.0.4
- name: Configure environment
run: |
echo "CC=sccache gcc" >> $GITHUB_ENV
echo "CXX=sccache g++" >> $GITHUB_ENV

Expected Impact: 30-50% faster C library compilation on cache hit
Concept: Build vcpkg dependencies once, cache, reuse across all wheel builds
Approach A: Separate Dependency Build Job
jobs:
build-dependencies:
strategy:
matrix:
include:
- { os: ubuntu-latest, triplet: x64-linux-pic }
- { os: ubuntu-24.04-arm, triplet: arm64-linux-pic }
# ... etc
steps:
- name: Build and cache vcpkg dependencies
run: vcpkg install --triplet=${{ matrix.triplet }}
- name: Cache vcpkg_installed
uses: actions/cache/save@v4
with:
path: vcpkg_installed
key: vcpkg-${{ matrix.triplet }}-${{ hashFiles('vcpkg.json') }}
build-wheels:
needs: build-dependencies
steps:
- name: Restore vcpkg cache
uses: actions/cache/restore@v4

Approach B: Use GitHub Container Registry for Pre-built Dependencies
- Build Docker images with vcpkg dependencies pre-installed
- Push to ghcr.io/LedFx/aubio-builder:x64-linux-pic
- Use in cibuildwheel Linux builds
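If Approach B is pursued, cibuildwheel can consume the pre-built container through its manylinux image options, roughly as follows (the image names are assumptions based on the tag suggested above):

```toml
# pyproject.toml — hypothetical: point cibuildwheel at pre-built images
[tool.cibuildwheel.linux]
manylinux-x86_64-image = "ghcr.io/ledfx/aubio-builder:x64-linux-pic"
manylinux-aarch64-image = "ghcr.io/ledfx/aubio-builder:arm64-linux-pic"
```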
Current Issues:
- 227 lines of YAML with duplication
- Before-all scripts are repetitive across platforms
- No job parallelization optimization
- No conditional job skipping (e.g., skip builds if only docs changed)
Optimization Opportunities:
- Use reusable workflows for common setup patterns
- Matrix strategy improvements - reduce duplication
- Path filters - skip unnecessary builds
- Composite actions - extract common steps
Example Path Filter:
on:
pull_request:
paths-ignore:
- 'doc/**'
- '**.md'
- '**.rst'

Note: CI/CD is already well-optimized with caching for macOS and Windows. The following focuses on incremental improvements.
Phase 1: Analysis and Measurement (1 day)
- Measure actual cache hit rates on macOS/Windows
- Profile build times to identify true bottlenecks
- Analyze cache size and effectiveness
- Determine if further optimization is worthwhile
Expected Impact: Better understanding of actual performance
Phase 2: Linux Docker Optimization (2-3 days, if worthwhile)
- Explore Docker BuildKit caching for vcpkg builds
- Consider pre-built dependency Docker images
- Test bind mounts or volume caching strategies
- Measure performance improvements
Expected Impact: 20-30% faster Linux builds (if successful)
Phase 3: Compiler Caching (1-2 days)
- Add ccache/sccache for C library compilation
- Integrate with GitHub Actions cache
- Measure compilation time improvements
Expected Impact: 15-25% faster C compilation on cache hit
Phase 4: Workflow Organization (1 day)
- Extract reusable workflows
- Optimize path filters
- Improve matrix strategy
Expected Impact: Better maintainability
Before making changes, measure current state:
# 1. Check recent CI workflow runs for cache hit rates
# Look in GitHub Actions logs for messages like:
# "Cache restored from key: vcpkg-installed-macos-x64-..."
# 2. Measure vcpkg_installed directory sizes
du -sh vcpkg_installed/*/
# 3. Compare build times with/without cache
# Run a workflow with cleared cache vs. warm cache

Why: Understand baseline performance before optimization
For Linux builds:
[tool.cibuildwheel.linux.environment]
CC = "ccache /opt/rh/gcc-toolset-14/root/usr/bin/gcc"
CXX = "ccache /opt/rh/gcc-toolset-14/root/usr/bin/g++"
CCACHE_DIR = "/tmp/ccache"
[tool.cibuildwheel.linux]
before-all = """
yum install -y ccache && \
# ... existing vcpkg setup
"""

Note: The current caching strategy using actions/cache for vcpkg_installed is already the recommended approach. The deprecated x-gha binary source provider (removed June 2024) has been superseded by direct directory caching.
Suggested ccache restore-key prefix: `ccache-linux-${{ matrix.arch }}-`
#### Step 3: Optimize Linux Docker Builds (Advanced)
**Current Challenge:** Docker containers don't persist GitHub Actions cache
**Potential Solutions:**
**Option A: Docker BuildKit Caching**
```yaml
# In .github/workflows/build.yml
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
# Use BuildKit cache mounts in cibuildwheel
# (requires custom Docker image configuration)
Option B: Pre-built Dependency Container
# Create custom manylinux image with vcpkg dependencies pre-installed
FROM quay.io/pypa/manylinux_2_28_x86_64
RUN yum install -y git zip unzip tar curl make nasm
RUN git clone https://github.com/microsoft/vcpkg.git /opt/vcpkg
RUN cd /opt/vcpkg && ./bootstrap-vcpkg.sh
COPY vcpkg.json vcpkg-triplets/ /tmp/aubio-build/
RUN cd /tmp/aubio-build && /opt/vcpkg/vcpkg install --triplet=x64-linux-pic

Trade-off: Maintenance overhead vs. build speed improvement
on:
pull_request:
paths:
- 'src/**'
- 'python/**'
- 'tests/**'
- 'meson.build'
- 'meson_options.txt'
- 'pyproject.toml'
- 'vcpkg.json'
- '.github/workflows/build.yml'
push:
branches: [main, develop]

Current Baseline (With Existing Caching):
- Total CI time: 70-90 minutes (with cache hits on macOS/Windows)
- macOS vcpkg: ~2-4 minutes (cache hit)
- Windows vcpkg: ~2-3 minutes (cache hit)
- Linux vcpkg: ~8-12 minutes (rebuild each time)
- Cache hit rate: ~70-80% (macOS/Windows)
After Additional Optimization (Target):
- Total CI time: 50-65 minutes (20-25% improvement)
- Linux vcpkg: ~4-6 minutes (with Docker caching, if implemented)
- C library compilation: ~5-7 minutes (with ccache, 30% faster)
- Cache hit rate: >85% (all platforms where applicable)
Note: The existing caching infrastructure is already quite effective. Further optimizations have diminishing returns and should be evaluated based on actual measured bottlenecks.
- vcpkg Binary Caching: https://vcpkg.io/en/docs/users/binarycaching.html
- GitHub Actions Cache: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
- cibuildwheel caching: https://cibuildwheel.readthedocs.io/en/stable/faq/#caching
- sccache: https://github.com/mozilla/sccache
Risk 1: Cache size exceeds GitHub's 10GB limit per repo
- Mitigation: Monitor cache size, implement LRU eviction strategy, use vcpkg's NuGet backend if needed
Risk 2: Cache invalidation issues (stale dependencies)
- Mitigation: Cache key includes `hashFiles('vcpkg.json')`, which auto-invalidates on dependency changes
Risk 3: Binary cache corruption
- Mitigation: vcpkg verifies checksums, fallback to source build on failure
Estimated ROI: HIGH - Reduces maintenance burden, improves type safety
Effort: 4-6 days
Impact: Python package quality, developer experience
The Python bindings use a custom code generation system (python/lib/gen_external.py, gen_code.py) that parses src/aubio.h and generates C extension code. While functional, this approach has several issues:
Current System:
- 352 lines of custom parser in `gen_external.py`
- 642 lines of code generation logic in `gen_code.py`
- Generates 10+ C files (`gen-onset.c`, `gen-pitch.c`, etc.) at build time
- Manual maintenance of object lists and templates
- No type hints in generated Python bindings
- Fragile parsing logic dependent on header file format
Pain Points:
- Maintenance burden: Any change to C API requires updating generator
- No IDE support: Generated Python code lacks type hints
- Build complexity: Code generation adds build-time dependency
- Limited extensibility: Hard to add new object types or methods
- Python 3.12+ compatibility: No TypedDict, Protocol support
- No docstrings: Generated code has minimal documentation
Option A: Migrate to pybind11
- Pros:
- Modern C++11 binding framework
- Automatic type conversion, docstrings
- Full Python 3.x support with type hints
- Excellent NumPy integration
- Active maintenance and community
- Cons:
- Requires C++11 (aubio is C99)
- Significant migration effort
- All bindings need rewriting
Option B: Migrate to nanobind
- Pros:
- Modern, lightweight (successor to pybind11)
- Better performance and smaller binaries
- Excellent type hint support
- Cons:
- Newer project (less mature)
- Similar C++ requirement
- Migration effort
Option C: Use CFFI
- Pros:
- Pure Python, no C++ required
- Excellent for C libraries
- Runtime and build-time modes
- Good NumPy integration
- Cons:
- Less ergonomic than pybind11
- Manual type definitions
- Performance overhead (mitigated by ABI mode)
Option D: Improve Current Generator
- Pros:
- No migration required
- Incremental improvements
- Keep current build system
- Cons:
- Maintenance burden remains
- Limited by custom parser approach
Current State: No .pyi stub files, no runtime type hints
Options:
- Generate stubs with stubgen (mypy tool)
stubgen -p aubio -o stubs/
- Use pybind11-stubgen (if migrating to pybind11)
- Manual stub creation for high-value APIs
Benefits:
- IDE autocomplete and type checking
- Better documentation
- mypy/pyright support
Current Issue: aubio uses older NumPy C API
Opportunities:
- Use NumPy 2.0 C API for better performance
- Leverage new array protocols
- Consider using nanobind's NumPy integration (automatic array wrapping)
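As a rough illustration of the zero-copy array wrapping that a modern binding layer provides, a C float buffer can be viewed as a NumPy array without copying. The buffer below is a stand-in for something like `fvec->data`, not the real aubio API:

```python
import ctypes
import numpy as np

# Stand-in for a C float array such as fvec->data (hypothetical).
buf = (ctypes.c_float * 4)(0.0, 1.0, 2.0, 3.0)

# Zero-copy view: the NumPy array shares memory with the C buffer.
arr = np.ctypeslib.as_array(buf)
arr[0] = 42.0  # the write is visible through the original buffer too

print(arr.dtype, buf[0])
```

This is the pattern that makes round-tripping audio frames between C and Python cheap; pybind11 and nanobind provide the same sharing through their NumPy array types.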
Phase 1: Add Type Hints (1-2 days)
- Generate `.pyi` stub files for current bindings
- Add stubs to the package distribution
- Test with mypy/pyright
Phase 2: Improve Code Generator (2-3 days)
- Add docstring generation from doxygen comments
- Improve error messages in generated code
- Add type hint generation to .pyi files
- Better handling of edge cases
Phase 3: Evaluate Migration Path (1 day)
- Create proof-of-concept with pybind11 for 2-3 classes
- Measure performance impact
- Assess migration effort
- Make go/no-go decision
Phase 4: Migration (if approved, 8-12 days)
- Set up pybind11 build integration with Meson
- Migrate core types (fvec, cvec, etc.)
- Migrate processing objects (onset, pitch, tempo, etc.)
- Add comprehensive tests
- Update documentation
Create script: scripts/generate_stubs.py
#!/usr/bin/env python3
"""Generate type stubs for aubio package."""
import subprocess
import sys
def main():
# Use mypy stubgen to create stubs
subprocess.run([
sys.executable, "-m", "mypy.stubgen",
"-p", "aubio",
"-o", "python/aubio-stubs"
], check=True)
# Post-process stubs to add missing information
# (e.g., NumPy array types)
if __name__ == "__main__":
main()

Add to pyproject.toml:
[project]
...
[project.optional-dependencies]
stubs = ["mypy"]
[tool.meson-python]
# Include stub files in wheel

Modify gen_code.py:
def generate_docstring(self, obj_name: str, method_name: str) -> str:
"""Extract docstring from Doxygen comments in header."""
# Parse src/aubio.h for /** ... */ comments
# Convert to Python docstring format
return f'''"""
{obj_name}.{method_name}
[Generated from C API documentation]
"""'''

Create: python/ext/pybind11_poc.cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include "aubio.h"
namespace py = pybind11;
PYBIND11_MODULE(_aubio_pybind11, m) {
// Proof of concept: fvec_t wrapper
py::class_<fvec_t>(m, "fvec")
.def(py::init([](size_t length) {
return new_fvec(length);
}), "Create new float vector")
.def_property_readonly("length",
[](fvec_t* fv) { return fv->length; })
.def("__getitem__",
[](fvec_t* fv, size_t i) {
if (i >= fv->length) throw py::index_error();
return fv->data[i];
})
.def("__setitem__",
[](fvec_t* fv, size_t i, smpl_t val) {
if (i >= fv->length) throw py::index_error();
fv->data[i] = val;
});
// Add more bindings...
}

Integrate with Meson:
# python/meson.build
if get_option('use_pybind11')
pybind11_dep = dependency('pybind11')
py.extension_module('_aubio_pybind11',
'ext/pybind11_poc.cpp',
dependencies: [aubio_dep, pybind11_dep, numpy_dep],
install: true
)
endif

Before:
- No type hints or stubs
- IDE support: Poor
- Code generation time: ~5-10 seconds
- Maintenance effort: HIGH (custom parser)
- Type safety: None
After (Stubs Only):
- Full .pyi stubs for all APIs
- IDE support: Good
- mypy/pyright compatibility: Yes
- Maintenance effort: MEDIUM (stubs need updating)
After (pybind11 Migration):
- Native type hints
- IDE support: Excellent
- Code generation time: 0 (pure C++)
- Maintenance effort: LOW (automatic from C++ declarations)
- Type safety: Full
- Performance: Same or better
- pybind11: https://pybind11.readthedocs.io/
- nanobind: https://nanobind.readthedocs.io/
- CFFI: https://cffi.readthedocs.io/
- NumPy 2.0 migration: https://numpy.org/devdocs/numpy_2_0_migration_guide.html
- PEP 561 (Stub files): https://peps.python.org/pep-0561/
Estimated ROI: HIGH - Prevents regressions, improves code quality
Effort: 5-7 days
Impact: Code reliability, contributor confidence
While the project has good test coverage (45 C tests, 31 Python test files), there are significant gaps in testing infrastructure:
Current State:
- ✅ 45/45 C unit tests passing
- ✅ Sanitizer testing (ASAN, UBSAN) via GitHub Actions
- ✅ Python test suite with pytest
- ❌ No performance/benchmark tests
- ❌ No fuzz testing (security concern for audio processing)
- ❌ Limited boundary condition testing
- ❌ No integration tests for real audio files
- ❌ Test suite disabled in CI (`|| true`, so tests are allowed to fail)
Pain Points:
- CI ignores test failures: Tests run but failures don't block PRs
- No regression detection: Can't detect performance regressions
- Limited edge case coverage: Boundary conditions not systematically tested
- No fuzz testing: Audio processing is vulnerable to malformed input
- Platform-specific issues: Tests don't cover all platform code paths
- Manual testing required: Audio quality verification is manual
Current Issue: Tests run with `|| true` in CI, masking failures
Investigation:
# Run tests locally on each platform
meson setup builddir -Dtests=true
meson test -C builddir --print-errorlogs
# Identify which tests fail and why:
# - Missing test data files?
# - Platform-specific issues?
# - Actual bugs?
# - Timing/flakiness?

Questions:
- Which specific tests are failing?
- Are failures consistent or flaky?
- Are they platform-specific?
- Do we have all required test data files?
Goal: Detect performance regressions in audio processing
Option A: Custom Benchmark Suite
- Create a `benchmarks/` directory
- Benchmark key operations (FFT, onset detection, pitch tracking)
- Store baseline results, compare on PR
Option B: Google Benchmark
- Industry-standard C++ benchmarking framework
- Statistical analysis, outlier detection
- JSON output for tracking over time
Example:
// benchmarks/bench_fft.cc (Google Benchmark requires C++)
#include <benchmark/benchmark.h>
#include "aubio.h"
static void BM_FFT_512(benchmark::State& state) {
aubio_fft_t* fft = new_aubio_fft(512);
fvec_t* input = new_fvec(512);
cvec_t* output = new_cvec(512);
for (auto _ : state) {
aubio_fft_do(fft, input, output);
}
del_aubio_fft(fft);
del_fvec(input);
del_cvec(output);
}
BENCHMARK(BM_FFT_512);
BENCHMARK_MAIN();

Why Fuzz Testing for Audio?
- Malformed audio files can trigger buffer overflows
- Unexpected sample rates, bit depths, channel counts
- Real-world security concern (see CVE-2018-14523, CVE-2018-19800)
Option A: libFuzzer (LLVM)
- Compile-time instrumentation
- Fast, efficient
- Good GitHub Actions integration
Option B: AFL++ (American Fuzzy Lop)
- Well-tested, widely used
- Slower but thorough
- Can find deep bugs
Option C: OSS-Fuzz Integration
- Google's continuous fuzzing service
- Free for open source
- Automatic bug reporting
Harness Example:
// fuzz/fuzz_onset.c
#include <stdint.h>
#include <stddef.h>
#include "aubio.h"
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
if (size < 16) return 0;
// Use fixed analysis parameters (these could also be derived from the fuzz input)
uint_t win_s = 512;
uint_t hop_s = 256;
// Create onset detector
aubio_onset_t* o = new_aubio_onset("default", win_s, hop_s, 44100);
if (!o) return 0;
fvec_t* in = new_fvec(hop_s);
fvec_t* out = new_fvec(1);
// Fill input with fuzz data
for (uint_t i = 0; i < hop_s && i < size; i++) {
in->data[i] = (smpl_t)data[i] / 128.0 - 1.0;
}
// Run onset detection (should not crash)
aubio_onset_do(o, in, out);
del_aubio_onset(o);
del_fvec(in);
del_fvec(out);
return 0;
}

Goal: Test real-world audio processing workflows
Test Scenarios:
- Load audio file → detect onsets → verify count
- Load audio file → extract pitch → verify frequency range
- Load audio file → detect tempo → verify BPM
- Process audio with different sample rates (8kHz, 44.1kHz, 96kHz)
- Handle edge cases (empty files, very short files, very long files)
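For the pitch scenario, a reference value can be computed without aubio at all. A minimal zero-crossing estimator (a sketch only, far cruder than aubio's detectors) gives an expected frequency to compare detection results against:

```python
import numpy as np

def estimate_freq_zero_crossings(signal: np.ndarray, sr: int) -> float:
    """Crude frequency estimate: a sine crosses zero twice per period."""
    signs = np.signbit(signal).astype(np.int8)
    crossings = int(np.count_nonzero(np.diff(signs)))
    return crossings * sr / (2 * len(signal))

# 440 Hz reference tone, one second at 44.1 kHz
t = np.linspace(0, 1.0, 44100, endpoint=False)
sine = np.sin(2 * np.pi * 440 * t).astype(np.float32)
```

An integration test can then assert that aubio's detected pitch falls within a few Hz of this independent estimate.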
Test Data:
- Create minimal test audio files (synthetic)
- Use known reference files with expected outputs
- Store reference files in a `tests/data/` directory
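Synthetic reference files can be produced with the standard library alone; for example (the path and parameters below are illustrative):

```python
import math
import struct
import wave

def write_sine_wav(path: str, freq: float = 440.0, duration: float = 0.1,
                   sr: int = 44100) -> None:
    """Write a mono 16-bit PCM sine wave for use as test data."""
    n = int(sr * duration)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sr)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sr)))
            for i in range(n)
        )
        w.writeframes(frames)
```

Generating the files at test time (rather than committing binaries) keeps the repository small and makes the expected signal content explicit.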
Phase 1: Fix Existing Tests (2-3 days)
- Investigate all test failures
- Fix root causes (missing data, platform issues, bugs)
- Remove `|| true` from CI test commands
- Make tests a required check for PR merging
Phase 2: Add Benchmark Infrastructure (1-2 days)
- Integrate Google Benchmark
- Create benchmarks for critical operations
- Add CI job to run benchmarks
- Set up performance tracking (store results as artifacts)
Phase 3: Implement Fuzz Testing (2 days)
- Set up libFuzzer harnesses
- Create fuzz targets for all input parsers
- Run locally for 24-48 hours initially
- Add lightweight fuzz testing to CI (5 minute runs)
Phase 4: Integration Tests (1-2 days)
- Create synthetic test audio files
- Write high-level test scenarios
- Add to pytest suite
- Document expected behaviors
Modify: .github/workflows/build.yml
# BEFORE:
test-command = "... && pytest {project}/python/tests || true"
# AFTER:
test-command = "... && pytest {project}/python/tests"
# Add separate test job for better visibility:
test-c-library:
name: Test C library
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup and build
run: |
pip install meson ninja numpy
meson setup builddir -Dtests=true
meson compile -C builddir
- name: Run tests
run: meson test -C builddir --print-errorlogs

Add to vcpkg.json:
{
"dependencies": [
...
{
"name": "benchmark",
"platform": "!windows"
}
]
}

(JSON does not allow comments; the `benchmark` dependency is intended for development builds only.)

Create: benchmarks/meson.build
benchmark_dep = dependency('benchmark', required: false)
if benchmark_dep.found()
bench_fft = executable('bench_fft',
'bench_fft.cc',
dependencies: [aubio_dep, benchmark_dep],
)
benchmark('FFT Performance', bench_fft)
endif

Create: .github/workflows/fuzz.yml
name: Fuzz Testing
on:
schedule:
- cron: '0 0 * * 0' # Weekly
workflow_dispatch:
jobs:
fuzz:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build with fuzzing
run: |
export CC=clang
export CFLAGS="-fsanitize=fuzzer,address -g"
meson setup builddir -Dfuzzing=true
meson compile -C builddir
- name: Run fuzzers (5 minutes each)
run: |
for fuzzer in builddir/fuzz/fuzz_*; do
timeout 300 $fuzzer fuzz/corpus/ || true
done
- name: Upload crashes
if: failure()
uses: actions/upload-artifact@v4
with:
name: fuzz-crashes
path: crash-*

Create: fuzz/meson.build
if get_option('fuzzing')
fuzzer_flags = ['-fsanitize=fuzzer,address']
fuzz_onset = executable('fuzz_onset',
'fuzz_onset.c',
c_args: fuzzer_flags,
link_args: fuzzer_flags,
dependencies: aubio_dep
)
# Add more fuzz targets...
endif

Create: python/tests/test_integration.py
"""Integration tests with real audio processing workflows."""
import pytest
import aubio
import numpy as np
def generate_sine_wave(freq=440, duration=1.0, sr=44100):
"""Generate synthetic sine wave for testing."""
t = np.linspace(0, duration, int(sr * duration))
return np.sin(2 * np.pi * freq * t).astype(np.float32)
def test_onset_detection_workflow():
"""Test complete onset detection workflow."""
# Generate audio with sharp attack
signal = generate_sine_wave(440, 1.0)
signal[:100] *= np.linspace(0, 1, 100) # Add attack
# Detect onsets
onset = aubio.onset("default", 512, 256, 44100)
onsets = []
n = len(signal) // 256 * 256  # trim so the length divides evenly
for frame in signal[:n].reshape(-1, 256):
fvec = aubio.fvec(256)
fvec[:] = frame
if onset(fvec):
onsets.append(onset.get_last())
# Verify at least one onset detected at start
assert len(onsets) >= 1
assert onsets[0] < 1000 # Within first 1000 samples
def test_pitch_detection_workflow():
"""Test complete pitch detection workflow."""
signal = generate_sine_wave(440, 1.0)
pitch_o = aubio.pitch("default", 2048, 512, 44100)
pitch_o.set_unit("Hz")
pitches = []
n = len(signal) // 512 * 512  # trim so the length divides evenly
for frame in signal[:n].reshape(-1, 512):
fvec = aubio.fvec(512)
fvec[:] = frame
detected = pitch_o(fvec)[0]
if detected > 0:
pitches.append(detected)
# Verify average detected pitch is close to 440 Hz
avg_pitch = np.mean(pitches)
assert 430 < avg_pitch < 450, f"Expected ~440 Hz, got {avg_pitch}"

Before:
- C tests: 45/45 passing (but CI ignores failures)
- Python tests: Run with `|| true`
- Fuzz testing: None
- Integration tests: None
- CI enforcement: Weak
After:
- C tests: All passing, CI blocks on failure
- Python tests: All passing, CI blocks on failure
- Performance tracking: Automated benchmarks on every PR
- Fuzz testing: Continuous fuzzing in CI, OSS-Fuzz integration
- Integration tests: 10+ real-world scenarios
- CI enforcement: Strong (tests are required checks)
- Coverage: >80% (measured with gcov/lcov)
- Google Benchmark: https://github.com/google/benchmark
- libFuzzer: https://llvm.org/docs/LibFuzzer.html
- OSS-Fuzz: https://google.github.io/oss-fuzz/
- pytest best practices: https://docs.pytest.org/en/stable/goodpractices.html
Estimated ROI: MEDIUM - Prevents bugs, improves maintainability
Effort: 3-4 days
Impact: Code quality, security, maintainability
The codebase has basic security hardening but lacks comprehensive static analysis and code quality tools:
Current State:
- ✅ Security compiler flags (-fstack-protector-strong, -D_FORTIFY_SOURCE=2)
- ✅ CodeQL scanning enabled
- ✅ Sanitizer testing (ASAN, UBSAN)
- ❌ No clang-tidy integration
- ❌ No cppcheck or other static analyzers
- ❌ No code coverage tracking
- ❌ No complexity metrics
- ❌ Limited compiler warning coverage
Pain Points:
- No code coverage metrics: Can't track test coverage improvements
- Manual code review burden: No automated checks for common patterns
- Inconsistent code style: No formatter or style checker
- Missing best practices: No linting for security patterns
- Technical debt invisible: No complexity or maintainability metrics
Option A: Clang-Tidy
- Part of LLVM toolchain
- C/C++ focused
- Configurable checks
- Good Meson integration
Checks to Enable:
- `clang-analyzer-*` - Core static analysis
- `bugprone-*` - Bug-prone patterns
- `cppcoreguidelines-*` - C++ Core Guidelines
- `readability-*` - Code readability
- `performance-*` - Performance issues
- `cert-*` - CERT secure coding rules
Option B: Cppcheck
- Focused on C/C++
- Zero false-positive goal
- Lightweight
- Good for CI
Option C: PVS-Studio
- Commercial (free for open source)
- Very thorough
- Low false-positive rate
- Requires registration
Recommendation: Start with Clang-Tidy + Cppcheck (both free, complementary)
Tools:
- gcov/lcov - Traditional, well-supported
- llvm-cov - Modern, better integration with Clang
- Codecov.io - Online dashboard, PR comments
- Coveralls - Alternative to Codecov
Integration Points:
- Compile with `--coverage` flags
- Run test suite
- Generate coverage reports
- Upload to Codecov
- Enforce minimum coverage in CI
Target Coverage:
- Overall: >80%
- New code: >90%
- Critical paths (audio I/O, DSP): >95%
Options:
- clang-format: Industry standard for C/C++
- uncrustify: Highly configurable
- astyle: Simple, legacy
Current Issue: No consistent style enforcement
Solution:
- Add a `.clang-format` configuration
- Add a pre-commit hook
- Add CI check
- Run once to reformat codebase (separate PR)
Metrics to Track:
- Cyclomatic complexity (functions >15 are suspect)
- Function length (>100 lines is concerning)
- Nesting depth (>4 levels is hard to understand)
- Maintainability index
Tools:
- lizard - Multi-language code complexity analyzer
- sloccount - Line count and effort estimation
- SonarQube - Comprehensive code quality platform
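To make the cyclomatic-complexity threshold concrete: tools like lizard count decision points per function. A toy approximation of that count (a sketch only, using regexes rather than a real C parser) looks like:

```python
import re

def rough_ccn(c_function_body: str) -> int:
    """Very rough cyclomatic complexity: 1 + number of decision points.

    Counts branch keywords and short-circuit/ternary operators; a real
    tool such as lizard parses the code properly instead of using regexes.
    """
    decisions = re.findall(r"\b(?:if|for|while|case)\b|&&|\|\||\?",
                           c_function_body)
    return 1 + len(decisions)
```

A function body scoring above 15 on this kind of count is a candidate for refactoring, per the threshold above.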
Phase 1: Static Analysis Integration (2 days)
- Add clang-tidy configuration
- Run on codebase, fix critical issues
- Add to CI (start with warnings only)
- Gradually increase strictness
Phase 2: Code Coverage (1 day)
- Enable coverage compilation in Meson
- Integrate with Codecov.io
- Add coverage badge to README
- Set baseline, track improvements
Phase 3: Code Formatting (1 day)
- Add clang-format configuration
- Format codebase (one-time)
- Add pre-commit hook
- Add CI enforcement
Create: .clang-tidy
---
Checks: >
clang-analyzer-*,
bugprone-*,
cert-*,
readability-*,
performance-*,
-readability-magic-numbers,
-cert-err33-c,
-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling
WarningsAsErrors: '' # Start permissive, tighten later
CheckOptions:
- key: readability-identifier-naming.FunctionCase
value: lower_case
- key: readability-identifier-naming.VariableCase
value: lower_case
- key: readability-identifier-naming.ConstantCase
value: UPPER_CASE

Add to Meson:
# meson.build
clang_tidy = find_program('clang-tidy', required: false)
if clang_tidy.found() and get_option('clang_tidy')
run_target('clang-tidy',
command: [clang_tidy, '-p', meson.project_build_root()] + aubio_sources
)
endif

Add CI Job:
static-analysis:
name: Static Analysis (clang-tidy)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install tools
run: sudo apt-get install -y clang-tidy
- name: Setup build
run: meson setup builddir
- name: Run clang-tidy
run: |
ninja -C builddir clang-tidy 2>&1 | tee clang-tidy.log
# Fail if errors found (not warnings)
! grep "error:" clang-tidy.log

Modify: meson.build
if get_option('b_coverage')
add_project_arguments('-fprofile-arcs', '-ftest-coverage', language: 'c')
add_project_link_arguments('-lgcov', language: 'c')
endif

Add CI Job:
coverage:
name: Code Coverage
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install dependencies
run: |
sudo apt-get install -y lcov
pip install meson ninja numpy
- name: Build with coverage
run: |
meson setup builddir -Db_coverage=true -Dtests=true
meson compile -C builddir
- name: Run tests
run: meson test -C builddir
- name: Generate coverage
run: ninja -C builddir coverage-html
- name: Upload to Codecov
uses: codecov/codecov-action@v4
with:
files: builddir/meson-logs/coverage.xml
fail_ci_if_error: true

Add badge to README:
[![codecov](https://codecov.io/gh/LedFx/aubio-ledfx/branch/main/graph/badge.svg)](https://codecov.io/gh/LedFx/aubio-ledfx)

Create: .clang-format
---
BasedOnStyle: LLVM
IndentWidth: 2
ColumnLimit: 80
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: Never
BreakBeforeBraces: Linux
IndentCaseLabels: false
PointerAlignment: Left
SpaceAfterCStyleCast: true

Add pre-commit hook:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v18.1.0
hooks:
- id: clang-format
types_or: [c, c++]

Add CI check:
format-check:
name: Code Formatting
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Check formatting
uses: jidicula/clang-format-action@v4.11.0
with:
clang-format-version: '18'
check-path: 'src'

Before:
- Static analysis: CodeQL only
- Code coverage: Unknown
- Code formatting: Inconsistent
- CI checks: Basic
- Technical debt: Unknown
After:
- Static analysis: Clang-tidy + Cppcheck + CodeQL
- Code coverage: >80%, tracked via Codecov
- Code formatting: Consistent, enforced by CI
- CI checks: Comprehensive (build, test, lint, format, coverage)
- Technical debt: Visible and tracked
- Clang-Tidy: https://clang.llvm.org/extra/clang-tidy/
- Codecov: https://docs.codecov.com/docs
- clang-format: https://clang.llvm.org/docs/ClangFormat.html
Estimated ROI: MEDIUM - Improves contributor onboarding
Effort: 3-5 days
Impact: Contributor experience, project sustainability
The project has good README and basic documentation, but lacks comprehensive developer guides and modern documentation infrastructure:
Current State:
- ✅ Good README with build instructions
- ✅ Sphinx documentation for Python API
- ✅ Doxygen documentation for C API
- ❌ No contributor guide
- ❌ No architecture documentation
- ❌ No debugging guide
- ❌ Build documentation scattered
- ❌ No API design rationale
Pain Points:
- Onboarding difficulty: New contributors struggle to understand codebase
- Scattered documentation: Build, testing, vcpkg info in multiple places
- No debugging guide: Hard to troubleshoot build/test issues
- API documentation incomplete: Many functions lack detailed docs
- No examples: Limited code examples for common tasks
Current Documentation:
- `README.md` - 215 lines, build instructions
- `doc/` - 40+ RST files for Sphinx
- `python/README.md` - Python-specific info
- Various `*.md` files - Security, implementation plans
Gaps:
- No CONTRIBUTING.md
- No ARCHITECTURE.md
- No DEBUGGING.md
- No RELEASE.md
- No examples directory with tutorials
Sample Analysis of Current State:
```c
// Example from src/pitch/pitch.h
aubio_pitch_t * new_aubio_pitch (const char_t * method,
  uint_t buf_size, uint_t hop_size, uint_t samplerate);

// Has basic comment, but missing:
// - List of available methods
// - Valid ranges for parameters
// - Return value details (NULL on error?)
// - Example usage
// - Performance characteristics
```

Improvement Areas:
- Add parameter validation documentation
- Document error conditions
- Add usage examples
- Cross-reference related functions
Goal: Lower barrier to entry with runnable examples
Options:
- Jupyter notebooks - Interactive Python examples
- Example programs - C examples in `examples/`
- Online playground - Web-based demo (ambitious)
Priority Examples to Create:
- Basic onset detection
- Pitch tracking
- Tempo detection
- Audio file processing workflow
- Real-time audio processing
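As a sketch of how self-contained such examples can be, here is a minimal energy-based onset detector in plain NumPy. It runs without aubio installed; `energy_onsets` and its threshold heuristic are illustrative inventions for tutorial purposes, not aubio's algorithm:

```python
import numpy as np

def energy_onsets(signal, hop_size=256, threshold=2.0):
    """Flag frame starts whose energy jumps by `threshold`x over the previous frame."""
    n_frames = len(signal) // hop_size
    energies = np.array([
        np.sum(signal[i * hop_size:(i + 1) * hop_size] ** 2)
        for i in range(n_frames)
    ])
    onsets = []
    for i in range(1, n_frames):
        # Guard against division issues on silent frames and ignore near-silence
        if energies[i] > threshold * (energies[i - 1] + 1e-12) and energies[i] > 1e-6:
            onsets.append(i * hop_size)
    return onsets

# Silence followed by a burst of noise: one onset expected at the boundary.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(4096), 0.5 * rng.standard_normal(4096)])
print(energy_onsets(signal))  # → [4096]
```

A real tutorial would replace this with `aubio.onset`, which uses proper spectral onset detection functions rather than a raw energy ratio.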
Phase 1: Essential Contributor Docs (2 days)
- Create CONTRIBUTING.md
- Create ARCHITECTURE.md
- Update README with better quick start
- Add DEBUGGING.md for troubleshooting
Phase 2: API Documentation Enhancement (1-2 days)
- Audit all public APIs
- Add missing parameter documentation
- Add error condition documentation
- Create API reference guide
Phase 3: Examples and Tutorials (1-2 days)
- Create Jupyter notebooks for Python
- Expand C examples
- Add to documentation
- Test examples in CI
#### Step 1: Create CONTRIBUTING.md

**Create:** `CONTRIBUTING.md`
# Contributing to aubio-ledfx
Thank you for your interest in contributing to aubio-ledfx!
## Quick Start
1. **Fork and clone:**
```bash
git clone https://github.com/YOUR_USERNAME/aubio-ledfx.git
cd aubio-ledfx
```

2. **Set up development environment:**

```bash
# Install dependencies
pip install meson ninja numpy pytest

# Configure and build
meson setup builddir -Dtests=true -Dexamples=true
meson compile -C builddir

# Run tests
meson test -C builddir
```

3. **Make changes and test:**

```bash
# Edit code
vim src/...

# Rebuild
meson compile -C builddir

# Test
meson test -C builddir

# Run specific test
./builddir/tests/test-onset
```

4. **Submit PR:**
- Create feature branch
- Make atomic commits
- Write tests
- Update documentation
- Submit PR with description
## Code Style

- C code: Follow existing style (clang-format enforced)
- Python code: PEP 8 (black formatter)
- Commit messages: Conventional Commits format

## Testing

- All new code must have tests
- C tests in `tests/src/`
- Python tests in `python/tests/`
- Run sanitizers: `meson setup builddir -Db_sanitize=address,undefined`

## Documentation

- Update relevant `.rst` files in `doc/`
- Add docstrings to Python code
- Add Doxygen comments to C functions

## Review Process

- Automated checks must pass (CI, tests, linting)
- Code review by maintainer
- Merge when approved

## Getting Help

- Open an issue for questions
- Check existing documentation in `doc/`
- See DEBUGGING.md for troubleshooting
#### Step 2: Create ARCHITECTURE.md
**Create:** `ARCHITECTURE.md`
# aubio-ledfx Architecture
## Overview
aubio-ledfx is a C library with Python bindings for audio analysis.
## Project Structure
```
aubio-ledfx/
├── src/               # C library source
│   ├── aubio.h        # Main public API header
│   ├── aubio_priv.h   # Private/internal header
│   ├── mathutils.c    # Math utilities
│   ├── fvec.c         # Float vector operations
│   ├── cvec.c         # Complex vector operations
│   ├── spectral/      # Spectral analysis
│   ├── pitch/         # Pitch detection algorithms
│   ├── tempo/         # Tempo and beat tracking
│   ├── onset/         # Onset detection
│   └── io/            # Audio I/O (file, device)
├── python/            # Python bindings
│   ├── ext/           # C extension module
│   ├── lib/           # Pure Python code
│   └── tests/         # Python test suite
├── tests/             # C test suite
├── examples/          # Example programs
└── doc/               # Documentation (Sphinx + Doxygen)
```
## Core Concepts
### Data Types
- **fvec_t:** Float vector (real-valued signals)
- **cvec_t:** Complex vector (frequency domain)
- **fmat_t:** Float matrix (multi-channel)
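In the Python bindings these types surface as 32-bit float NumPy arrays; the following NumPy-only sketch shows the corresponding shapes (the variable names are illustrative, not aubio API):

```python
import numpy as np

hop_size, n_channels = 512, 2

# fvec_t analogue: one frame of real-valued samples
fvec = np.zeros(hop_size, dtype=np.float32)

# cvec_t analogue: aubio stores a spectrum as separate norm and phase
# arrays, one value per bin (win_size // 2 + 1 bins for a real input)
n_bins = hop_size // 2 + 1
cvec_norm = np.zeros(n_bins, dtype=np.float32)
cvec_phase = np.zeros(n_bins, dtype=np.float32)

# fmat_t analogue: one row per channel
fmat = np.zeros((n_channels, hop_size), dtype=np.float32)

print(fvec.shape, cvec_norm.shape, fmat.shape)  # (512,) (257,) (2, 512)
```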
### Processing Pipeline
```
Audio File → Source → fvec_t → Analysis → Results
                        ↓
                    FFT/PVOC → cvec_t → Spectral Analysis
```
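The same flow in a NumPy-only sketch, using `np.fft.rfft` in place of aubio's FFT object; the peak-picking step is a stand-in for real spectral analysis:

```python
import numpy as np

samplerate = 44100
hop_size = 512

# Source stage: one hop of samples (a synthetic 440 Hz tone here)
t = np.arange(hop_size) / samplerate
frame = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)  # fvec_t analogue

# FFT stage: real frame → complex spectrum (cvec_t analogue: norm + phase)
spectrum = np.fft.rfft(frame)
norm, phase = np.abs(spectrum), np.angle(spectrum)

# Spectral analysis stage: find the dominant bin and convert it to Hz
peak_bin = int(np.argmax(norm))
peak_hz = peak_bin * samplerate / hop_size
print(peak_bin, round(peak_hz, 1))  # bin 5 ≈ 430.7 Hz (bin resolution ~86 Hz)
```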
### Object-Oriented C Pattern
```c
// Creation
aubio_onset_t* o = new_aubio_onset("default", 512, 256, 44100);
// Processing
aubio_onset_do(o, input_fvec, output_fvec);
// Destruction
del_aubio_onset(o);
```

### Build System

- Meson: Build configuration
- vcpkg: Dependency management
- meson-python: Python package build

See `doc/meson_reference.rst` for details.

### FFT Backends

- fftw3f (recommended)
- Accelerate (macOS)
- Intel IPP (optional)
- ooura (fallback, always available)

### Thread Safety

aubio is NOT thread-safe. Each thread must have its own objects.

### Memory Management

- All `new_*` functions allocate and return a pointer, or NULL on failure
- All `del_*` functions free; safe to call with NULL
- No garbage collection; manual management required
#### Step 3: API Documentation Template
**Create documentation script:**
```python
#!/usr/bin/env python3
"""Audit API documentation completeness."""
import re
from pathlib import Path
def check_function_doc(filepath, function_name, comment_block):
    """Check if function has complete documentation."""
    issues = []

    # Check for parameter documentation
    if '@param' not in comment_block:
        issues.append("Missing @param documentation")

    # Check for return documentation
    if '@return' not in comment_block:
        issues.append("Missing @return documentation")

    # Check for example
    if '@example' not in comment_block and 'new_' in function_name:
        issues.append("Constructor missing usage example")

    return issues

# Run on all headers...
```
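For illustration, here is the same check exercised on a deliberately incomplete comment block (the sample Doxygen comment is invented):

```python
# Standalone copy of the check from the audit script above.
def check_function_doc(filepath, function_name, comment_block):
    """Check if function has complete documentation."""
    issues = []
    if '@param' not in comment_block:
        issues.append("Missing @param documentation")
    if '@return' not in comment_block:
        issues.append("Missing @return documentation")
    if '@example' not in comment_block and 'new_' in function_name:
        issues.append("Constructor missing usage example")
    return issues

comment = "/** Create a pitch detection object.\n    @param method pitch algorithm name */"
issues = check_function_doc("src/pitch/pitch.h", "new_aubio_pitch", comment)
print(issues)  # ['Missing @return documentation', 'Constructor missing usage example']
```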
**Create:** `examples/notebooks/01_onset_detection.ipynb`

```json
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Onset Detection with aubio\n",
        "\n",
        "This notebook demonstrates onset detection using aubio-ledfx."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import aubio\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Generate test signal\n",
        "samplerate = 44100\n",
        "duration = 1.0\n",
        "\n",
        "# Create onset detector\n",
        "win_s = 512\n",
        "hop_s = 256\n",
        "onset = aubio.onset(\"default\", win_s, hop_s, samplerate)\n",
        "\n",
        "# Process audio...\n",
        "# (full example)"
      ]
    }
  ]
}
```

Before:
- Contributor onboarding: 2-3 days
- Documentation coverage: 40%
- Code examples: Minimal
- API reference: Incomplete
After:
- Contributor onboarding: <4 hours
- Documentation coverage: >80%
- Code examples: 10+ working examples
- API reference: Complete with examples
- Jupyter notebooks: 5+ tutorials
- Week 1: CI/CD optimization (Priority 1, Phase 1)
- Week 2: Test infrastructure fixes (Priority 3, Phase 1)
- Week 3: Static analysis integration (Priority 4, Phase 1)
- Week 4: Documentation essentials (Priority 5, Phase 1)
- Week 5-6: CI/CD dependency optimization (Priority 1, Phase 2)
- Week 7: Test benchmarking and fuzzing (Priority 3, Phases 2-3)
- Week 8: Python bindings analysis (Priority 2, Phases 1-3)
- Week 9-10: Python binding improvements or migration (Priority 2, Phase 4)
- Week 11: Code coverage and quality metrics (Priority 4, Phases 2-3)
- Week 12: Documentation and examples (Priority 5, Phases 2-3)
- Monitor CI performance metrics
- Track code coverage trends
- Review static analysis findings
- Update documentation as code evolves
| Metric | Baseline | Target | Timeframe |
|---|---|---|---|
| CI build time | 90-120 min | 35-50 min | Month 1 |
| Test pass rate | Unknown (ignored) | 100% | Month 1 |
| Code coverage | Unknown | >80% | Month 2 |
| Static analysis issues | Unknown | <10 high | Month 1 |
| Documentation coverage | ~40% | >80% | Month 3 |
| Contributor onboarding | 2-3 days | <4 hours | Month 3 |
| Fuzz testing | None | Continuous | Month 2 |
| Performance benchmarks | None | Tracked | Month 2 |
Before merging PRs:
- ✅ All tests pass
- ✅ Code coverage ≥ baseline (no regression)
- ✅ Static analysis passes
- ✅ Code formatted with clang-format
- ✅ Documentation updated
1. **Python binding migration (if pursued)**
   - Risk: Breaking changes for users
   - Mitigation: Gradual migration, maintain compatibility layer

2. **Test failures in CI**
   - Risk: Unknown failures when removing `|| true`
   - Mitigation: Fix all tests locally first

3. **Performance regression from optimization**
   - Risk: CI optimizations slow down actual builds
   - Mitigation: Benchmark at each step

4. **vcpkg cache size**
   - Risk: Exceeding GitHub's 10 GB cache limit
   - Mitigation: Monitor, implement LRU eviction

5. **Fuzz testing resource usage**
   - Risk: Fuzzing consumes too much CI time/resources
   - Mitigation: Run on schedule, not every PR
These 5 priorities represent the highest-value optimizations for aubio-ledfx:
1. **CI/CD Performance** - Immediate developer productivity impact
2. **Python Modernization** - Long-term maintainability and user experience
3. **Test Infrastructure** - Code quality and confidence
4. **Static Analysis** - Proactive bug prevention
5. **Documentation** - Contributor growth and project sustainability
Next Steps:
- Review this roadmap with maintainers
- Prioritize based on team capacity
- Create GitHub issues for each priority
- Begin implementation in priority order
- Track progress with KPIs
**Estimated Total Effort:** 18-27 days (spread over 3 months)

**Estimated ROI:** 3-5x in reduced maintenance burden and faster iteration
Document Maintenance:
- Review quarterly
- Update based on progress
- Add new priorities as they emerge
- Archive completed items
Last Updated: 2025-11-14