A machine learning system for classifying drum sounds using a hybrid KNN approach with personalization capabilities.
| Approach | Description | Accuracy / ARI | Strengths | Limitations |
|---|---|---|---|---|
| Unsupervised | Clustering with K-Means, GMM, HDBSCAN, Hierarchical | ARI ≈ 0.11 | No labels needed, flexible discovery | Poor label alignment, low structure clarity |
| Seeded K-Means | K-Means initialized with one participant’s true centroids | ARI ≈ 0.13 | Adds structure, improves cluster quality | Still low accuracy, depends on selected participant |
| Supervised | Global KNN model trained on all users, tested on unseen participants | 57.0% Accuracy | Simple, interpretable | Weak generalization to new users |
| Hybrid | KNN with 5 user-specific samples + feature weighting via Ridge/Lasso | 83.8% Accuracy | Personalized, low data requirement | Requires minimal user input at runtime |
This project uses the combined AVP-LVT Vocal Percussion Dataset, which merges:
- AVP (Amateur Vocal Percussion) — 28 performers
- LVT (Live Vocalised Transcription) — 20 performers
Both datasets are preprocessed and included in the repository under the data/ directory. These include segmented audio clips, standardized labels, and annotation metadata for training, evaluation, and synthesis.
- Create a Python virtual environment:
python -m venv venv
source venv/bin/activate # On macOS/Linux
# or
.\venv\Scripts\activate # On Windows- Install required packages:
pip install -r requirements.txt-
Data Collection and Processing
- Run
datacollection.ipynbto create a master dataframe of all audio samples - Run
audiosegmentation.ipynbto segment audio into individual drum sounds - Run
segmentaugmentation.ipynbto create augmented segments for training
- Run
-
Feature Extraction
- Navigate to the
feature_extractionfolder - Run all three feature extraction notebooks:
mfcc_extraction.ipynbenvelope_features.ipynbfeature_selection.ipynb
- Navigate to the
-
Model Training
- Navigate to the
modelsfolder - Run the model files
- Navigate to the
-
Final Pipeline
- Navigate to
final_pipeline/code/ - Run
pipeline.ipynbto execute the complete system
- Navigate to
- Hybrid KNN model with personalization capabilities
- Feature extraction using MFCCs and envelope features
- Data augmentation for improved model robustness
- Ridge and Lasso regression for feature selection
- Visualization tools for model analysis
The system uses a hybrid KNN approach that:
- Combines base training data with user-specific examples
- Uses Ridge/Lasso regression to identify important features
- Implements weighted distance metrics for improved classification
- Adapts to individual playing styles