A Python package for detecting nodding behavior in videos using MediaPipe. This tool analyzes head movements and identifies nodding patterns by tracking 3D face and pose landmarks in video files.
- Visualization with Rerun - When enabled with the `--visualize` flag, the pipeline launches a Rerun viewer that shows:
  - The input video stream
  - Per-frame pitch angle of the head
  - Smoothed pitch angle of the head
  - Nod detection status
  - Nod counter
- Command-line Interface - Easy to use with helpful command-line options
- Sphinx Documentation - Professional and comprehensive documentation
- Pre-commit Hooks - Ensuring code quality before each commit
- Unit Testing with Pytest - Robust test coverage for reliable code
- 3D Face Landmark Detection: Tracks 478 facial landmarks in 3D space
- Pose Estimation: Detects 33 pose landmarks for full body tracking
- Head Pose Estimation: Calculates pitch, yaw, and roll angles of the head
- Real-time Visualization: Visualizes landmarks and head pose in real-time
- JSON Export: Saves detection results in a structured JSON format
- Modular Design: Easy to extend with custom detection algorithms
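The packaged algorithm lives in `nod_detection.py`; the sketch below is only an illustration of the pitch-based idea (function names and thresholds are hypothetical, not the package's API): smooth the per-frame pitch signal with a moving average, then count each down-then-up swing that exceeds a threshold as one nod.

```python
from collections import deque


def smooth(pitches, window=5):
    """Moving average over a pitch signal (degrees)."""
    out = []
    buf = deque(maxlen=window)
    for p in pitches:
        buf.append(p)
        out.append(sum(buf) / len(buf))
    return out


def count_nods(pitches, threshold=10.0):
    """Count down-then-up pitch swings larger than `threshold` degrees.

    A nod is registered when the pitch drops more than `threshold`
    below a tracked local maximum and then recovers most of the way up.
    """
    nods = 0
    state = "idle"  # "idle" -> "down" -> back to "idle" on recovery
    reference = pitches[0] if pitches else 0.0
    for p in pitches:
        if state == "idle":
            reference = max(reference, p)  # track local maximum
            if reference - p > threshold:
                state = "down"             # head dipped far enough
        elif state == "down":
            if p > reference - threshold / 2:  # head came back up
                nods += 1
                state = "idle"
                reference = p
    return nods
```

A real detector would also debounce over time and, as noted in the roadmap below, could consult yaw and roll to reject head shakes and tilts.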
- Python 3.8+ - Modern Python features and type hints
- MediaPipe - For accurate face and pose landmark detection
- OpenCV - Video processing and visualization
- Rerun - Interactive visualization for debugging and analysis
- Typer - Command-line interface framework
- Rich - Rich text and console output
- Black - Code formatting
- isort - Import sorting
- Flake8 - Linting
- Mypy - Static type checking
- Pytest - Robust test coverage for reliable code
- Docker - Containerized environment for consistent execution
- Sphinx - Professional and comprehensive documentation
- GitHub Actions - Continuous integration and delivery
- Python 3.8 or higher
# Clone the repository
git clone https://github.com/kkaytekin/nod_detector.git
cd nod_detector
# Create and activate a virtual environment (recommended)
python -m venv venv
.\venv\Scripts\activate # On Windows
# On Unix/macOS: source venv/bin/activate
# Install the package in development mode with all dependencies
pip install -e .
# For development, install with additional development dependencies
# pip install -e ".[dev]"

Process a video file with a single command:
# Show all available options
python -m nod_detector --help
# Process a video file with visualization
python -m nod_detector --input path/to/input_video.mp4 --visualize
# Save results to a specific directory
python -m nod_detector --input path/to/input_video.mp4 --output_dir results/

Docker provides a consistent environment that works across all platforms, ensuring the application runs the same way everywhere.
- Docker Desktop installed and running
- At least 4GB of RAM allocated to Docker (recommended)
- Build the Docker image (only needed once or after code changes):
  docker compose build
- Process a video file:
  # Copy your video to the input directory
  cp path/to/your/video.mp4 data/input/
  # Process the video with visualization
  docker compose run --rm nod-detector python -m nod_detector /data/input/video.mp4 -o /data/output/result.mp4 -v
  The processed video will be saved to `data/output/` on your host machine.
- View help:
  docker compose run --rm nod-detector python -m nod_detector --help
- Process a video (replace placeholders as needed):
  docker compose run --rm nod-detector python -m nod_detector \
    /data/input/input.mp4 \
    -o /data/output/result.mp4 \
    -v  # Enable visualization
- Debug mode (process only first 10 frames):
  docker compose run --rm nod-detector python -m nod_detector /data/input/input.mp4 -d
- The `data/input` and `data/output` directories are mounted as volumes, so files persist between container runs
- The container includes all necessary system dependencies for MediaPipe and OpenCV
- Use the `--rm` flag to automatically clean up the container after it exits
The nod detector generates individual JSON files for each processed frame along with a processed video file. The output directory structure is as follows:
output/
├── <input_video_name>_processed.mp4  # Processed video with visualizations
├── frame_000000.json                 # Frame 0 data
├── frame_000001.json                 # Frame 1 data
├── frame_000002.json                 # Frame 2 data
└── ...                               # And so on for each frame
Each `frame_XXXXXX.json` file contains the following structure:
{
"frame_number": 0, // Frame number (0-based index)
"timestamp": 0.0, // Timestamp in seconds
"detections": [], // List of detections (if any)
"head_pose": { // Head pose information
"pitch": -51.42551400204901, // Head pitch angle in degrees
"yaw": 0.0, // Head yaw angle in degrees
"roll": 0.0 // Head roll angle in degrees
},
"pose_landmarks": { // 3D pose landmarks
"0": { // Landmark ID (0-32 for body pose)
"x": 0.4661392569541931, // X coordinate (normalized)
"y": 0.28261396288871765, // Y coordinate (normalized)
"z": -0.39765465259552, // Z coordinate (normalized)
"visibility": 0.990135908126831 // Visibility score [0, 1]
},
// ... more landmarks (0-32 for body pose)
},
"face_landmarks": { // 3D face landmarks (if detected)
"0": { // Landmark ID (0-467 for face)
"x": 0.5, // X coordinate (normalized)
"y": 0.3, // Y coordinate (normalized)
"z": -0.4, // Z coordinate (normalized)
"visibility": 0.99 // Visibility score [0, 1]
},
// ... more face landmarks (0-467)
},
"nod_detected": false // Whether a nod was detected in this frame
}

The `<input_video_name>_processed.mp4` file contains the original video with the following visualizations overlaid:
- 3D pose landmarks (body keypoints)
- Face mesh (if face detection is enabled)
- `frame_XXXXXX.json` files: Individual JSON files for each frame containing:
  - Frame metadata (number, timestamp)
  - Head pose angles (pitch, yaw, roll)
  - 3D pose landmarks (body keypoints)
  - 3D face landmarks (if detected)
  - Nod detection status
- `<input_video_name>_processed.mp4`: The processed video with visualizations overlaid
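Because each frame is written as a standalone JSON file, the results can be post-processed with the standard library alone. A hypothetical sketch (file names follow the layout above; `load_frames` and `summarize` are not part of the package) that rebuilds the pitch time series and tallies the frames flagged as nods:

```python
import json
from pathlib import Path


def load_frames(output_dir):
    """Load all per-frame JSON files in frame order."""
    frames = []
    for path in sorted(Path(output_dir).glob("frame_*.json")):
        with open(path) as f:
            frames.append(json.load(f))
    return frames


def summarize(frames):
    """Extract the pitch time series and count frames flagged as nods."""
    pitches = [f["head_pose"]["pitch"] for f in frames]
    nod_frames = sum(1 for f in frames if f["nod_detected"])
    return pitches, nod_frames
```

The zero-padded `frame_000000.json` naming means a plain lexicographic sort already yields chronological order.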
- All coordinates are normalized to [0, 1] range where (0,0) is the top-left corner and (1,1) is the bottom-right corner of the frame
- Negative Z values indicate points that are closer to the camera
- Visibility scores indicate the confidence of each landmark detection [0, 1]
- Head pose angles are in degrees, where:
- Positive pitch: Looking up
- Negative pitch: Looking down
- Positive yaw: Turning right
- Negative yaw: Turning left
- Positive roll: Tilt right
- Negative roll: Tilt left
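Given these conventions, converting a normalized landmark to pixel coordinates only requires the frame size. A small illustrative helper (not part of the package API); `z` is left out because it has no pixel equivalent:

```python
def to_pixels(landmark, frame_width, frame_height):
    """Convert a normalized landmark dict to integer pixel coordinates.

    (0, 0) maps to the top-left corner of the frame and
    (1, 1) to the bottom-right corner.
    """
    return (
        int(landmark["x"] * frame_width),
        int(landmark["y"] * frame_height),
    )
```

For example, a landmark at `x=0.5, y=0.25` in a 640x480 frame lands at pixel (320, 120).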
nod_detector/
├── src/
│   └── nod_detector/                         # Main package
│       ├── pipeline/                         # Video processing pipeline
│       │   └── video_processing_pipeline.py  # Main processing logic
│       ├── mediapipe_components.py           # MediaPipe integration
│       ├── nod_detection.py                  # Nod detection algorithm
│       ├── output_utils.py                   # Output handling and visualization
│       ├── main.py                           # Command-line interface
│       └── __init__.py                       # Package definition
├── tests/                                    # Test suite
│   ├── unit/                                 # Unit tests
│   ├── integration/                          # Integration tests
│   └── data/                                 # Test video files
├── examples/                                 # Example scripts
├── Dockerfile                                # Docker configuration
├── requirements.txt                          # Python dependencies
├── setup.py                                  # Package configuration
└── pyproject.toml                            # Build system configuration
- The Rerun viewer will open automatically when you run the detector with the `--visualize` flag.
- Use the timeline at the bottom to scrub through frames.
- Toggle different visualizations using the right sidebar.
- The 3D view shows the estimated full body and head pose of the subject, projected onto the input video stream.
- The input video should contain clear frontal or near-frontal views of faces
- Lighting conditions should be sufficient for face detection
- The subject's head should be visible for most of the video duration
- Optimal detection occurs when the subject is within 1-2 meters from the camera
- The system works best with videos that have a minimum resolution of 640x480
- Frame rates of 24-60 FPS are recommended for best results
- Multiple faces in the frame may affect detection accuracy
- Performance may vary with different lighting conditions and video qualities
Comprehensive documentation is available on GitHub Pages. The documentation includes:
- Installation and setup instructions
- Usage examples and API reference
- Development guidelines
- Troubleshooting
- and more!
To build the documentation locally:
# Install documentation dependencies
pip install -e ".[docs]"
# Build the documentation
cd docs
make html
# Open the built documentation in your browser
start _build/html/index.html # On Windows
# On macOS: open _build/html/index.html
# On Linux: xdg-open _build/html/index.html

Found a bug or have a feature request? Please open an issue on our GitHub Issues page.
- Optimize memory usage to enable long video processing
- Improve the algorithm to work on multi-person videos
- Improve nod detection algorithm, also consider yaw and roll angles for better nod detection
- Implement light-weight version with downscaled video resolution
- Review documentation
- Clear code base of unused code and unit tests
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Set up pre-commit hooks (see below)
- Make your changes and commit them
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
# Install the package in development mode with all dependencies
pip install -e ".[dev]"
# Install pre-commit hooks (runs automatically on git commit)
pre-commit install
# Run tests
pytest
# Format code with black
black .
# Check code style with flake8
flake8 # Configured to exclude .venv and other common directories
# Run type checking with mypy
mypy src/

This project uses pre-commit to run several code quality checks before each commit. The following hooks are configured:
- Black: Code formatting
- isort: Import sorting
- Flake8: Linting
- Mypy: Static type checking
- Pre-commit hooks: Various checks for common issues
These hooks run automatically when you make a commit. If any checks fail, the commit will be aborted and you'll need to fix the issues before committing.
- Write tests for new features or bug fixes
- Implement the feature/fix to make the tests pass
- Run tests and fix any issues (`pytest`)
- Format your code (`black .`)
- Check for code style issues (`flake8`)
- Run type checking (`mypy src/`)
- Stage and commit your changes - pre-commit hooks will run automatically
- Push to your fork and open a pull request
The project includes a comprehensive test suite to ensure reliability and maintainability.
# Run all tests
pytest
# Run only unit tests
pytest tests/unit/
# Run only integration tests
pytest tests/integration/ -m integration
# Run tests with coverage report
pytest --cov=src/nod_detector --cov-report=term-missing

- `tests/unit/`: Unit tests for individual components
- `tests/integration/`: Integration tests that verify the system as a whole
- `tests/data/`: Test data and fixtures
- `.flake8`: Flake8 configuration (excludes, line length, etc.)
- `pyproject.toml`: Configuration for various tools (black, isort, mypy, pytest)
- `.pre-commit-config.yaml`: Pre-commit hooks configuration
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with MediaPipe
- Inspired by research in computer vision and behavior analysis
- Pexels.com for the video used for testing and visualization
