Add batch processing support for processing multiple files #276

Merged
chanshing merged 1 commit into master from feat/batch-processing
Nov 9, 2025

@chanshing
Member

Summary

Add comprehensive batch processing functionality to process multiple accelerometer files from a folder in a single command. This enables efficient processing of large datasets without manual scripting.

Key Features

  • Batch folder processing: Process all matching files in a directory
  • Smart file discovery: Filter by extension with --fileExtensions (required in batch mode)
  • Recursive search: Optional --recursive flag for subdirectory traversal
  • Compression aware: Automatically matches .gz, .zip, .bz2, .xz variants
  • Error isolation: Individual file failures don't stop batch processing
  • Comprehensive reporting: Detailed summary with success/failure counts and timings
  • Data protection: Detects duplicate basenames to prevent output overwrites
  • Proper exit codes: Non-zero codes for CI/CD integration
  • Backward compatible: Single file mode unchanged
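
The extension matching and file discovery described above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the snake_case names `matches_extension` and `discover_files`, the `COMPRESSION_SUFFIXES` tuple, and the choice to strip a single trailing compression suffix before comparing are all assumptions made for the example.

```python
from pathlib import Path

# Assumed list of compression suffixes, taken from the feature list above.
COMPRESSION_SUFFIXES = (".gz", ".zip", ".bz2", ".xz")


def matches_extension(filename, extensions):
    """Return True if filename has one of the extensions, optionally
    followed by a known compression suffix (case-insensitive)."""
    name = filename.lower()
    # Strip at most one trailing compression suffix, e.g. "a.cwa.gz" -> "a.cwa".
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    # Normalise "cwa", ".cwa" and " CWA " to "cwa"; empty entries are ignored.
    wanted = [ext.strip().lower().lstrip(".") for ext in extensions]
    return any(ext and name.endswith("." + ext) for ext in wanted)


def discover_files(folder, extensions, recursive=False):
    """Return a sorted list of matching files in folder.

    Subdirectories are searched only when recursive is True.
    """
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path for path in candidates
        if path.is_file() and matches_extension(path.name, extensions)
    )
```

For example, `discover_files("data/", ["cwa", "bin"], recursive=True)` would return every `.cwa` or `.bin` file under `data/`, including compressed variants such as `sample.cwa.gz`.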

Changes

src/accelerometer/accProcess.py

  • Add matchesExtension() for file extension matching with compression support
  • Add discoverFiles() for file discovery with recursive option
  • Add processSingleFile() extracted from main() for reusability
  • Add processBatch() to orchestrate batch processing
  • Add --fileExtensions argument (required for batch mode)
  • Add --recursive argument for subdirectory search
  • Refactor main() to detect and route file vs directory input
  • Implement try-finally cleanup for intermediate files
  • Catch SystemExit to prevent device.py from killing batch
  • Detect duplicate basenames and exit with clear error
  • Exit with proper codes: 255 for setup errors, 1 for failures
  • Respect --verbose flag in batch error reporting
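
To illustrate the error isolation and cleanup items in the list above, here is a hedged sketch of the general pattern. It is not the PR's actual `processBatch()`: the function names, the `process_one` callback, and the shape of the result records are hypothetical. The one language-level fact it relies on is that `SystemExit` does not derive from `Exception`, so a plain `except Exception` would let a `sys.exit()` call inside the per-file code end the whole batch.

```python
import os
import time
import traceback


def run_with_cleanup(path, work, intermediate_path):
    """Call work(path) and always delete the intermediate file afterwards."""
    try:
        return work(path)
    finally:
        # Runs on success, on exceptions, and on sys.exit() alike.
        if os.path.exists(intermediate_path):
            os.remove(intermediate_path)


def process_batch(paths, process_one, verbose=False):
    """Process each path in turn; a failure is recorded, not propagated."""
    results = []
    for path in paths:
        start = time.time()
        error = None
        try:
            process_one(path)
        except (Exception, SystemExit) as exc:
            # SystemExit is not a subclass of Exception, so it is listed
            # explicitly. KeyboardInterrupt is deliberately left uncaught
            # so that Ctrl+C still stops the batch.
            error = exc
            if verbose:
                traceback.print_exc()
        results.append({
            "path": path,
            "ok": error is None,
            "seconds": time.time() - start,
            "error": error,
        })
    return results
```

The returned list carries enough information to print a summary with success and failure counts and per-file timings once the loop has finished.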

README.md

  • Add batch processing usage section with examples
  • Update output paths to reflect outputs/{filename}/ structure
  • Document new CLI arguments
  • Explain compression format auto-detection

Usage Examples

# Process all CWA files in a folder
accProcess data/accelerometer_files/ --fileExtensions cwa

# Process multiple formats
accProcess data/ --fileExtensions cwa,bin,csv

# Recursive search
accProcess data/ --fileExtensions cwa --recursive True

# Single file (unchanged)
accProcess data/sample.cwa.gz

Testing

  • ✅ Single file processing (backward compatibility)
  • ✅ Batch processing with multiple files
  • ✅ Extension filtering (single and multiple)
  • ✅ Recursive vs non-recursive search
  • ✅ Empty/malformed extensions handling
  • ✅ Duplicate basename detection
  • ✅ Error handling and exit codes
  • ✅ Cleanup on exception
  • ✅ SystemExit catching (from device.py)
  • ✅ Flake8 compliance
  • ✅ Code compilation
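
As an illustration of the duplicate basename check exercised above, the sketch below groups input files by the output folder they would map to and reports any collisions. The rule used in `output_name` (the file name up to its first dot) is an assumption made for this example, as are both function names; the PR text does not spell out how the output folder name is derived from the input file.

```python
from collections import defaultdict
from pathlib import Path


def output_name(path):
    """Assumed output folder name: the file name up to its first dot.

    Under this assumption "a/sample.cwa" and "b/sample.cwa.gz" both map
    to "sample" and would write to the same output folder.
    """
    return Path(path).name.split(".")[0]


def find_duplicate_basenames(paths):
    """Return {output_name: [paths]} for every name used more than once."""
    groups = defaultdict(list)
    for path in paths:
        groups[output_name(path)].append(str(path))
    return {name: members for name, members in groups.items() if len(members) > 1}
```

A non-empty result would be grounds for stopping before any file is processed, since a later file would otherwise overwrite the outputs of an earlier one.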

Exit Codes

  • 0: Success (all files processed successfully)
  • 1: Partial/complete failure (at least one file failed)
  • 255: Setup error (no files found, duplicates, invalid arguments)
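
The mapping from batch outcome to exit code can be written down compactly. The sketch below follows the three codes listed above; the function name `batch_exit_code` and its parameters are hypothetical, and other setup errors such as invalid arguments are assumed to be handled before this point.

```python
EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_SETUP_ERROR = 255


def batch_exit_code(n_found, n_failed, has_duplicates=False):
    """Map a batch outcome to the exit codes listed above."""
    # Setup errors take precedence: nothing was (or should be) processed.
    if n_found == 0 or has_duplicates:
        return EXIT_SETUP_ERROR
    # Any failed file makes the whole run fail, even if others succeeded.
    return EXIT_FAILURE if n_failed > 0 else EXIT_OK
```

A caller would typically finish with `sys.exit(batch_exit_code(...))` so that CI/CD pipelines can distinguish a clean run from a partial failure or a setup problem.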

Breaking Changes

None. Single file processing remains unchanged. Batch mode is opt-in via directory input.

Add comprehensive batch processing functionality to process multiple
accelerometer files from a folder in a single command.

Changes to src/accelerometer/accProcess.py:
- Add matchesExtension() to match file extensions with compression support
- Add discoverFiles() to discover files by extension with recursive option
- Add processSingleFile() extracted from main() for reusability
- Add processBatch() to orchestrate batch processing with error handling
- Add --fileExtensions argument (required for batch mode)
- Add --recursive argument for subdirectory search
- Refactor main() to detect and route file vs directory input
- Implement try-finally cleanup to ensure intermediate files deleted on error
- Catch SystemExit exceptions to prevent device.py sys.exit() from killing batch
- Detect duplicate basenames and exit with error to prevent data loss
- Exit with non-zero code when no files found or any files fail
- Respect --verbose flag for detailed error reporting in batch mode
- Remove unused atexit dependency (cleanup now in finally block)

Changes to README.md:
- Add batch processing usage section with examples
- Update output paths to reflect new outputs/{filename}/ structure
- Document --fileExtensions and --recursive options
- Explain compression format auto-detection

Key features:
- Serial processing with per-file error isolation
- Comprehensive batch summary with success/failure counts
- Exit codes: 255 for no files/duplicates, 1 for any failures
- Smart extension matching (case-insensitive, compression aware)
- Duplicate basename detection prevents output overwrites
- Backward compatible: single file mode unchanged
@chanshing merged commit d91c004 into master on Nov 9, 2025
25 checks passed