davidleonstr/dirscanner

Dirscanner

License: MIT | Python 3.6+

Dirscanner is a Python CLI utility that scans directories, filters files by glob patterns, and aggregates their contents into a structured JSON document. It is suited to generating code context, auditing projects, or creating structured backups of text-based assets.

Features

  • Recursive Scanning: Walks the full directory tree so that files in nested folders are captured.
  • Pattern Matching: Full support for Unix-style glob patterns (e.g., *.py, src/*.js) for precise file inclusion.
  • Dual Pattern Inputs: Define inclusion rules via command-line arguments or through external configuration files (similar to .gitignore syntax).
  • Safe Content Handling: Reads files as UTF-8 text; files that cannot be decoded (such as binaries) or opened (restricted permissions) are recorded as null, so the scan continues instead of aborting.
  • Flexible Output: Supports direct file writing or piping results to standard output (stdout) for integration with other CLI tools.
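
As an illustration of the pattern matching described above, here is a minimal sketch that assumes the standard-library fnmatch module is used for matching. The helper name matches_any and the "empty list means include everything" rule are hypothetical and not taken from the project source.

```python
from fnmatch import fnmatch


def matches_any(rel_path, patterns):
    """Return True if rel_path matches at least one glob pattern.

    Assumption: an empty pattern list means "include everything".
    """
    if not patterns:
        return True
    # Normalise separators so patterns written with "/" also apply to
    # Windows-style relative paths.
    normalized = rel_path.replace("\\", "/")
    return any(fnmatch(normalized, pattern) for pattern in patterns)


paths = ["src/app.js", "src/main.py", "README.md"]
selected = [p for p in paths if matches_any(p, ["*.md", "src/*.js"])]
print(selected)  # ['src/app.js', 'README.md']
```

Note that with fnmatch the * wildcard also matches path separators, so a pattern such as *.py would match src/main.py as well as a top-level file.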

Installation

  1. Ensure Python 3.6 or higher is installed on your system.
  2. Clone or download the project source code.
  3. No Dependencies: This project relies exclusively on the Python Standard Library. No external packages are required.

Usage

The basic command structure is:

python main.py <directory> [options]

Examples

1. Standard Scan (Output to Terminal):

python main.py ./folder

2. Filter by Specific Extensions:

python main.py ./folder --patterns "*.py" "*.md"

3. Export to JSON File:

python main.py ./folder -o snapshot.json

4. Advanced Filtering via Input File: If you have a file named include.txt with the following content:

# Source code
src/*.py
# Configuration
config/*.json

Run:

python main.py ./folder -i include.txt -o output.json
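
A pattern file like the one above could be parsed roughly as follows. This is only a sketch: the function name parse_patterns is hypothetical, and treating blank lines and lines starting with # as non-patterns is an assumption based on the .gitignore-like syntax shown in the example.

```python
def parse_patterns(text):
    """Return the patterns in text, one per non-blank, non-comment line."""
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines and "#" comment lines carry no pattern.
        if not stripped or stripped.startswith("#"):
            continue
        patterns.append(stripped)
    return patterns


include_txt = """\
# Source code
src/*.py
# Configuration
config/*.json
"""
print(parse_patterns(include_txt))  # ['src/*.py', 'config/*.json']
```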

CLI Arguments Reference

Argument      Short   Description
directory     -       Required. The root directory path to scan.
--output      -o      Path to the destination JSON file.
--stdin       -       Forces the JSON output to print to the terminal.
--patterns    -p      List of inclusion patterns (e.g., *.py *.txt).
--input       -i      Path to a file containing patterns (one per line).
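
For readers who want to see how such an interface can be declared, the sketch below mirrors the arguments listed above using argparse. It is not the project's actual main.py: the function name build_parser, the help strings, the defaults, and the choice to model --stdin as a boolean switch are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Scan a directory into JSON."
    )
    parser.add_argument("directory", help="Root directory path to scan.")
    parser.add_argument("-o", "--output", help="Path to the destination JSON file.")
    # Assumption: --stdin takes no value and acts as an on/off switch.
    parser.add_argument(
        "--stdin", action="store_true", help="Print the JSON output to the terminal."
    )
    # nargs="*" lets several patterns follow a single -p flag.
    parser.add_argument(
        "-p", "--patterns", nargs="*", default=[], help="Inclusion patterns."
    )
    parser.add_argument("-i", "--input", help="Path to a file containing patterns.")
    return parser


args = build_parser().parse_args(
    ["./folder", "-p", "*.py", "*.md", "-o", "snapshot.json"]
)
print(args.directory, args.patterns, args.output)
```

When invoking the real tool from a shell, quote the patterns (as in the examples above) so the shell does not expand them before the program sees them.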

Output Format

The tool generates a JSON object where keys represent the relative file paths and values contain the file content:

{
    "src/main.py": "import json\n...",
    "docs/README.md": "# Project Documentation...",
    "assets/icon.png": null
}

Note: Files that cannot be read as UTF-8 text (such as images or compiled binaries) will have their value set to null within the JSON object.
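
The mapping from relative path to content (or null) can be reproduced with a few lines of standard-library Python. The sketch below is an assumption about how such a scan might be written, not the tool's actual implementation; the names read_text_or_none and scan are hypothetical, and pattern filtering is left out to keep the example short.

```python
import json
import os
import tempfile


def read_text_or_none(path):
    """Return the file's text, or None if it cannot be read as UTF-8."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except (UnicodeDecodeError, OSError):
        # Undecodable bytes or permission problems become None (JSON null).
        return None


def scan(root):
    """Map each file path (relative to root, "/"-separated) to its content."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            result[rel_path] = read_text_or_none(full_path)
    return result


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    with open(os.path.join(root, "src", "main.py"), "w", encoding="utf-8") as f:
        f.write("import json\n")
    with open(os.path.join(root, "icon.bin"), "wb") as f:
        f.write(b"\xff\xfe\x00\x89")  # not valid UTF-8
    snapshot = scan(root)

print(json.dumps(snapshot, indent=4, sort_keys=True))
```

Here json.dumps serialises Python's None as null, which matches the output format described above.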


License

This project is open-source and available under the MIT License.

About

A file scanner based on Unix filename pattern matching that reads the contents of matching files and collects them into JSON format.
