RQ-Radionics/samcraw

samcraw

A recursive web crawler that downloads files from a website, starting at a given URL path.

Features

  • Recursive crawling with configurable depth limit
  • Concurrent downloads
  • Never follows links outside the starting domain
  • Never ascends above the starting URL path
  • Skips files that already exist locally with the same size (via HEAD + Content-Length)
  • Configurable delay between requests, User-Agent, and concurrency

Supported file types

.zip .tar.gz .tar.bz2 .tar.xz .tgz .rar .7z .pdf .tap .z80 .bin .dsk
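Matching this list needs suffix checks rather than Go's filepath.Ext, because .tar.gz and friends span two dots. A minimal sketch, assuming a case-insensitive match (the function name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// Supported extensions, multi-dot suffixes first.
var exts = []string{
	".tar.gz", ".tar.bz2", ".tar.xz",
	".zip", ".tgz", ".rar", ".7z", ".pdf",
	".tap", ".z80", ".bin", ".dsk",
}

// wantFile reports whether a URL path ends in a supported extension.
func wantFile(path string) bool {
	lower := strings.ToLower(path)
	for _, e := range exts {
		if strings.HasSuffix(lower, e) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(wantFile("/roms/game.TAP")) // true: match is case-insensitive
	fmt.Println(wantFile("/dump.tar.gz"))   // true: multi-dot suffix
	fmt.Println(wantFile("/index.html"))    // false: not a download target
}
```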

Requirements

  • Go 1.26+
  • golangci-lint (for linting only)

Build

make build

The binary is placed in build/samcraw.

Usage

samcraw -url <URL> [options]

Options

Flag          Default          Description
-url          (required)       Starting URL to crawl
-output       ./downloads      Output directory for downloaded files
-depth        10               Maximum crawl recursion depth
-concurrency  3                Number of simultaneous downloads
-delay        500ms            Delay between HTTP requests
-user-agent   samcraw/1.0.0    User-Agent header for HTTP requests
-version      (none)           Show version and exit

Examples

# Basic usage
samcraw -url https://example.com/files/

# Custom output directory and higher concurrency
samcraw -url https://example.com/files/ -output ./my-files -concurrency 5

# Limit depth and add delay
samcraw -url https://example.com/files/ -depth 3 -delay 1s

Development

# Run tests
make test

# Format code
make format

# Lint (staticcheck, govulncheck, golangci-lint)
make lint

# Build and run with example options
make run

License

MIT
