# samcraw

Recursive web crawler that downloads files from a website, starting at a given URL path.

## Features

- Recursive crawling with a configurable depth limit
- Concurrent downloads
- Never follows links outside the given domain
- Never navigates backwards in the URL path
- Skips files that already exist locally with the same size (via HEAD + Content-Length)
- Configurable delay between requests, User-Agent, and concurrency
- Downloads files with these extensions: `.zip` `.tar.gz` `.tar.bz2` `.tar.xz` `.tgz` `.rar` `.7z` `.pdf` `.tap` `.z80` `.bin` `.dsk`
## Requirements

- Go 1.26+
- golangci-lint (for linting only)
## Build

```sh
make build
```

The binary is placed in `build/samcraw`.
## Usage

```sh
samcraw -url <URL> [options]
```

| Flag | Default | Description |
|---|---|---|
| `-url` | (required) | Starting URL to crawl |
| `-output` | `./downloads` | Output directory for downloaded files |
| `-depth` | `10` | Maximum crawl recursion depth |
| `-concurrency` | `3` | Number of simultaneous downloads |
| `-delay` | `500ms` | Delay between HTTP requests |
| `-user-agent` | `samcraw/1.0.0` | User-Agent header for HTTP requests |
| `-version` | | Show version and exit |
## Examples

```sh
# Basic usage
samcraw -url https://example.com/files/

# Custom output directory and higher concurrency
samcraw -url https://example.com/files/ -output ./my-files -concurrency 5

# Limit depth and add delay
samcraw -url https://example.com/files/ -depth 3 -delay 1s
```

## Development

```sh
# Run tests
make test

# Format code
make format

# Lint (staticcheck, govulncheck, golangci-lint)
make lint

# Build and run with example options
make run
```

## License

MIT