WhatsApp archaeology with encrypted receipts.
Read-only local archive and search for the macOS WhatsApp Desktop app.
wacrawl copies WhatsApp Desktop's local SQLite databases into a temporary
snapshot, imports the useful chat data into its own SQLite archive, and gives
you scriptable commands for status, chat listing, message listing, and full-text
search.
It is for local inspection. It does not send messages, decrypt backups, talk to WhatsApp Web, or write back into WhatsApp's app container.
Homebrew is the easiest path. Install directly from my tap:
brew install steipete/tap/wacrawl
After that, upgrades stay simple:
brew update
brew upgrade steipete/tap/wacrawl
Or from source:
go install github.com/steipete/wacrawl/cmd/wacrawl@latest
Check the installed binary:
wacrawl --version
First, check whether wacrawl can see the local WhatsApp Desktop data:
wacrawl doctor
Sync a fresh local archive:
wacrawl sync
Inspect what was imported. Read commands sync automatically by default, so
status, chats, messages, and search refresh the archive before reading
when the local WhatsApp Desktop source is newer:
wacrawl status
wacrawl chats --limit 20
wacrawl messages --limit 20
Search message text:
wacrawl search "release notes"
Use JSON for scripts:
wacrawl --json search "invoice" --from-them --after 2026-01-01
On macOS, WhatsApp Desktop stores app data in:
~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared
wacrawl currently imports from:
ChatStorage.sqlite
ContactsV2.sqlite
Message/Media/
It writes its own archive to:
~/.wacrawl/wacrawl.db
Override either path when needed:
wacrawl --source "$HOME/Library/Group Containers/group.net.whatsapp.WhatsApp.shared" doctor
wacrawl --db /tmp/wacrawl.db import
- Opens WhatsApp data read-only.
- Copies SQLite database, WAL, and SHM files into a temp snapshot before import.
- Replaces only the wacrawl archive database.
- Does not modify WhatsApp databases, settings, contacts, chats, or media.
- Does not use the WhatsApp network protocol.
- Does not upload data during normal archive/search commands. backup push uploads only age-encrypted backup shards when you explicitly run it.
The archive can contain private message data. Keep ~/.wacrawl/wacrawl.db
local and out of commits, backups, and shared logs unless that is intentional.
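The snapshot-then-read pattern from the list above can be sketched in Python. This is an illustration of the idea, not wacrawl's actual Go implementation; snapshot_sqlite and open_read_only are hypothetical helper names, and the assumption is that the WAL and SHM sidecars use SQLite's usual -wal and -shm suffixes.

```python
import shutil
import sqlite3
from pathlib import Path


def snapshot_sqlite(db_path: Path, dest_dir: Path) -> Path:
    """Copy a SQLite database plus its -wal and -shm sidecars, if present."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    for suffix in ("", "-wal", "-shm"):
        src = Path(str(db_path) + suffix)
        if src.exists():
            shutil.copy2(src, dest_dir / src.name)
    return dest_dir / db_path.name


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a database so that any write attempt raises sqlite3.OperationalError."""
    # as_uri() percent-encodes the path, which matters for directories
    # with spaces such as "Group Containers".
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
```

Reading from the copy means a long import never holds locks on the files WhatsApp Desktop is using.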
Inspect the source path and database shape:
wacrawl doctor
wacrawl --json doctor
Reports source availability, discovered database files, row counts, message date range, and importer schema notes.
Snapshot WhatsApp Desktop data and replace the local archive in one transaction:
wacrawl import
sync is the same command with a clearer name:
wacrawl sync
Imports:
- chats
- contacts
- groups
- group participants
- messages
- media metadata and local media paths
Show archive counts and import metadata:
wacrawl status
Includes chat, contact, group, participant, message, media-message, oldest, newest, last-import, and source fields.
By default, status first syncs the archive when the last sync is older than
--sync-max-age and the WhatsApp Desktop source has newer data.
List chats ordered by newest message:
wacrawl chats
wacrawl chats --limit 100
List archived messages:
wacrawl messages
wacrawl messages --chat 1234567890@s.whatsapp.net
wacrawl messages --after 2026-01-01 --from-them
wacrawl messages --has-media --json
Filters:
--chat JID      Restrict to one chat.
--sender JID    Restrict to one sender.
--limit N       Max rows. Default: 50.
--after DATE    RFC3339 timestamp or YYYY-MM-DD.
--before DATE   RFC3339 timestamp or YYYY-MM-DD.
--from-me       Only outgoing messages.
--from-them     Only incoming messages.
--has-media     Only messages with media metadata.
--asc           Oldest first.
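For scripts that build --after and --before values, the two accepted date shapes can be validated up front. The Python sketch below is an illustration only: parse_date_filter is a hypothetical helper, and treating a bare YYYY-MM-DD as UTC midnight is an assumption, since the text above does not say which time zone wacrawl applies.

```python
from datetime import datetime, timezone


def parse_date_filter(value: str) -> datetime:
    """Parse YYYY-MM-DD or an RFC3339 timestamp into an aware datetime."""
    if len(value) == 10:
        # Assumption: a bare date means midnight UTC.
        day = datetime.strptime(value, "%Y-%m-%d")
        return day.replace(tzinfo=timezone.utc)
    if value.endswith("Z"):
        # Older Python versions do not accept the "Z" suffix directly.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```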
Search the archive with SQLite FTS5:
wacrawl search "launch"
wacrawl search "invoice" --from-them --after 2026-01-01
wacrawl --json search "restaurant"
Search uses message text, chat name, sender name, and media title fields. It
accepts the same filters as messages.
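A script can wrap the JSON search in a small helper. The Python sketch below only uses flags shown in this README; build_search_command and run_search are hypothetical names, and because the shape of the JSON output is not described here, the result is returned as decoded JSON without further assumptions.

```python
import json
import subprocess


def build_search_command(query, *, from_them=False, after=None, limit=None):
    """Build the argv list for a JSON search, using flags documented above."""
    cmd = ["wacrawl", "--json", "search", query]
    if from_them:
        cmd.append("--from-them")
    if after:
        cmd += ["--after", after]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


def run_search(query, **filters):
    """Run wacrawl and decode stdout; the JSON structure is not assumed."""
    result = subprocess.run(
        build_search_command(query, **filters),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Passing the query as a list element avoids shell quoting problems with search terms that contain spaces or quotes.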
wacrawl keeps normal reads fresh without a daemon or background service.
Before status, chats, messages, or search, it checks the archive's
last import time. If the archive is stale, it inspects the WhatsApp Desktop
source and imports a fresh snapshot only when the source is ahead.
The default policy is:
--sync auto
--sync-max-age 15m
Sync modes:
--sync auto     Sync before reads when the archive is stale and source is ahead.
--sync always   Force a sync before every read command.
--sync never    Read only the existing archive.
Examples:
wacrawl search "release notes"
wacrawl --sync always status
wacrawl --sync never --json messages --limit 10
wacrawl --sync-max-age 1h chats
If the WhatsApp Desktop source is unavailable and the archive already has data,
--sync auto warns on stderr and continues with the existing archive.
--sync always treats an unavailable source as an error.
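The read-time policy above boils down to a small decision function. The following Python sketch is a model of the described behavior, not wacrawl's code: should_sync is a hypothetical name, the archive is assumed to already contain data, and "source is ahead" is modeled as a simple timestamp comparison, which is an assumption about how freshness is detected.

```python
from datetime import datetime, timedelta


def should_sync(mode, last_import, source_modified, now,
                max_age=timedelta(minutes=15)):
    """Decide whether a read command should import a fresh snapshot first.

    source_modified is None when the WhatsApp Desktop source is unavailable.
    """
    if mode == "never":
        return False
    if mode == "always":
        # A missing source is reported as an error by the caller in this mode.
        return True
    if mode != "auto":
        raise ValueError(f"unknown sync mode: {mode}")
    if now - last_import <= max_age:
        return False  # archive is still fresh
    if source_modified is None:
        return False  # warn on stderr and keep using the existing archive
    return source_modified > last_import
```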
wacrawl can back up the archive to a Git repository using age-encrypted JSONL
shards. The backup is meant for a private repository such as
https://github.com/steipete/backup-wacrawl; even there, the message data is
encrypted before Git sees it.
The backup repo contains:
README.md
manifest.json
data/chats.jsonl.gz.age
data/contacts.jsonl.gz.age
data/groups.jsonl.gz.age
data/group_participants.jsonl.gz.age
data/messages/YYYY/MM.jsonl.gz.age
manifest.json is intentionally cleartext so a machine can inspect backup
freshness, public age recipients, counts, shard paths, encrypted byte sizes, and
plaintext hashes without decrypting message contents. It does not contain
message text, chat names, contacts, participant IDs, or media metadata. Those
fields live inside the *.jsonl.gz.age shards.
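The monthly message shard layout can be expressed as a function from a message timestamp to a path. This Python sketch follows the data/messages/YYYY/MM.jsonl.gz.age pattern listed above; message_shard_path is a hypothetical name, and bucketing by UTC month is an assumption.

```python
from datetime import datetime, timezone


def message_shard_path(sent_at: datetime) -> str:
    """Map a timezone-aware message timestamp to its monthly shard path."""
    utc = sent_at.astimezone(timezone.utc)  # assumption: months are cut in UTC
    return f"data/messages/{utc.year:04d}/{utc.month:02d}.jsonl.gz.age"
```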
Use these most of the time:
# First-time setup on a machine.
wacrawl backup init \
--repo ~/Projects/backup-wacrawl \
--remote https://github.com/steipete/backup-wacrawl.git
# Refresh WhatsApp data if needed, encrypt, commit, and push to GitHub.
wacrawl backup push
# Pull the Git backup, decrypt, verify, and import into the local archive.
wacrawl backup pull
# Inspect the backup manifest without decrypting message data.
wacrawl backup status
Useful safety variants:
# Force a fresh WhatsApp import before writing the backup.
wacrawl --sync always backup push
# Write and commit locally, but do not push to GitHub.
wacrawl backup push --no-push
# Restore into a throwaway database for testing.
wacrawl --db /tmp/wacrawl-restore-test.db backup pull
wacrawl --db /tmp/wacrawl-restore-test.db --sync never status
You should not need to run git manually for normal use. backup push handles
the backup repo pull/rebase, commit, and push. backup pull handles the backup
repo pull/rebase before decrypting.
Backups use the Go filippo.io/age library with X25519 age identities. There
is no backup password. Each machine has an age identity file, usually:
~/.wacrawl/age.key
That file contains an AGE-SECRET-KEY-... private identity and is written with
0600 permissions. Its matching public recipient starts with age1... and is
safe to place in ~/.wacrawl/backup.json, manifest.json, or docs.
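A quick way to confirm that an identity file is not readable by other users is to inspect its permission bits. The Python sketch below is an independent check, not something wacrawl is documented to expose; identity_is_private is a hypothetical name, and the check applies to POSIX permissions as used on macOS and Linux.

```python
import os
import stat


def identity_is_private(path) -> bool:
    """Return True when group and other have no access, as with mode 0600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```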
For each shard, wacrawl backup push:
- Exports rows from the local archive as deterministic JSONL.
- Gzip-compresses the JSONL with a fixed gzip timestamp.
- Encrypts the compressed bytes with age for every configured recipient.
- Writes only the encrypted *.jsonl.gz.age shard to Git.
- Writes manifest.json with cleartext metadata used for status, diffing, and restore verification.
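The export and compression steps above are what make an unchanged archive produce unchanged bytes. A minimal Python sketch of that idea follows; encode_shard is a hypothetical name, SHA-256 over the uncompressed JSONL is an assumption about the manifest hash, and the age encryption step is left out because it needs a third-party library.

```python
import gzip
import hashlib
import json


def encode_shard(rows):
    """Return (gzip_bytes, sha256_hex_of_plaintext_jsonl) for a list of dicts.

    Rows must already be in a stable order; keys are sorted here.
    """
    lines = [
        json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for row in rows
    ]
    jsonl = ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
    # mtime=0 pins the timestamp in the gzip header so output is reproducible.
    compressed = gzip.compress(jsonl, mtime=0)
    return compressed, hashlib.sha256(jsonl).hexdigest()
```

Because age encryption is randomized, the ciphertext differs on every run even for identical input, so comparing plaintext hashes is one way a tool can tell whether a shard actually changed.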
wacrawl backup pull does the reverse: it pulls/rebases the backup repo,
checks manifest shard paths, decrypts each shard with the local age identity,
verifies the shard hash, validates cross-table references, and imports the
snapshot into the configured archive database in one transaction.
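Checking manifest shard paths matters because the manifest is cleartext and anyone with push access could edit it. The exact rules wacrawl applies are not spelled out here, so the Python sketch below shows one plausible set of checks; is_safe_shard_path is a hypothetical name.

```python
from pathlib import PurePosixPath


def is_safe_shard_path(path: str) -> bool:
    """Accept only relative paths under data/ that end in .jsonl.gz.age."""
    parsed = PurePosixPath(path)
    if parsed.is_absolute() or ".." in parsed.parts:
        return False  # reject attempts to escape the backup checkout
    return parsed.parts[:1] == ("data",) and path.endswith(".jsonl.gz.age")
```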
What the backup protects:
- A GitHub read-only compromise or accidental clone does not reveal message text, contacts, chat names, participant IDs, or media metadata.
- Each encrypted shard can be decrypted by any listed age recipient, so multiple machines can share one backup without sharing one private key.
- Age provides encrypted-file integrity; corrupted or wrong-key shards fail to decrypt, and wacrawl also checks manifest hashes after decrypting.
What remains visible in Git:
- manifest.json is cleartext.
- The manifest reveals export time, public recipients, table names, row counts, shard paths, encrypted byte sizes, and plaintext shard hashes.
- Message shard paths reveal activity by year and month, for example data/messages/2026/04.jsonl.gz.age.
- Git history reveals backup cadence and which encrypted shards changed.
Important limits:
- This is not end-to-end provenance. Someone who can push to the backup repo can replace the backup with different data encrypted to your public recipient. Use normal GitHub access control and review unexpected backup commits.
- If ~/.wacrawl/age.key is lost and no other configured recipient exists, the encrypted backup cannot be restored.
- If an age identity is compromised, remove its public recipient, run wacrawl backup push to re-encrypt current shards, and consider rewriting or deleting old Git history because older commits may still be decryptable with the compromised key.
- X25519 age recipients are not post-quantum. They are a practical modern default, but not a post-quantum archival guarantee.
- The local archive database ~/.wacrawl/wacrawl.db and the WhatsApp Desktop source data remain plaintext on the machine. Protect the machine and local backups accordingly.
Initialize the backup repository and local age identity:
wacrawl backup init \
--repo ~/Projects/backup-wacrawl \
--remote https://github.com/steipete/backup-wacrawl.git
This writes ~/.wacrawl/backup.json, creates ~/.wacrawl/age.key if needed,
clones or initializes the local backup checkout, and prints the public age
recipient.
The generated config looks like this:
{
  "repo": "~/Projects/backup-wacrawl",
  "remote": "https://github.com/steipete/backup-wacrawl.git",
  "identity": "~/.wacrawl/age.key",
  "recipients": ["age1..."]
}
Keep ~/.wacrawl/age.key private. The public age1... recipient can be stored
in backup.json; the AGE-SECRET-KEY-... identity must stay local or in a
password manager.
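When editing backup.json by hand, a small lint step can catch the most damaging mistake: pasting a secret identity where a public recipient belongs. The Python sketch below reads the four keys shown above; load_backup_config is a hypothetical helper, and expanding ~ on the Python side is an assumption made for the example, not a statement about how wacrawl resolves paths.

```python
import json
import os


def load_backup_config(text: str) -> dict:
    """Parse backup.json text, expand ~ in paths, and sanity-check recipients."""
    config = json.loads(text)
    for key in ("repo", "identity"):
        config[key] = os.path.expanduser(config[key])
    bad = [r for r in config["recipients"] if not r.startswith("age1")]
    if bad:
        # Never echo the offending value; it may be a private key.
        raise ValueError(f"{len(bad)} recipient(s) do not look like age1... public keys")
    return config
```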
Push an encrypted backup:
wacrawl backup push
backup push first pulls/rebases the configured backup checkout, then uses the
normal read-time sync policy. With the default --sync auto --sync-max-age 15m,
it refreshes the local archive only when the archive is stale and the WhatsApp
Desktop source is newer than the archive. Then it exports stable JSONL,
gzip-compresses each shard, encrypts each shard for every configured recipient,
updates manifest.json, removes stale encrypted shards, commits, and pushes the
backup repo.
Re-running backup push without archive changes leaves Git clean. The command
prints the repo path, whether anything changed, whether the backup is encrypted,
the shard count, and the message count.
Use --no-push for local dry runs that commit into the backup checkout but do
not push to the remote:
wacrawl backup push --no-push
Restore from the backup repo:
wacrawl backup pull
backup pull pulls/rebases the configured backup repo, decrypts every shard with
the local age identity, verifies each plaintext shard hash from the manifest,
validates cross-table references, and replaces the configured wacrawl archive
database in one import transaction.
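The hash check after decryption can be sketched in Python as follows. This assumes the manifest stores a hex SHA-256 of the uncompressed JSONL, which is not stated above, and it starts from the already decrypted gzip bytes because age decryption needs a third-party library; verify_shard is a hypothetical name.

```python
import gzip
import hashlib
import hmac


def verify_shard(compressed: bytes, expected_sha256: str) -> bytes:
    """Decompress a decrypted shard and compare it to the manifest hash."""
    plaintext = gzip.decompress(compressed)
    actual = hashlib.sha256(plaintext).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256):
        raise ValueError("shard hash does not match manifest")
    return plaintext
```

Verifying every shard before the import transaction starts means a damaged backup leaves the existing archive untouched.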
To test a restore without touching your real archive:
wacrawl --db /tmp/wacrawl-restore-test.db backup pull
wacrawl --db /tmp/wacrawl-restore-test.db --sync never status
Inspect backup metadata:
wacrawl backup status
This reports encryption status, shard count, message count, export timestamp,
and repo path. It reads manifest.json; it does not need to decrypt shards.
Each machine that should restore needs its own age identity. On the new machine:
wacrawl backup init \
--repo ~/Projects/backup-wacrawl \
--remote https://github.com/steipete/backup-wacrawl.git
Copy the printed public recipient (age1...) into the recipients list in
~/.wacrawl/backup.json on a machine that can already decrypt the backup, then
run:
wacrawl backup push
After that push, newly written shards are encrypted for all configured
recipients. If you added a recipient after data already existed, run a normal
wacrawl backup push; unchanged plaintext shards are re-encrypted when the
manifest/config changes.
For personal setup, storing a copy of ~/.wacrawl/age.key in 1Password is a
good recovery path. Do not commit the identity file. Do not paste the
AGE-SECRET-KEY-... value into issues, logs, docs, or chat.
Useful flags:
--config PATH      Backup config path. Default: ~/.wacrawl/backup.json
--repo PATH        Local backup Git checkout.
--remote URL       Backup Git remote.
--identity PATH    Local age identity. Default: ~/.wacrawl/age.key
--recipient AGE    Public age recipient. Repeat for multiple machines.
--no-push          Commit locally but do not push.
On a new Mac:
brew install steipete/tap/wacrawl
git clone https://github.com/steipete/backup-wacrawl.git ~/Projects/backup-wacrawl
mkdir -p ~/.wacrawl
Then restore ~/.wacrawl/age.key from your password manager and create
~/.wacrawl/backup.json pointing at the clone:
{
  "repo": "~/Projects/backup-wacrawl",
  "remote": "https://github.com/steipete/backup-wacrawl.git",
  "identity": "~/.wacrawl/age.key",
  "recipients": ["age1..."]
}
Finally:
wacrawl backup pull
wacrawl --sync never status
If decryption fails, the local identity does not match any recipient used for
the encrypted shards. If Git push fails, fix normal GitHub permissions for the
backup repository; the archive data is already encrypted before the push.
--db PATH                  Archive database path. Default: ~/.wacrawl/wacrawl.db
--source PATH              WhatsApp Desktop source path.
--sync MODE                Read-time sync policy: auto, always, or never. Default: auto.
--sync-max-age DURATION    Staleness window for --sync auto. Default: 15m.
--json                     Emit JSON instead of human-readable output.
--version                  Print the CLI version.
WhatsApp Desktop uses CoreData-style SQLite tables. The importer currently knows about:
ZWACHATSESSION
ZWAMESSAGE
ZWAMEDIAITEM
ZWAGROUPINFO
ZWAGROUPMEMBER
Important details:
- WhatsApp timestamps are seconds since 2001-01-01T00:00:00Z.
- ZWAMESSAGE.Z_PK is used as the source row identity. ZSTANZAID is not unique enough for archive identity.
- Group senders are resolved through ZWAMESSAGE.ZGROUPMEMBER.
- Media is joined through both ZWAMESSAGE.ZMEDIAITEM and ZWAMEDIAITEM.ZMESSAGE.
- WhatsApp's own search database uses a custom wa_tokenizer; wacrawl builds a portable FTS5 index instead.
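The timestamp detail is the one most likely to trip up anyone querying the source tables directly. A short Python sketch of the conversion follows; the function names are hypothetical, and the only fact relied on is the 2001-01-01T00:00:00Z epoch stated above, which sits 978307200 seconds after the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# Epoch used by WhatsApp Desktop's CoreData-style tables.
CORE_DATA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
UNIX_OFFSET_SECONDS = 978307200  # seconds from 1970-01-01 to 2001-01-01 (UTC)


def from_whatsapp_timestamp(seconds: float) -> datetime:
    """Convert seconds since 2001-01-01T00:00:00Z to an aware UTC datetime."""
    return CORE_DATA_EPOCH + timedelta(seconds=seconds)


def to_unix_seconds(seconds: float) -> float:
    """Convert the same value to seconds since the Unix epoch."""
    return seconds + UNIX_OFFSET_SECONDS
```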
Requires Go 1.26 or newer.
make check
Runs:
golangci-lint run ./...
./scripts/coverage.sh 85.0
go build -o bin/wacrawl ./cmd/wacrawl
Extra release-parity checks:
go test -count=1 -race ./...
goreleaser release --snapshot --clean --skip=publish
Coverage must stay at or above 85%.
Releases are tag-driven through GoReleaser.
git tag -a v0.2.0 -m "Release 0.2.0"
git push origin main --tags
CI publishes GitHub release artifacts for:
darwin/amd64
darwin/arm64
linux/amd64
linux/arm64
windows/amd64
windows/arm64
The Homebrew formula lives in:
~/Projects/homebrew-tap/Formula/wacrawl.rb
MIT. See LICENSE.