Combine thomcc/anonymize-places and thomcc/mentat-places-test into a single tool, and add it #191

Closed
thomcc wants to merge 5 commits into mozilla:master from thomcc:places-mentat

thomcc commented Aug 20, 2018

As discussed in the meeting, this doesn't do anybody any good in my GitHub and is better off here.

This code is not the most robust (yay, returning Result<T, failure::Error> everywhere), but worse things have happened.

Usage:

$ cd path/to/application-services/places-tool
$ cargo run -- help anonymize
places-tool-anonymize
Anonymize a places database

USAGE:
    places-tool anonymize [FLAGS] [ARGS]

FLAGS:
    -f, --force      Overwrite OUTPUT if it already exists
    -h, --help       Prints help information
    -v               Sets the level of verbosity (pass up to 3 times for more verbosity -- e.g. -vvv enables trace logs)
    -V, --version    Prints version information

ARGS:
    <OUTPUT>    Path where we should output the anonymized db (defaults to places_anonymized.sqlite)
    <PLACES>    Path to places.sqlite. If not provided, we'll use the largest places.sqlite in your firefox profiles

$ cargo run -- help to-mentat
places-tool-to-mentat
Convert a places database to a mentat database

USAGE:
    places-tool to-mentat [FLAGS] [ARGS]

FLAGS:
    -f, --force        Overwrite OUTPUT if it already exists
    -h, --help         Prints help information
    -r, --realistic    Insert everything with one transaction per visit. This is a lot slower, but is a more realistic
                       workload. It produces databases that are ~30% larger (for me).
    -v                 Sets the level of verbosity (pass up to 3 times for more verbosity -- e.g. -vvv enables trace
                       logs)
    -V, --version      Prints version information

ARGS:
    <OUTPUT>    Path where we should output the mentat db (defaults to ./mentat_places.db)
    <PLACES>    Path to places.sqlite. If not provided, we'll use the largest places.sqlite in your firefox profiles
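Both subcommands take the same two optional positional arguments, `<OUTPUT>` followed by `<PLACES>`. The following is a minimal, std-only sketch of resolving them for `to-mentat` using the default output path shown in its help text. It is not taken from the tool's source: `resolve_paths` is a hypothetical name, and `None` for the input stands for "fall back to the largest places.sqlite".

```rust
use std::path::PathBuf;

/// Hypothetical helper: resolves the optional positional arguments of
/// `to-mentat` in the order the help text lists them, <OUTPUT> then <PLACES>.
/// A `None` input path means "fall back to the largest places.sqlite in the
/// Firefox profiles"; that lookup is not performed here.
fn resolve_paths(positional: &[&str]) -> (PathBuf, Option<PathBuf>) {
    // Default output documented in the `to-mentat` help text.
    let output = positional
        .first()
        .map(|s| PathBuf::from(*s))
        .unwrap_or_else(|| PathBuf::from("./mentat_places.db"));
    // The second positional argument, if present, is the source places.sqlite.
    let places = positional.get(1).map(|s| PathBuf::from(*s));
    (output, places)
}
```

Flags such as `-f` and `-r` are assumed to have been separated out before this point.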

CC: @rfk, @linacambridge, @mhammond

thomcc commented Aug 21, 2018

This now has a more Mentat-y schema, combining what we came up with on the whiteboard with feedback @grigoryk gave me today.

Usage

$ cd places-tool
$ cargo run --release -- to-mentat -fr

This uses the largest places.sqlite in your Firefox profiles as input and writes mentat_places.db. If you want or need to specify the output or input explicitly, use the positional arguments documented above.
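Choosing an input by default comes down to a "pick the largest file" step. The following is a minimal, std-only sketch of that step. It assumes the caller already has a list of profile directories, since where those live differs by platform and how the tool discovers them isn't described here. `largest_places_db` is a hypothetical name.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical helper: picks the largest `places.sqlite` found directly
/// inside any of `profile_dirs`. Directories without a readable
/// places.sqlite are skipped.
fn largest_places_db(profile_dirs: &[PathBuf]) -> Option<PathBuf> {
    profile_dirs
        .iter()
        .map(|dir| dir.join("places.sqlite"))
        .filter_map(|path| {
            // Skip candidates that are missing or are not regular files.
            let meta = fs::metadata(&path).ok()?;
            if meta.is_file() {
                Some((meta.len(), path))
            } else {
                None
            }
        })
        // Compare by size in bytes; on a tie, max_by_key keeps the later entry.
        .max_by_key(|(len, _)| *len)
        .map(|(_, path)| path)
}
```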

The -f flag overwrites mentat_places.db if it already exists. The -r flag selects a more realistic workload: rather than doing everything in a single transaction, it does one transaction per place, which affects the final database size.
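The -f behavior amounts to a guard that runs before anything is written. The following is a minimal sketch of that guard. It assumes the output is a single file that can simply be deleted, and it ignores any SQLite journal side files that might sit next to the database. `prepare_output` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: refuses to clobber an existing output file unless
/// `force` is set. With `force`, the old file is deleted so the conversion
/// starts from an empty database.
fn prepare_output(output: &Path, force: bool) -> io::Result<()> {
    if output.exists() {
        if !force {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                format!("{} already exists (pass -f to overwrite)", output.display()),
            ));
        }
        fs::remove_file(output)?;
    }
    Ok(())
}
```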

You can insert things more quickly by omitting the -r flag. With the new schema, the size difference between the two modes is only around 10% (with the old schema it was around 30%). I don't really understand where that difference comes from, so this is fairly surprising.
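The two insertion modes differ in how writes are grouped into transactions. The following is a minimal sketch of that grouping, following the help text's "one transaction per visit". The `Visit` struct and `plan_transactions` are hypothetical. The actual Mentat transact calls are left out because their API isn't shown in this PR.

```rust
/// Hypothetical stand-in for one history visit read from places.sqlite.
#[derive(Debug)]
struct Visit {
    place_id: i64,
    visit_date: i64,
}

/// Groups visits into transaction batches: one visit per transaction in
/// "realistic" (-r) mode, otherwise a single transaction for everything.
fn plan_transactions(visits: &[Visit], realistic: bool) -> Vec<&[Visit]> {
    // `chunks` panics on a size of 0, so clamp to at least 1 for empty input.
    let batch_size = if realistic { 1 } else { visits.len().max(1) };
    visits.chunks(batch_size).collect()
}
```

The sketch only shows how many commits each mode performs. It doesn't explain the size difference described above.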

Here's the output of sqlite3_analyzer for my places database run through this tool: https://gist.github.com/thomcc/9edbe535ca52d7e91ac3d196f1e9c3e7. Note that on macOS, the sqlite3_analyzer binary provided on the SQLite website links against an old version of Tcl.framework and needs to be updated, which can be done with:

$ install_name_tool -change /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl /System/Library/Frameworks/Tcl.framework/Versions/8.5/Tcl path/to/sqlite3_analyzer

Producing an anonymized DB

$ cargo run --release -- anonymize -f
$ cargo run --release -- to-mentat -fr mentat_places_anon.db places_anonymized.sqlite

The first command produces places_anonymized.sqlite (again, using the largest places.sqlite in your Firefox profiles as input by default). This is your places.sqlite with every string replaced by a random alphanumeric ASCII string of the same length. The second command converts it into mentat_places_anon.db.
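The core of the anonymize step is a per-string replacement. The following is a minimal, std-only sketch of that replacement. It uses a small xorshift64* generator so that no external crates are needed. The sketch also assumes that "same length" means the same number of characters, and for ASCII output that is also the byte length. `XorShift64` and `anonymize_string` are hypothetical names. Walking the tables and columns of places.sqlite is left out.

```rust
/// Small xorshift64* PRNG so the sketch needs no external crates.
/// Not cryptographically secure.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

const ALPHANUMERIC: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Replaces `s` with random alphanumeric ASCII of the same length,
/// counting length in chars (an assumption, see above).
fn anonymize_string(s: &str, rng: &mut XorShift64) -> String {
    s.chars()
        .map(|_| {
            // Modulo introduces a tiny bias, which is fine for test data.
            let idx = (rng.next_u64() % ALPHANUMERIC.len() as u64) as usize;
            ALPHANUMERIC[idx] as char
        })
        .collect()
}
```

Seeding the generator makes a run reproducible, which can help when comparing database sizes across runs.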

thomcc commented Aug 23, 2018

Whoops, that last commit shouldn't have gone to this repo.

In any case, because of build configuration issues (specifically, living in this repo requires it to use SQLCipher, which makes it harder to build), we've moved this to https://github.com/thomcc/places-tool instead.
