Fuzzy matching with supplier report #5

@christophertull

We would like to match the OWRS files (more specifically, the dataframe of OWRS utilities, df_OWRS) with the utilities in the supplier_status_table. The names will probably not match exactly because of differences in spacing, plurals ("utility" vs. "utilities"), and word order ("City of X" vs. "X City of").

Because of this, we will need some form of fuzzy matching to join the files. A first pass could use regexes to normalize the names, but we might need something fancier, such as string similarity. This could be in R to match the main body of the analysis, but it could also be in Python since this step is fairly stand-alone. In Python we could use the fuzzywuzzy package; I'm not sure about comparable packages in R.
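A rough sketch of the two-stage idea, in case it helps get started. It uses only the Python standard library (`difflib`) rather than fuzzywuzzy, so treat it as a stand-in: fuzzywuzzy's `fuzz.token_sort_ratio` should do something similar to the sort-then-compare step here. The function names, the 0.85 threshold, and the sample utility names are all made up for illustration and are not taken from the actual data.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Regex first pass: lowercase, drop punctuation, unify plurals, sort words."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9\s]", " ", name)           # punctuation -> space
    name = re.sub(r"\butilities\b", "utility", name)   # "utilities" vs. "utility"
    # Sorting the tokens makes "City of X" and "X City of" identical,
    # and split() also collapses any repeated whitespace.
    return " ".join(sorted(name.split()))


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate.

    If no candidate scores at least `threshold` (an arbitrary cutoff that
    would need tuning on the real names), the candidate is None.
    """
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical names standing in for df_OWRS and supplier_status_table.
owrs_names = ["City of Santa Cruz", "Moulton Niguel Water District"]
supplier_names = [
    "Santa Cruz  City of",
    "Moulton Niguel Water  District",
    "Santa Monica City of",
]

for owrs_name in owrs_names:
    match, score = best_match(owrs_name, supplier_names)
    print(f"{owrs_name!r} -> {match!r} ({score:.2f})")
```

Names that fall below the threshold come back as `None`, so they can be listed and matched by hand instead of being joined to the wrong supplier.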
