Closed
Description
We would like to match the OWRS files (more specifically, the dataframe of OWRS utilities, df_OWRS) with the utilities in the supplier_status_table. The names will probably not match exactly because of differences in spacing, plurals ("utility" vs. "utilities"), and word order ("City of X" vs. "X City of").
Because of this, we will need some form of fuzzy matching to join the files. A first pass could use regexes to normalize the names, but we may need something more robust, such as string similarity scoring. This can be done in R to match the main body of the analysis, or in Python, since this step is fairly stand-alone. In Python we could use the fuzzywuzzy package; I'm not sure about comparable packages in R.
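To make the idea concrete, here is a rough Python sketch of the two passes described above: regex-based normalization first, then a string similarity score as a fallback. It uses `difflib.SequenceMatcher` from the standard library rather than fuzzywuzzy, so it runs without extra installs. The helper names (`normalize`, `best_match`), the 0.85 threshold, and the utility names are all made up for illustration and are not taken from df_OWRS or the supplier_status_table.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, crudely singularize, and sort the words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    cleaned = []
    for tok in tokens:
        if tok.endswith("ies") and len(tok) > 4:
            tok = tok[:-3] + "y"  # "utilities" -> "utility"
        elif tok.endswith("s") and not tok.endswith("ss") and len(tok) > 3:
            tok = tok[:-1]  # "districts" -> "district"
        cleaned.append(tok)
    # Sorting makes "City of X" and "X City of" normalize to the same string.
    return " ".join(sorted(cleaned))


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when the best score is below the threshold
    (0.85 is an arbitrary starting point, not a tuned value).
    Assumes candidates is non-empty.
    """
    key = normalize(name)
    scored = [(SequenceMatcher(None, key, normalize(c)).ratio(), c) for c in candidates]
    score, match = max(scored)
    return (match, score) if score >= threshold else (None, score)


# Hypothetical names standing in for the two tables.
owrs_names = ["City of Springfield", "Lakeside Water Utilities", "Hilltop Irrigation Co"]
supplier_names = ["Springfield  City of", "Lakeside Water Utility", "Riverton Water District"]

for owrs_name in owrs_names:
    match, score = best_match(owrs_name, supplier_names)
    print(f"{owrs_name!r} -> {match!r} (score={score:.2f})")
```

Names that fall below the threshold come back as `None` so they can be reviewed by hand. If we go with fuzzywuzzy, its `fuzz.token_sort_ratio` scorer could likely replace the manual sort-and-compare step here.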