This repository contains the code for our work accepted at ECIR 2023: Towards Effective Paraphrasing for Information Disguise.
If you face any issues, you can contact the author(s) at anmolagarwal4453@gmail.com.
- `code/beam_search_code/Disguise Text.ipynb`: shows the disguise of a true sentence (query) via our model.
- `code/beam_search_code/beam_helper`: contains all the helper modules for our model:
  - `beam_utils.py`: code for single-level phrase substitution, beam search, constituency parse tree creation, etc.
  - `synonyms_store.py`: code to get the synonyms of a term in the counter-fitted synonym vector space.
  - `faiss_fetch.py`: code for initializing DPR and fetching the top K relevant documents.
  - `perplexity_calculation.py`: code that sets up the perplexity calculation.
  - `fetch_use_scores.py`: code to create Universal Sentence Encoder embeddings for a given piece of text.
- `code/beam_search_code/counter-fitted-vectors.txt`: counter-fitted vectors used for fetching synonyms.
- `data/all_syns.json`: the 10 nearest neighbours of every term in the dictionary, calculated with Facebook AI Similarity Search (FAISS) on the vectors in `counter-fitted-vectors.txt`.
- `sql_lite_dbs/<name>.db`: expects the database containing the metadata and contents of the document store (used by DPR).
- `code/faiss_indexes/<name>.faiss`: expects the vectors for the documents in the document store.
- `code/faiss_indexes/exp_with_two_thou_short.json`: expects the configuration file containing the parameters that describe how to read the `.faiss` file.
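
`data/all_syns.json` stores the 10 nearest neighbours of every term, found with FAISS over the counter-fitted vectors. The sketch below illustrates the same idea with an exhaustive cosine-similarity search in pure Python in place of FAISS. It assumes each line of `counter-fitted-vectors.txt` has the form `word v1 v2 ... vd`; the helpers `load_vectors`, `unit` and `nearest_neighbours` are hypothetical names, not functions from `synonyms_store.py`, and the choice of cosine similarity is also an assumption.

```python
import io
import math
from typing import Dict, List, TextIO


def load_vectors(stream: TextIO) -> Dict[str, List[float]]:
    """Parse lines of the (assumed) form 'word v1 v2 ... vd'."""
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def unit(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def nearest_neighbours(vectors: Dict[str, List[float]], k: int = 10) -> Dict[str, List[str]]:
    """Exhaustive k-nearest-neighbour search by cosine similarity (a stand-in for FAISS)."""
    normed = {word: unit(vec) for word, vec in vectors.items()}
    neighbours: Dict[str, List[str]] = {}
    for word, vec in normed.items():
        scored = [
            (sum(a * b for a, b in zip(vec, other)), other_word)
            for other_word, other in normed.items()
            if other_word != word  # a term is not its own synonym
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        neighbours[word] = [other_word for _, other_word in scored[:k]]
    return neighbours


# Toy 2-d vectors; the real file holds one much longer vector per term.
sample = io.StringIO("big 1.0 0.1\nlarge 0.9 0.2\nsmall -1.0 0.1\n")
syns = nearest_neighbours(load_vectors(sample), k=1)
print(syns["big"])  # ['large']
```

An exhaustive search costs one comparison per pair of terms, which is why an approximate or indexed search such as FAISS is the practical choice for a full vocabulary.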
Details of the conda environment for the above codebase are provided in `adversarial_search.yaml`.
We use Haystack's implementation of Dense Passage Retrieval (DPR).
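
DPR scores a passage by the inner product of a query embedding and a passage embedding, and `faiss_fetch.py` fetches the top K documents for a query so that the presence of the source document can be checked (`NUM_FAISS_DOCS_TO_RETRIEVE`). The toy sketch below shows that retrieve-then-check step under stated assumptions: hand-written 2-d vectors stand in for the DPR encoder outputs, a plain loop stands in for the FAISS index, and `retrieve_top_k` and `source_is_retrieved` are hypothetical names, not functions from this repository or from Haystack.

```python
import heapq
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product relevance score between a query and a passage embedding."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query_emb: Sequence[float],
                   doc_embs: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Ids of the k documents with the highest inner-product score, best first."""
    return heapq.nlargest(k, doc_embs, key=lambda doc_id: dot(query_emb, doc_embs[doc_id]))


def source_is_retrieved(query_emb: Sequence[float],
                        doc_embs: Dict[str, Sequence[float]],
                        source_id: str,
                        k: int) -> bool:
    """True if the source document is still among the top-k results for the query."""
    return source_id in retrieve_top_k(query_emb, doc_embs, k)


# Hand-written embeddings standing in for DPR encoder outputs.
docs = {"source": [1.0, 0.0], "other1": [0.6, 0.8], "other2": [0.0, 1.0]}
original_query = [1.0, 0.0]
disguised_query = [0.1, 1.0]
print(source_is_retrieved(original_query, docs, "source", k=1))   # True
print(source_is_retrieved(disguised_query, docs, "source", k=1))  # False
```

For information disguise, a paraphrase is doing its job when this check fails, i.e. the source document no longer appears among the top K results.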
The main parameters of our model are:

| Parameter Name | Description |
| --- | --- |
| `MAX_DEPTH` | Number of levels in the beam search tree, i.e., the maximum number of phrase substitutions allowed in the query. |
| `ALPHA_VAL` | |
| `NUM_PERPLEXITY_NODES_TO_EXPAND` | |
| `BeamWidth` | Maximum number of nodes at each level of the beam search tree. |
| `NUM_FAISS_DOCS_TO_RETRIEVE` | Maximum number of relevant documents fetched for the query, within which the presence of the source document is checked. |
| `SIMILARITY_CUT_OFF_THRESHOLD` | |
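
As a rough illustration of how `MAX_DEPTH` and `BeamWidth` interact, the sketch below runs a generic beam search over single-token substitutions: each level applies one more substitution to the surviving candidates, and only the `beam_width` best candidates are kept. The substitution table and the `novelty` score are hypothetical placeholders; the model in `beam_utils.py` substitutes phrases taken from a constituency parse tree and uses its own scoring, which this sketch does not reproduce, and the other parameters in the table are not modelled here.

```python
from typing import Callable, Dict, List, Set, Tuple

Candidate = Tuple[str, ...]  # a query represented as a tuple of tokens


def beam_search(query: Candidate,
                substitutions: Dict[str, List[str]],
                score: Callable[[Candidate], float],
                max_depth: int,
                beam_width: int) -> Candidate:
    """Generic beam search over substitutions; a higher score is assumed to be better."""
    beam: List[Candidate] = [query]
    best = query
    for _ in range(max_depth):  # each level applies one more substitution
        children: Set[Candidate] = set()
        for cand in beam:
            for i, tok in enumerate(cand):
                for alt in substitutions.get(tok, []):
                    children.add(cand[:i] + (alt,) + cand[i + 1:])
        if not children:
            break  # no substitution applies to any candidate
        # Keep the beam_width best children (ties broken by token order for determinism).
        beam = sorted(children, key=lambda c: (score(c), c), reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


# Hypothetical substitution table and score: reward distance from the original query.
original = ("the", "big", "dog")
subs = {"big": ["large", "huge"], "dog": ["hound"]}


def novelty(cand: Candidate) -> float:
    return float(sum(a != b for a, b in zip(cand, original)))


print(beam_search(original, subs, novelty, max_depth=2, beam_width=2))
# ('the', 'large', 'hound')
```

Raising `max_depth` allows more substitutions per query, while `beam_width` bounds how many partial paraphrases are carried to the next level, trading search breadth for compute.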
The work can be cited as:
@inproceedings{10.1007/978-3-031-28238-6_22,
author = {Agarwal, Anmol and Gupta, Shrey and Bonagiri, Vamshi and Gaur, Manas and Reagle, Joseph and Kumaraguru, Ponnurangam},
title = {Towards Effective Paraphrasing For Information Disguise},
year = {2023},
isbn = {978-3-031-28237-9},
publisher = {Springer Nature Switzerland},
address = {Cham},
url = {https://doi.org/10.1007/978-3-031-28238-6_22},
doi = {10.1007/978-3-031-28238-6_22},
booktitle = {Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II},
pages = {331--340},
keywords = {Neural information retrieval, Adversarial retrieval, Information disguise, Paraphrasing, Computational ethics},
location = {Dublin, Ireland}
}