Fix `DenseRetrievalExactSearch` evaluation by NouamaneTazi · Pull Request #154 · beir-cellar/beir

NouamaneTazi · 2023-08-12T17:21:45Z

I noticed a problem in how we handle queries that also exist in the retrieval corpus. By default we have `ignore_identical_ids=True`, which pops these duplicated queries from the results. As a consequence, some queries end up with `top_k` retrieved documents while others have only `top_k-1`.
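To make the effect concrete, here is a minimal sketch of the behaviour described above. It is not BEIR's code: the toy `results` dictionary and the `drop_identical_ids` helper are hypothetical and only mimic the idea of popping a query's own id from its result list.

```python
# Hypothetical toy data: "q1" also exists in the corpus, "q2" does not.
top_k = 3
results = {
    "q1": {"q1": 1.00, "d1": 0.90, "d2": 0.80},  # the query retrieved itself
    "q2": {"d3": 0.95, "d1": 0.70, "d4": 0.60},  # no identical id in the results
}


def drop_identical_ids(results):
    """Remove each query's own id from its results, if present (illustrative helper)."""
    cleaned = {}
    for query_id, doc_scores in results.items():
        cleaned[query_id] = {
            doc_id: score for doc_id, score in doc_scores.items() if doc_id != query_id
        }
    return cleaned


cleaned = drop_identical_ids(results)
counts = {query_id: len(docs) for query_id, docs in cleaned.items()}
print(counts)  # "q1" keeps top_k - 1 documents, "q2" keeps top_k
```

In this toy case the two queries are scored on result lists of different lengths, which is the inconsistency the PR description refers to.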

Fixing this behaviour changes the scores noticeably (for example, `ndcg_at_10` goes from 0.53452 to 0.65648). Here is the difference for "intfloat/e5-large" on ArguAna, evaluated using MTEB:

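    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/e5-large", device="cuda")
    evaluation = MTEB(tasks=["ArguAna"])  # renamed from `eval` to avoid shadowing the builtin
    evaluation.run(model, batch_size=512*2, corpus_chunk_size=10000, overwrite_results=True)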

Scores before fix:

INFO:mteb.evaluation.MTEB:Scores: {'ndcg_at_1': 0.27596, 'ndcg_at_3': 0.42701, 'ndcg_at_5': 0.48151, 'ndcg_at_10': 0.53452, 'ndcg_at_100': 0.57081, 'ndcg_at_1000': 0.57226, 'map_at_1': 0.27596, 'map_at_3': 0.38976, 'map_at_5': 0.41967, 'map_at_10': 0.44187, 'map_at_100': 0.4507, 'map_at_1000': 0.45077, 'recall_at_1': 0.27596, 'recall_at_3': 0.53485, 'recall_at_5': 0.66856, 'recall_at_10': 0.83073, 'recall_at_100': 0.98578, 'recall_at_1000': 0.99644, 'precision_at_1': 0.27596, 'precision_at_3': 0.17828, 'precision_at_5': 0.13371, 'precision_at_10': 0.08307, 'precision_at_100': 0.00986, 'precision_at_1000': 0.001, 'mrr_at_1': 0.28378, 'mrr_at_3': 0.39284, 'mrr_at_5': 0.42261, 'mrr_at_10': 0.44498, 'mrr_at_100': 0.45374, 'mrr_at_1000': 0.45381, 'evaluation_time': 127.59}

Scores after fix:

INFO:mteb.evaluation.MTEB:Scores: {'ndcg_at_1': 0.41963, 'ndcg_at_3': 0.57859, 'ndcg_at_5': 0.62677, 'ndcg_at_10': 0.65648, 'ndcg_at_100': 0.67739, 'ndcg_at_1000': 0.67846, 'map_at_1': 0.41963, 'map_at_3': 0.53983, 'map_at_5': 0.56664, 'map_at_10': 0.57907, 'map_at_100': 0.58407, 'map_at_1000': 0.58413, 'recall_at_1': 0.41963, 'recall_at_3': 0.69061, 'recall_at_5': 0.80725, 'recall_at_10': 0.89829, 'recall_at_100': 0.98862, 'recall_at_1000': 0.99644, 'precision_at_1': 0.41963, 'precision_at_3': 0.2302, 'precision_at_5': 0.16145, 'precision_at_10': 0.08983, 'precision_at_100': 0.00989, 'precision_at_1000': 0.001, 'mrr_at_1': 0.41963, 'mrr_at_3': 0.53983, 'mrr_at_5': 0.56664, 'mrr_at_10': 0.57907, 'mrr_at_100': 0.58407, 'mrr_at_1000': 0.58413, 'evaluation_time': 112.69}

cc @thakur-nandan

Muennighoff

I'm not fully understanding yet, maybe you can help me out 😅🧐

Muennighoff · 2023-08-15T21:52:42Z

beir/retrieval/search/dense/exact_search.py

        corpus_ids = sorted(corpus, key=lambda k: len(corpus[k].get("title", "") + corpus[k].get("text", "")), reverse=True)
+        if ignore_identical_ids:
+            # We remove the query from results if it exists in corpus
+            corpus_ids = [cid for cid in corpus_ids if cid not in query_ids] 


Doesn't this make the task "easier" by removing all other queries as options for each query?

I.e. previously, given query1 the model could wrongly retrieve query2 (if it was also in the corpus).
Now the model cannot retrieve any of the other queries, which makes the task easier, assuming the answer is never another query.

I think this option was for Quora: you want to find paraphrases of queries, but not the original start query. But this original query will always be ranked first, as it is also part of the corpus.

Which is why we have the `ignore_identical_ids` option, I think. This PR only tries to fix the `ignore_identical_ids=True` case.
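The concern raised in this thread can be illustrated with a small sketch that compares the two filtering strategies: dropping every query id from the corpus (as the diff above does) versus dropping only the current query's own id. The scores, ids, and the `rank` helper are made up for illustration and are not taken from BEIR.

```python
# Hypothetical similarity scores of each query against every corpus id.
corpus_scores = {
    "q1": {"q1": 1.0, "q2": 0.9, "d1": 0.8, "d2": 0.1},
    "q2": {"q1": 0.9, "q2": 1.0, "d1": 0.2, "d2": 0.7},
}
query_ids = set(corpus_scores)
all_corpus_ids = ["q1", "q2", "d1", "d2"]
top_k = 2


def rank(query_id, candidates, top_k):
    """Return the top_k candidate ids for a query, best score first."""
    scores = corpus_scores[query_id]
    return sorted(candidates, key=lambda cid: scores[cid], reverse=True)[:top_k]


# Strategy A (as in the diff): remove every query id from the corpus up front.
globally_filtered = [cid for cid in all_corpus_ids if cid not in query_ids]
pr_style = {q: rank(q, globally_filtered, top_k) for q in query_ids}

# Strategy B: remove only the query's own id, so other queries stay retrievable.
per_query = {
    q: rank(q, [cid for cid in all_corpus_ids if cid != q], top_k) for q in query_ids
}

print(pr_style)   # other queries can never appear as (wrong) results
print(per_query)  # "q2" can still be retrieved for "q1", and vice versa
```

Under strategy A the distractor queries disappear from the candidate pool, which is why the reviewer suspects the task becomes easier.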

Muennighoff · 2023-08-15T21:56:49Z

beir/retrieval/search/dense/exact_search.py

-            cos_scores_top_k_values, cos_scores_top_k_idx = torch.topk(cos_scores, min(top_k+1, len(cos_scores[1])), dim=1, largest=True, sorted=return_sorted)
+            cos_scores_top_k_values, cos_scores_top_k_idx = torch.topk(cos_scores, min(top_k, len(cos_scores[1])), dim=1, largest=True, sorted=return_sorted)


You write that "some queries would have `top_k` retrieved documents, while others have `top_k-1` retrieved documents", but didn't this `+1` ensure that this does not happen, because we retrieve `top_k+1` but then only allow `top_k` later on?

IIUC, the problem comes from this line

beir/beir/retrieval/search/dense/exact_search.py

Line 86 in 505d80d

if len(result_heaps[query_id]) < top_k:

So we only keep the `top_k` (which sometimes includes the query among the retrieved docs).

I see, I thought the `if corpus_id != query_id:` would ensure that the query is never added to `result_heaps[query_id]` 🧐
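The reasoning in this exchange can be sketched as follows. This is a simplified, torch-free reconstruction built only from the two conditions quoted in the thread (`corpus_id != query_id` and `len(result_heaps[query_id]) < top_k`); the function name `keep_top_k` and the toy candidates are assumptions, and the real file may differ in details.

```python
import heapq


def keep_top_k(query_id, candidates, top_k):
    """Keep the top_k best (score, corpus_id) pairs, skipping the query's own id.

    `candidates` is assumed to hold top_k + 1 hits, as retrieved before the PR.
    """
    heap = []
    for score, corpus_id in candidates:
        if corpus_id != query_id:  # the guard discussed above
            if len(heap) < top_k:
                heapq.heappush(heap, (score, corpus_id))
            else:
                # Push the new pair, then drop the current lowest-scoring pair.
                heapq.heappushpop(heap, (score, corpus_id))
    return sorted(heap, reverse=True)


top_k = 2
# "q1" is in the corpus and ranks itself first; "q2" is not in the corpus.
with_self = keep_top_k("q1", [(1.0, "q1"), (0.9, "d1"), (0.8, "d2")], top_k)
without_self = keep_top_k("q2", [(0.9, "d3"), (0.7, "d1"), (0.6, "d4")], top_k)
print(with_self)
print(without_self)
```

With `top_k + 1` candidates and the guard in place, both toy queries end up with exactly `top_k` results and neither contains its own id, which is the behaviour the reviewer expected.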

Hmm, then why do we get different results? 🧐

It's easy to check: we just have to assert that the number of results for each query is `top_k`. Can you check that please, @Muennighoff?
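A check along these lines could look like the sketch below. It assumes the results are a dictionary mapping each query id to a `{doc_id: score}` dictionary; the helper name `queries_with_wrong_count` and the toy data are hypothetical.

```python
def queries_with_wrong_count(results, top_k):
    """Return {query_id: number_of_results} for queries that do not have top_k results."""
    return {
        query_id: len(doc_scores)
        for query_id, doc_scores in results.items()
        if len(doc_scores) != top_k
    }


# Toy results: "q1" is one document short of top_k.
top_k = 3
toy_results = {
    "q1": {"d1": 0.9, "d2": 0.8},
    "q2": {"d3": 0.95, "d1": 0.7, "d4": 0.6},
}

offenders = queries_with_wrong_count(toy_results, top_k)
print(offenders)

# On real search output one would assert that the dictionary is empty:
# assert not queries_with_wrong_count(results, top_k)
```

Note that a corpus with fewer than `top_k` documents would also trigger this check, so that case needs to be ruled out before drawing conclusions.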

add ignore_identical_ids to DRES

fc17f4a

NouamaneTazi marked this pull request as ready for review August 12, 2023 17:29

.

505d80d

Muennighoff reviewed Aug 15, 2023

NouamaneTazi wants to merge 2 commits into beir-cellar:main from NouamaneTazi:nouamane/fix-exact-search

Muennighoff left a comment

Muennighoff Aug 15, 2023

nreimers Aug 16, 2023

NouamaneTazi Aug 16, 2023

Muennighoff Aug 15, 2023

NouamaneTazi Aug 16, 2023

Muennighoff Aug 16, 2023

NouamaneTazi Aug 17, 2023

NouamaneTazi Aug 17, 2023

3 participants