
Vectorize Question Answering Prediction Head #603

Merged
Timoeller merged 11 commits into master from vectorize_qa_ph on Oct 29, 2020
Conversation

@brandenchan (Contributor) commented Oct 26, 2020

This PR implements a more efficient way of disqualifying invalid start-end spans. Invalid spans are now assigned very low logit scores early in the modelling pipeline, using vector operations (a rough sketch of the idea is given after the checklist below). This improves on the older method, in which all candidate spans were first sorted by their scores and only later ruled out if they were invalid (e.g. the end comes before the start, or the start or end points to padding).

This should also fix #572, where modelling times could vary wildly depending on whether the question was relevant or irrelevant.

  • Show improvement in the per-component benchmark
  • Make sure performance has not significantly changed
  • Decide where to save the per-component benchmark results
  • Update the Haystack benchmark
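
For readers skimming the PR, here is a minimal, hedged sketch of the idea in PyTorch. The function name, tensor shapes, and the -1e6 fill value are illustrative rather than FARM's actual implementation, and the special no-answer (CLS) span is omitted for brevity: every candidate (start, end) pair is scored with a broadcasted outer sum, and invalid pairs are pushed to a very low score with a boolean mask instead of being sorted first and filtered later.

```python
import torch

def mask_invalid_span_scores(start_logits, end_logits, passage_mask, max_answer_length=30):
    """start_logits, end_logits: FloatTensor [batch, seq_len]
    passage_mask: BoolTensor [batch, seq_len], True for tokens that may contain the answer.
    Returns a [batch, seq_len, seq_len] matrix where entry (b, s, e) scores the span s..e;
    invalid spans are filled with a very low value so they can never rank on top."""
    seq_len = start_logits.shape[1]

    # Score of every candidate span = start logit + end logit (broadcasted outer sum)
    span_scores = start_logits.unsqueeze(2) + end_logits.unsqueeze(1)

    # Disqualify spans whose end comes before the start or that are too long
    idx = torch.arange(seq_len, device=start_logits.device)
    length = idx.unsqueeze(0) - idx.unsqueeze(1)        # length[s, e] = e - s
    valid_shape = (length >= 0) & (length < max_answer_length)

    # Disqualify spans whose start or end points at padding / question tokens
    valid_tokens = passage_mask.unsqueeze(2) & passage_mask.unsqueeze(1)

    valid = valid_shape.unsqueeze(0) & valid_tokens
    return span_scores.masked_fill(~valid, -1e6)
```

The top spans per sample can then be read off directly, e.g. with torch.topk over the flattened seq_len * seq_len score matrix, without any Python-level sorting or filtering of candidates.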

@brandenchan (Contributor, Author) commented:

Master (passages per second)

deepset/bert-base-cased-squad2 - irrelevant - 19.260504316273845
deepset/bert-base-cased-squad2 -  relevant  - 87.872978351196
deepset/minilm-uncased-squad2  - irrelevant - 26.81841741284399
deepset/minilm-uncased-squad2  -  relevant  - 120.737699681037

This branch (passages per second)

deepset/bert-base-cased-squad2 - irrelevant - 87.9419574492753
deepset/bert-base-cased-squad2 -  relevant  - 88.23084025443052
deepset/minilm-uncased-squad2  - irrelevant - 121.63358417290499
deepset/minilm-uncased-squad2  -  relevant  - 123.58026447916457
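
For context, a hypothetical sketch of how a passages-per-second figure like the above can be measured; `predict_fn` is a stand-in for whatever runs QA inference over the passages and is not part of the actual benchmark script:

```python
import time

def passages_per_second(predict_fn, passages):
    # Time a single inference pass over all passages and report throughput
    start = time.perf_counter()
    predict_fn(passages)
    elapsed = time.perf_counter() - start
    return len(passages) / elapsed
```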

@brandenchan (Contributor, Author) commented Oct 26, 2020

Current SQuAD dev performance with deepset/roberta-base-squad2 (evaluated with the official script):

  "exact": 76.69502231954856,
  "f1": 80.03626318194439,
  "total": 11873,
  "HasAns_exact": 67.57759784075573,
  "HasAns_f1": 74.26966139663054,
  "HasAns_total": 5928,
  "NoAns_exact": 85.78637510513036,
  "NoAns_f1": 85.78637510513036,
  "NoAns_total": 5945

cf. the numbers reported in the model card:

"exact": 78.49743114629833,
"f1": 81.73092721240889

The performance is somewhat below the model card numbers, but this could be related to #552 and #602.
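
For anyone reproducing these numbers, a hedged sketch of how metrics in exactly this format can be obtained from the official SQuAD 2.0 evaluation script (the file names below are placeholders):

```python
import json
import subprocess

# predictions.json maps question ids to predicted answer strings; with no output
# file specified, the official script prints the metrics dict shown above to stdout.
result = subprocess.run(
    ["python", "evaluate-v2.0.py", "dev-v2.0.json", "predictions.json"],
    capture_output=True, text=True, check=True,
)
metrics = json.loads(result.stdout)
print(metrics["exact"], metrics["f1"])
```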

@brandenchan brandenchan requested review from Timoeller and tholor and removed request for Timoeller October 26, 2020 15:24
@brandenchan brandenchan changed the title Vectorize Question Answering Prediction Head WIP: Vectorize Question Answering Prediction Head Oct 26, 2020
@brandenchan (Contributor, Author) commented:

Ran test/benchmark/question_answering_benchmarks.py. There is no significant difference in performance or speed between this branch and master.

@Timoeller (Contributor) left a comment:

Beautiful vectorization. Ready to merge from my side.

@Timoeller Timoeller changed the title WIP: Vectorize Question Answering Prediction Head Vectorize Question Answering Prediction Head Oct 29, 2020
@Timoeller Timoeller merged commit 78fb5cf into master Oct 29, 2020

Development

Successfully merging this pull request may close these issues.

Improve QA logit checks
