Reduce memory consumption of summarizer #2298
I need #2263 to add some tests about the graph.

#2263 is merged, feel free to continue.
menshikh-iv
left a comment
Thanks for the PR @horpto, can you also please update your first comment in the PR with
- the reason for these changes (links to complaints about RAM)
- a small benchmark (time & RAM) of `3.6.0` vs `summarization-refactoring`, to better understand how much this improved things
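One possible way to collect such numbers is sketched below with the standard-library `time` and `tracemalloc` modules. The helper `measure` and the stand-in `workload` are hypothetical names for illustration; the real benchmark would call the summarizer on each of the two versions instead.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload (an assumption); replace it with the call under test.
def workload(n):
    return [i * i for i in range(n)]


result, elapsed, peak = measure(workload, 100_000)
print(f"time: {elapsed:.3f}s, peak memory: {peak / 1024:.0f} KiB")
```

Tracing adds overhead, so timings taken this way are pessimistic; it is usually cleaner to measure time in a separate run without `tracemalloc`.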
scores = []
for index in range(self.corpus_size):
    score = self.get_score(document, index)
    if score > 0:
In that case `len(scores) <= self.corpus_size`. Why?
Because it's actually quite a sparse array, isn't it?
In `summarizer._set_graph_edge_weights`, documents with little weight will be dropped anyway, so there is no reason to waste extra memory. What's more, if we need a dense array, we can expand this bow back into one.
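To make the trade-off concrete, here is a minimal sketch of the sparse representation and of expanding it back into a dense array. The names `sparse_scores` and `to_dense`, and the toy score values, are hypothetical and only for illustration; they are not gensim API.

```python
def sparse_scores(score_fn, corpus_size):
    """Return (index, score) pairs, keeping only strictly positive scores."""
    scores = []
    for index in range(corpus_size):
        score = score_fn(index)
        if score > 0:
            scores.append((index, score))
    return scores


def to_dense(pairs, corpus_size):
    """Expand (index, score) pairs into a dense list, filling gaps with 0.0."""
    dense = [0.0] * corpus_size
    for index, score in pairs:
        dense[index] = score
    return dense


# Toy scores for a corpus of 5 documents (made-up numbers).
toy = [0.0, 1.5, 0.0, 0.0, 0.25]
pairs = sparse_scores(lambda i: toy[i], len(toy))
print(pairs)                      # [(1, 1.5), (4, 0.25)]
print(to_dense(pairs, len(toy)))  # [0.0, 1.5, 0.0, 0.0, 0.25]
```

The sparse list stores one pair per non-zero score, so its size follows the number of relevant documents rather than the corpus size.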
I don't get it: how do we know the ids of the documents that have 0 scores in that case?
Easy. It works like words with 0 weight in a bow. We have the ids of the docs with non-zero weight; they are stored in the bag-of-docs (I should rename that part of the function name from bow, bag-of-weights, to bod, bag-of-docs). If a doc id isn't in the bag-of-docs, the weight of that doc is 0.
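A small sketch of that lookup rule, assuming the bag-of-docs is a list of `(doc_id, weight)` pairs; the variable names and sample values are hypothetical.

```python
# Bag-of-docs: only docs with non-zero weight are listed (made-up values).
bag_of_docs = [(1, 1.5), (4, 0.25)]

# Build the mapping once, then treat any absent doc id as weight 0.
weights = dict(bag_of_docs)

print(weights.get(4, 0.0))  # 0.25
print(weights.get(2, 0.0))  # 0.0
```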
Awesome result, nice improvement for the new release, thanks @horpto 👍
At the request of @menshikh-iv: