Fix train error of ConcatenatedDoc2Vec in the notebook of doc2vec-IMDB#1377
Conversation
…into fix-word2vec-notebook Conflicts: gensim/test/test_word2vec.py
| " if not isinstance(train_model, ConcatenatedDoc2Vec):\n", | ||
| " train_model.train(doc_list, total_examples=train_model.corpus_count, epochs=train_model.iter)\n", | ||
| " else:\n", | ||
| " train_model.train(doc_list)\n", |
There was a problem hiding this comment.
A simpler and more robust fix would be to change the ConcatenatedDoc2Vec class, in test_doc2vec.py, to make its (no-op) train() match the new train() parameters-signature.
There was a problem hiding this comment.
Yes, you are right. If the train() method is modified, the total_examples and epochs should be provided. But the ConcatenatedDoc2Vec class has no attribute 'corpus_count'.
There was a problem hiding this comment.
The call doesn't have to use train_model.corpus_count from inside the model - it can just use len(doc_list). And since the outside loop is handling the multiple passes, the epochs argument should be 1.
|
@robotcator Please merge |
…into fix-notebook
| def train(self, sentences, total_examples=None, total_words=None, | ||
| epochs=None, start_alpha=None, end_alpha=None, | ||
| word_count=0, | ||
| queue_factor=2, report_delay=1.0): |
There was a problem hiding this comment.
Because this is a no-op implementation that ignores any arguments, it can be specified even more compactly and generically, using Python's syntax for arbitrary positional/keyword arguments. EG:
def train(self, *ignored, **kwignored):
|
@robotcator I open this notebook and get error |
|
@menshikh-iv all the cells were re-ran and the error was fixed. |
| "*0.193200 : 1 passes : Doc2Vec(dbow,d100,n5,mc2,s0.001,t4)_inferred 35.3s 48.3s\n", | ||
| "*0.268640 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4) 48.6s 48.5s\n", | ||
| "*0.208000 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4)_inferred 48.6s 47.4s\n", | ||
| "*0.216160 : 1 passes : dbow+dmm 0.0s 168.9s\n", |
There was a problem hiding this comment.
The change in reported elapsed-time for these concatenated-model results, here and below, is so large it deserves a closer look. I'm no aware of any change, in this PR or prior related work, that could account for such changes – from <2.0 seconds to >170 seconds. It might be a spurious report, or something amiss.
There was a problem hiding this comment.
yes, the concatenated-model train method didn't do anything actually. I will check what' going on this.
|
@robotcator What's a status of this PR? |
|
Ping @robotcator |
|
@menshikh-iv sorry for late reply, I didn't receive last notification. This PR seems works fine and all the cell have been rerun. The warning of notebook is gone. It's able to be merged . |
|
There's still the suspiciously-different reported-runtimes on some of the log lines. |
|
In fact since the last time the notebook was run and its output committed, there have been many changes (outside this PR). So many things could cause the discrepancy; I just suggest the big runtime difference be understood alongside the updated notebook cells. |
|
That's indeed very worrying. @menshikh-iv can you find which commit / PR caused this slowdown? |
|
is there any possible that the performance of computer which results in the difference of output. Because I rerun that notebook twice and the second seems alright. |
|
@piskvorky accepted. I will search and fix if needed. |
No description provided.