Fix train error of ConcatenatedDoc2Vec in the notebook of doc2vec-IMDB by robotcator · Pull Request #1377 · piskvorky/gensim

robotcator · 2017-06-01T02:04:28Z

No description provided.

gojomo · 2017-06-01T02:17:23Z

+    "            if not isinstance(train_model, ConcatenatedDoc2Vec):\n",
+    "              train_model.train(doc_list, total_examples=train_model.corpus_count, epochs=train_model.iter)\n",
+    "            else:\n",
+    "              train_model.train(doc_list)\n",


A simpler and more robust fix would be to change the ConcatenatedDoc2Vec class, in test_doc2vec.py, so that its (no-op) train() matches the new train() parameter signature.

Yes, you are right. If the train() method is modified, total_examples and epochs must be provided, but the ConcatenatedDoc2Vec class has no 'corpus_count' attribute.

The call doesn't have to use train_model.corpus_count from inside the model; it can just use len(doc_list). And since the outer loop is handling the multiple passes, the epochs argument should be 1.
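A minimal sketch of that calling pattern, assuming a hypothetical `RecordingModel` stand-in instead of gensim's Doc2Vec (so it runs without gensim): the outer loop owns the passes, so every `train()` call gets `total_examples=len(doc_list)` and `epochs=1`. The per-pass shuffle, the seed, and the `train_in_passes` helper are illustrative assumptions, not the notebook's code.

```python
import random


class RecordingModel(object):
    """Hypothetical stand-in for a Doc2Vec model; it only records train() calls."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def train(self, documents, total_examples=None, epochs=None):
        # Mirror the newer requirement: both counts must be given explicitly.
        if total_examples is None or epochs is None:
            raise ValueError("You must specify total_examples and epochs")
        self.calls.append((len(documents), total_examples, epochs))


def train_in_passes(models, doc_list, passes, seed=0):
    """Hypothetical helper: run `passes` outer passes, one epoch per train() call."""
    rng = random.Random(seed)
    docs = list(doc_list)  # copy so the caller's list is not reordered
    for _ in range(passes):
        rng.shuffle(docs)  # illustrative: reshuffle the corpus between passes
        for model in models:
            # The outer loop handles repetition, so each call is a single epoch.
            model.train(docs, total_examples=len(docs), epochs=1)


doc_list = ["doc %d" % i for i in range(25)]
models = [RecordingModel("dbow"), RecordingModel("dm/m")]
train_in_passes(models, doc_list, passes=3)
print(models[0].calls)  # [(25, 25, 1), (25, 25, 1), (25, 25, 1)]
```

Keeping the pass count outside the model is what lets each call report an honest `total_examples` without relying on a `corpus_count` attribute the wrapper class lacks.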

menshikh-iv · 2017-06-04T18:06:09Z

@robotcator Please merge develop into your branch (tensorflow was missing from the Travis config).

gojomo · 2017-06-02T04:12:17Z

+    def train(self, sentences, total_examples=None, total_words=None,
+              epochs=None, start_alpha=None, end_alpha=None,
+              word_count=0,
+              queue_factor=2, report_delay=1.0):


Because this is a no-op implementation that ignores all of its arguments, it can be specified even more compactly and generically using Python's syntax for arbitrary positional/keyword arguments, e.g.:

def train(self, *ignored, **kwignored):
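To illustrate why the catch-all signature is more robust, here is a simplified, hypothetical stand-in for the test-suite wrapper. `ConcatenatedModelSketch` and `Named` are illustrative names; the real ConcatenatedDoc2Vec in test_doc2vec.py also concatenates its component models' vectors, which is omitted here. Because `train()` takes `*ignored, **kwignored`, both old-style and new-style calls are accepted without tracking future changes to the real signature.

```python
class ConcatenatedModelSketch(object):
    """Hypothetical simplified stand-in for a concatenated Doc2Vec wrapper.

    The component models are trained on their own, so train() here is a
    deliberate no-op that accepts and ignores any arguments.
    """

    def __init__(self, models):
        self.models = models

    def __str__(self):
        # Name the combination after its parts, e.g. "dbow+dmm".
        return "+".join(str(model) for model in self.models)

    def train(self, *ignored, **kwignored):
        # Nothing to do: accepts train(docs), train(docs, total_examples=..., epochs=...),
        # or any keyword a later train() signature might add.
        return None


class Named(object):
    """Tiny hypothetical helper so the example has printable component 'models'."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


combo = ConcatenatedModelSketch([Named("dbow"), Named("dmm")])
combo.train(["doc a", "doc b"])                               # old-style call
combo.train(["doc a", "doc b"], total_examples=2, epochs=1)   # new-style call
combo.train(["doc a"], total_examples=1, epochs=1, queue_factor=2, report_delay=1.0)
print(str(combo))  # dbow+dmm
```

The trade-off is that misspelled keyword names are silently accepted too; for a test-only no-op that is usually acceptable.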

menshikh-iv · 2017-06-05T16:19:26Z

@robotcator When I open this notebook I get the error "Notebook validation failed: u'execution_count' is a required property: ...". Can you please re-run all cells and commit the result? This should help.
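The error comes from the nbformat v4 schema, in which every code cell (and every `execute_result` output) must carry an `execution_count` key, though its value may be null. A minimal sketch, using only the standard `json` module, that lists the offending cells; `find_cells_missing_execution_count` is a hypothetical helper, and the notebook below is a tiny in-memory example, not the doc2vec-IMDB file.

```python
import json


def find_cells_missing_execution_count(notebook):
    """Return indices of code cells that, or whose execute_result outputs,
    lack the 'execution_count' key required by the nbformat v4 schema."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no execution_count
        if "execution_count" not in cell:
            problems.append(index)
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "execute_result" and "execution_count" not in output:
                problems.append(index)
                break
    return problems


# Tiny illustrative notebook: cell 2 is missing its execution_count key.
nb_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["x = 1"], "outputs": [],
         "execution_count": None},
        {"cell_type": "code", "metadata": {}, "source": ["x"], "outputs": []},
    ],
})
print(find_cells_missing_execution_count(json.loads(nb_text)))  # [2]
```

Re-running all cells and saving from Jupyter writes these fields back; the helper only shows where they were missing.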

robotcator · 2017-06-07T07:15:08Z

@menshikh-iv All the cells have been re-run and the error is fixed.

gojomo · 2017-06-09T01:55:16Z

+      "*0.193200 : 1 passes : Doc2Vec(dbow,d100,n5,mc2,s0.001,t4)_inferred 35.3s 48.3s\n",
+      "*0.268640 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4) 48.6s 48.5s\n",
+      "*0.208000 : 1 passes : Doc2Vec(dm/m,d100,n5,w10,mc2,s0.001,t4)_inferred 48.6s 47.4s\n",
+      "*0.216160 : 1 passes : dbow+dmm 0.0s 168.9s\n",


The change in reported elapsed time for these concatenated-model results, here and below, is so large that it deserves a closer look. I'm not aware of any change, in this PR or in prior related work, that could account for a jump from under 2.0 seconds to over 170 seconds. It might be a spurious report, or something might be amiss.
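One way to take that closer look is to time the training and evaluation phases separately: a no-op `train()` should then show roughly zero training time, so any large figure must come from the other phase. A minimal sketch using `time.perf_counter` (Python 3); `timed`, `noop_train` and `slow_evaluate` are hypothetical placeholders, not the notebook's code, and the returned error rate is a dummy value.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) measured with perf_counter."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def noop_train(docs):
    # Placeholder for a no-op train(): returns immediately.
    return None


def slow_evaluate(docs):
    # Placeholder for an evaluation step that does real work.
    time.sleep(0.05)
    return 0.2  # dummy error rate


docs = ["doc %d" % i for i in range(100)]
_, train_seconds = timed(noop_train, docs)
err, eval_seconds = timed(slow_evaluate, docs)
print("%f : train %.1fs eval %.1fs" % (err, train_seconds, eval_seconds))
```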

Yes, the concatenated model's train() method doesn't actually do anything. I will check what's going on here.

menshikh-iv · 2017-06-22T08:52:26Z

@robotcator What's the status of this PR?

menshikh-iv · 2017-07-06T06:33:34Z

Ping @robotcator

robotcator · 2017-07-07T02:48:58Z

@menshikh-iv Sorry for the late reply; I didn't receive the last notification. This PR seems to work fine and all the cells have been re-run. The notebook warning is gone, so it's ready to be merged.

gojomo · 2017-07-07T10:51:35Z

There are still suspiciously different reported runtimes on some of the log lines.

menshikh-iv · 2017-07-07T11:12:02Z

@gojomo As far as I can see, there are only two code changes, here and here; the other changes come from the notebook re-run. I don't see anything suspicious in the code changes, although the elapsed times look strange.

gojomo · 2017-07-08T03:03:00Z

In fact, since the last time the notebook was run and its output committed, there have been many changes (outside this PR), so many things could cause the discrepancy. I just suggest that the big runtime difference be understood alongside the updated notebook cells.

piskvorky · 2017-07-08T15:58:28Z

That's indeed very worrying. @menshikh-iv can you find which commit / PR caused this slowdown?

robotcator · 2017-07-09T03:02:02Z

Is it possible that the performance of the computer caused the difference in output? I re-ran the notebook twice and the second run seems alright.
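Whether the machine alone explains the difference can be checked by timing the same step several times and looking at the spread. A minimal sketch using the standard `timeit.repeat` and `statistics.median` (Python 3); `workload` is a hypothetical placeholder for the cell being timed. If the run-to-run spread is small compared with a jump from about 2 s to about 170 s, machine noise is an unlikely explanation.

```python
import statistics
import timeit


def workload():
    # Hypothetical placeholder for the step being timed (e.g. one training pass).
    return sum(i * i for i in range(20000))


# Five independent measurements, each running the workload exactly once.
timings = timeit.repeat(workload, repeat=5, number=1)
print("min %.4fs  median %.4fs  max %.4fs" % (
    min(timings), statistics.median(timings), max(timings)))
```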

menshikh-iv · 2017-07-09T04:47:36Z

@piskvorky Accepted. I will look for it and fix it if needed.

robotcator and others added 18 commits March 17, 2017 22:53

fix the compatibility between python2 & 3

1aa3f33

Merge https://github.com/RaRe-Technologies/gensim into fix-word2vec-notebook

24e6331

require explicit corpus size, epochs for train()

f6f571f

make all train() calls use explicit count, epochs

5e9529b

add tests to make sure that ValueError is indeed thrown

5c24a90

update test

c89f285

fix the word2vec's reset_from()

10ff8a5

Merge branch 'fix-word2vec' into fix-word2vec-notebook

a6312ca

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into fix-word2vec-notebook

Conflicts: gensim/test/test_word2vec.py

be5216a

require explicit corpus size, epochs for train()

504bd09

make all train() calls use explicit count, epochs

43f9689

update notebooks

49e3d00

fix some error

c9eab32

fix test error

8024eb5

Merge branch 'test-word2vec' of https://github.com/robotcator/gensim into develop

d3562b6

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into develop

ff93cdf

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into develop

67f0367

fix the train error of ConcatenatedDoc2Vec

8a6098a

gojomo reviewed Jun 1, 2017

robotcator added 2 commits June 2, 2017 09:37

update the ConcatenatedDoc2Vec class

04cf9cd

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into fix-notebook

09a2691

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into fix-notebook

623add0

gojomo reviewed Jun 5, 2017

robotcator added 2 commits June 5, 2017 23:14

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into fix-notebook

b365e2a

update the parameters

2e15945

robotcator added 2 commits June 6, 2017 23:12

rerun all the cells

5306c0a

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into fix-notebook

2aaecff

gojomo reviewed Jun 9, 2017

menshikh-iv merged commit 3e38e33 into piskvorky:develop Jul 7, 2017

Conversation

robotcator commented Jun 1, 2017

gojomo Jun 1, 2017

robotcator Jun 1, 2017

gojomo Jun 1, 2017

menshikh-iv commented Jun 4, 2017

gojomo Jun 2, 2017

menshikh-iv commented Jun 5, 2017

robotcator commented Jun 7, 2017

gojomo Jun 9, 2017

robotcator Jun 11, 2017

menshikh-iv commented Jun 22, 2017

menshikh-iv commented Jul 6, 2017

robotcator commented Jul 7, 2017

gojomo commented Jul 7, 2017

menshikh-iv commented Jul 7, 2017

gojomo commented Jul 8, 2017 • edited

piskvorky commented Jul 8, 2017

robotcator commented Jul 9, 2017

menshikh-iv commented Jul 9, 2017

4 participants
