Fix D2VTransformer.fit_transform. Fix #1834#1845
Conversation
|
Thanks @Utkarsh-Mishra-CIC, @chinmayapancholi13 can you comment this? P/S @Utkarsh-Mishra-CIC please merge fresh |
D2VTransformer.fit_transform. Fix #1834
| Fit the model according to the given training data. | ||
| Calls gensim.models.Doc2Vec | ||
| """ | ||
| d2v_sentences = [doc2vec.TaggedDocument(words[0], [i]) for i, words in enumerate(X)] |
There was a problem hiding this comment.
- Can you check X, this can be already in
TaggedDocumentformat (no need to convert it directly) - Why
words[0]? IfXis iterable of the list of tokens,words[0]will be one token only. - Please add tests for
fit_transform
There was a problem hiding this comment.
- I can check for X like following:
all(isinstance(word, doc2vec.TaggedDocument) for word in X)but that isn't useful forfit_transformas thetransformmethod ofD2VTransformerrequires only a list of list, so the input format should only be list of documents liketrain_inputnot inTaggedDocumentformat. - Thanks for pointing that out, i'll add that in next commit.
- Added in the latest commit.
|
Hi @menshikh-iv! |
|
Hey @chinmayapancholi13, can you review current approach? |
|
ping @Utkarsh-Mishra-CIC, can you answer my questions #1845 (comment)? |
|
Hey @menshikh-iv I went through the discussion on issue #1834 as well as the comments on this PR. When I had worked on sklearn API, I remember that at the time we had decided to not implement |
| Fit the model according to the given training data. | ||
| Calls gensim.models.Doc2Vec | ||
| """ | ||
| if all(isinstance(word, doc2vec.TaggedDocument) for word in X): |
There was a problem hiding this comment.
Check only first element (that's enough).
| self.assertEqual(matrix.shape[1], self.model.size) | ||
|
|
||
| def testFitTransform(self): | ||
| numpy.random.seed(0) |
There was a problem hiding this comment.
This is global seeding (affect on interpreter state, not only for this test), please don't use it.
| numpy.random.seed(0) | ||
| model = D2VTransformer(min_count=1) | ||
|
|
||
| #fit and transform multiple documents |
There was a problem hiding this comment.
PEP8: should be # (space between # and text), here and later.
|
Made the requested changes. |
|
Thanks @Utkarsh-Mishra-CIC, congratz with the first contribution: 1st_place_medal:! |
* Fix D2VTransformer.fit_transform\(piskvorky#1834\) * Add check for TaggedDocument * Add test for D2VTransformer fit_transform * Add test and check d2vtransformer
No description provided.