[WIP] sklearn API for Gensim models #1462
Merged
menshikh-iv merged 36 commits on Aug 18, 2017
Contributor
Are you sure there is no memory duplication for the doc2vec numpy arrays? Can you please run it through a memory profiler?
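One way to act on this request is to compare how much traced memory a wrapper adds when it keeps a reference to a buffer versus when it copies it. The following is only a minimal sketch using the standard-library `tracemalloc`; `ReferenceWrapper`, `CopyingWrapper`, `extra_bytes`, and the `bytearray` buffer are hypothetical stand-ins, not the gensim code in this PR. Whether the same measurement carries over unchanged to the real doc2vec numpy arrays is an assumption to verify.

```python
import tracemalloc


class ReferenceWrapper:
    """Hypothetical wrapper that keeps a reference to the model's buffer."""

    def __init__(self, vectors):
        self.vectors = vectors  # no copy


class CopyingWrapper:
    """Hypothetical wrapper that (accidentally) duplicates the buffer."""

    def __init__(self, vectors):
        self.vectors = bytearray(vectors)  # full copy


def extra_bytes(wrapper_cls, vectors):
    """Return the bytes newly allocated by wrapping `vectors`."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    wrapper = wrapper_cls(vectors)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del wrapper
    return after - before


# Stand-in for a large array (8 MiB), allocated before tracing starts.
vectors = bytearray(8 * 1024 * 1024)

ref_extra = extra_bytes(ReferenceWrapper, vectors)
copy_extra = extra_bytes(CopyingWrapper, vectors)
print("reference wrapper:", ref_extra, "bytes")
print("copying wrapper:", copy_extra, "bytes")
```

If the wrapper only holds a reference, the extra allocation should be a few hundred bytes at most; a figure close to the size of the array points to a duplicate.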
tmylk reviewed Jul 12, 2017
if self.gensim_model is None:
    raise NotFittedError("This model has not been fitted yet. Call 'fit' with appropriate arguments before using this method.")

# The input as array of array
Contributor
Please call them python lists
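For readers outside the diff, the guard under review can be sketched as below. This is an illustration under stated assumptions, not the code from this PR: `ToyTransformer` and its word-count "model" are hypothetical, and `NotFittedError` is a local stand-in for `sklearn.exceptions.NotFittedError` so the snippet runs without scikit-learn installed. The input comment uses the "Python lists" wording requested above.

```python
class NotFittedError(ValueError, AttributeError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class ToyTransformer:
    """Hypothetical wrapper; `gensim_model` stays None until fit() is called."""

    def __init__(self):
        self.gensim_model = None

    def fit(self, docs):
        # Stand-in for training: collect the sorted vocabulary of all documents.
        self.gensim_model = {"vocab": sorted({w for doc in docs for w in doc})}
        return self

    def transform(self, docs):
        if self.gensim_model is None:
            raise NotFittedError(
                "This model has not been fitted yet. "
                "Call 'fit' with appropriate arguments before using this method."
            )
        # The input is a Python list of Python lists (one token list per document);
        # a single document is wrapped so that both shapes are accepted.
        if docs and isinstance(docs[0], str):
            docs = [docs]
        vocab = self.gensim_model["vocab"]
        return [[doc.count(word) for word in vocab] for doc in docs]


model = ToyTransformer()
try:
    model.transform([["a", "b"]])
except NotFittedError as err:
    print("not fitted:", err)

model.fit([["a", "b"], ["b", "c"]])
print(model.transform(["b", "b", "c"]))
```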
Contributor
Let's aim to merge it this week. The missing pieces are the ipynb and the transform tests.
menshikh-iv reviewed Aug 10, 2017
        self.assertEqual(matrix.shape[0], 1)
        self.assertEqual(matrix.shape[1], self.model.size)

    def testSetGetParams(self):
Contributor
Please also check against the original model in each "getset" test (same as in the previous PR).
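A test of the kind requested here might look roughly like the sketch below. `ToyModel` and `ToyWrapper` are hypothetical stand-ins for a gensim model and its sklearn-style wrapper; the real classes and parameter names in this PR may differ. The point is only the second assertion, which checks the parameter on the wrapped model as well as on the wrapper.

```python
import unittest


class ToyModel:
    """Hypothetical stand-in for a gensim model with one hyperparameter."""

    def __init__(self, size):
        self.size = size


class ToyWrapper:
    """Hypothetical sklearn-style wrapper around ToyModel."""

    def __init__(self, size=10):
        self.size = size
        self.gensim_model = None

    def get_params(self, deep=True):
        return {"size": self.size}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X=None):
        # Build the underlying model from the wrapper's current parameters.
        self.gensim_model = ToyModel(size=self.size)
        return self


class TestToyWrapper(unittest.TestCase):
    def testSetGetParams(self):
        model = ToyWrapper(size=10)
        model.set_params(size=20)
        # Check the wrapper's own view of the parameter...
        self.assertEqual(model.get_params()["size"], 20)
        # ...and, after refitting, the value on the original (wrapped) model too.
        model.fit()
        self.assertEqual(getattr(model.gensim_model, "size"), 20)
```

The case can be run with `python -m unittest` from the file that contains it.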
…into skl_api_gensim
Contributor
Great @chinmayapancholi13 💯
fabriciorsf pushed a commit to LINE-PESC/gensim that referenced this pull request Aug 23, 2017
* created sklearn wrapper for Doc2Vec
* PEP8 fix
* added 'transform' function and refactored code
* updated d2v skl api code
* added unittests for sklearn api for d2v model
* fixed flake8 errors
* added skl api class for Text2Bow model
* updated docstring for d2vmodel api
* updated text2bow skl api code
* added unittests for text2bow skl api class
* updated 'testPipeline' and 'testTransform' for text2bow
* added 'tokenizer' param to text2bow skl api
* updated unittests for text2bow
* removed get_params and set_params functions from existing classes
* added tfidf api class
* added unittests for tfidf api class
* flake8 fixes
* added skl api for hdpmodel
* added unittests for hdp model api class
* flake8 fixes
* updated hdp api class
* added 'testPartialFit' and 'testPipeline' tests for hdp api class
* flake8 fixes
* added skl API class for phrases
* added unit tests for phrases API class
* flake8 fixes
* added 'testPartialFit' function for 'TestPhrasesTransformer'
* updated 'testPipeline' function for 'TestText2BowTransformer'
* updated code for transform function for HDP transformer
* updated tests as discussed in PR 1473
* added examples for new models in ipynb
* unpinned sklearn version for running unit-tests
* updated 'Pipeline' initialization format
* updated 'Pipeline' initialization format in ipynb
This PR creates a scikit-learn API for the following Gensim models:
The implementation for the following models is still left: