[WIP] Added sklearn wrapper for LDASeq model by chinmayapancholi13 · Pull Request #1405 · piskvorky/gensim

chinmayapancholi13 · 2017-06-09T09:28:08Z

This PR adds a scikit-learn wrapper for Gensim's LDASeq model.

menshikh-iv · 2017-06-13T15:04:59Z

+        """
+        Sklearn wrapper for LdaSeq model. Class derived from gensim.models.LdaSeqModel
+        """
+        self.corpus = None


Why do you need a field for the corpus?

@menshikh-iv In my opinion, the user might be interested to know about the corpus used for training the model (using the get_params function). Should we continue to store this value?

@chinmayapancholi13 No, sklearn does not store X, so we should not

@menshikh-iv Yes, that is true for sklearn. Removing corpus attribute from all the wrappers then.
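The convention agreed on above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the wrapper from this PR: `ToyTopicWrapper` and its parameters are hypothetical, and `get_params` is hand-written here to mimic the scikit-learn behaviour of reporting constructor arguments only.

```python
import inspect


class ToyTopicWrapper:
    """Hypothetical estimator-style wrapper; not the class added in this PR."""

    def __init__(self, num_topics=10, passes=10):
        # Only hyperparameters are stored at construction time.
        self.num_topics = num_topics
        self.passes = passes
        self.gensim_model = None  # set by fit()

    def get_params(self):
        # Mimic the sklearn convention: report constructor arguments only.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}

    def fit(self, X, y=None):
        # X is used for training but is deliberately NOT saved on self.
        self.gensim_model = {"n_docs_seen": len(X)}  # stand-in for a trained model
        return self


wrapper = ToyTopicWrapper(num_topics=2).fit([[(0, 1)], [(1, 2)]])
params = wrapper.get_params()
```

Since the training data never appears in `params`, there is nothing for the wrapper to keep in a `corpus` attribute.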

menshikh-iv · 2017-06-13T15:05:34Z

+        Sklearn wrapper for LdaSeq model. Class derived from gensim.models.LdaSeqModel
+        """
+        self.corpus = None
+        self.model = None


Please make this field "private" (start its name with underscores).

menshikh-iv · 2017-06-13T15:06:15Z

+                initialize='gensim', sstats=None,  lda_model=None, obs_variance=0.5, chain_variance=0.005, passes=10,
+                random_state=None, lda_inference_max_iter=25, em_min_iter=6, em_max_iter=20, chunksize=100)
+        """
+        self.corpus = X


There is no need to save X.

menshikh-iv · 2017-06-13T15:09:05Z

+        """
+        Fit the model according to the given training data.
+        Calls gensim.models.LdaSeqModel:
+        >>> gensim.models.LdaSeqModel(corpus=None, time_slice=None, id2word=None, alphas=0.01, num_topics=10,


Please remove this >>> ... block; this example does not help a new user.

@menshikh-iv Should we remove this >>> .... statement in all the model wrappers? This line basically tells us how the associated Gensim model is actually called.

You just need to specify the class that is used (you have already done above) and write where a user can read the documentation.
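A docstring in the shape suggested here might look like the following sketch. The class name `ToyLdaSeqWrapper` and the exact wording are hypothetical; only the reference to `gensim.models.LdaSeqModel` comes from the diff under review.

```python
class ToyLdaSeqWrapper:
    """Hypothetical wrapper; names are illustrative, not the PR's final code."""

    def fit(self, X, y=None):
        """
        Fit the model according to the given training data.

        Calls gensim.models.LdaSeqModel; see that class's documentation
        for the meaning of each parameter.
        """
        return self


# The docstring names the underlying class without repeating its full call.
fit_doc = ToyLdaSeqWrapper.fit.__doc__
```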

menshikh-iv · 2017-06-16T03:47:05Z

+            em_min_iter=self.em_min_iter, em_max_iter=self.em_max_iter, chunksize=self.chunksize)
+        return self
+
+    def transform(self, docs):


Check the case where you create an instance and call transform immediately (without fit); you need to raise an exception, as sklearn does.

Also, please add an example of the docs param to the docstring.

@menshikh-iv For checking if the model has been fitted, would it be a good idea to check if self.gensim_model is None or not? This approach would clearly give an error when fit hasn't been called before calling transform but this also allows the user to set the value of self.gensim_model through set_params function (or even as wrapper.gensim_model=...) and then call transform function, which makes sense for us to allow.

I completely forgot about set_params. I think that if you disable gensim_model in set_params, you can check whether the model is None (this does not cover all cases, but it covers the most obvious one).

Could you elaborate on what "disabling" the gensim_model param in the set_params function means?
Actually, gensim_model is a public attribute of the model, so it can be set like ldaseq_wrapper.gensim_model = some_model, which is almost the same as using the set_params function to set this value. So, checking whether self.gensim_model is None should be enough, right?
It would look like this:

    def transform(self, docs):
        """
        Return the topic proportions for the documents passed.
        """
        if self.gensim_model is None:
            raise NotFittedError("This model has not been fitted yet. Call 'fit' with appropriate arguments before using this method.")
        # The input as array of array
        check = lambda x: [x] if isinstance(x[0], tuple) else x
        ...

Ok, as a temporary option.
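The check agreed on in this thread can be exercised end to end with a small self-contained sketch. Everything here is a hypothetical stand-in: `ToyLdaSeqWrapper` is not the PR's class, the uniform output is a placeholder for real topic proportions, and `NotFittedError` is defined locally so the snippet has no dependencies (scikit-learn provides its own in `sklearn.exceptions`).

```python
class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class ToyLdaSeqWrapper:
    """Hypothetical wrapper used only to illustrate the fitted check."""

    def __init__(self, num_topics=2):
        self.num_topics = num_topics
        self.gensim_model = None  # public, so it can also be assigned directly

    def fit(self, X, y=None):
        self.gensim_model = object()  # stand-in for a trained model
        return self

    def transform(self, docs):
        if self.gensim_model is None:
            raise NotFittedError(
                "This model has not been fitted yet. Call 'fit' with "
                "appropriate arguments before using this method."
            )
        # Accept a single bag-of-words document or a list of documents.
        if docs and isinstance(docs[0], tuple):
            docs = [docs]
        # Placeholder output: one uniform topic row per document.
        return [[1.0 / self.num_topics] * self.num_topics for _ in docs]


unfitted = ToyLdaSeqWrapper()
try:
    unfitted.transform([(0, 1)])
    raised = False
except NotFittedError:
    raised = True

# A single document given as a list of (term_id, count) tuples yields one row.
rows = ToyLdaSeqWrapper().fit([[(0, 1)]]).transform([(0, 1), (1, 1)])
```

As noted in the discussion, a `None` check does not cover every case: assigning any object to `gensim_model` by hand makes the wrapper look fitted.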

menshikh-iv · 2017-06-16T03:48:54Z

+        return np.reshape(np.array(X), (len(docs), self.num_topics))
+
+    def partial_fit(self, X):
+        raise NotImplementedError("'partial_fit' has not been implemented for the LDA Seq model")


LDA Seq model -> SklLdaSeqModel

menshikh-iv · 2017-06-16T03:49:50Z

+        for key in param_dict.keys():
+            self.assertEqual(model_params[key], param_dict[key])
+
+


Add a persistence test with pickle.

And add a test with a pipeline.

menshikh-iv · 2017-06-19T08:31:44Z

+        score = text_ldaseq.score(corpus, test_target)
+        self.assertGreater(score, 0.50)
+
+    def testPersistence(self):


This is only a sanity check.
For persistence, you need to compare the current and loaded models. To do that, either compare their inner matrices, or take a corpus, transform it with both variants, and compare the results.

Thanks. I have now added code for comparing the vectors transformed from original and loaded models, in addition to this sanity check. :)
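The second option (transform with both variants and compare) can be sketched as follows. `ToyModel` is a hypothetical stand-in for a fitted wrapper. As a simplifying assumption, only its state dictionary is pickled so that the snippet behaves the same however it is executed; a real test would pickle the wrapper object itself.

```python
import pickle


class ToyModel:
    """Hypothetical stand-in for a fitted wrapper."""

    def __init__(self, weights):
        self.weights = weights

    def transform(self, docs):
        # Weighted sum of term counts per document (illustrative only).
        return [
            sum(self.weights.get(term, 0.0) * count for term, count in doc)
            for doc in docs
        ]


model = ToyModel({0: 0.5, 1: 2.0})
corpus = [[(0, 2), (1, 1)], [(1, 3)]]

# Round-trip the state through pickle and rebuild the model from it.
state = pickle.loads(pickle.dumps(vars(model)))
loaded = ToyModel(**state)

# Compare outputs of both variants instead of only checking that loading works.
original_vectors = model.transform(corpus)
loaded_vectors = loaded.transform(corpus)
```

With real floating-point topic proportions, an approximate comparison (for example `numpy.allclose`) is usually a safer choice than exact equality.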

menshikh-iv · 2017-06-19T08:32:55Z

+        text_ldaseq = Pipeline((('features', model,), ('classifier', clf)))
+        text_ldaseq.fit(corpus, test_target)
+        score = text_ldaseq.score(corpus, test_target)
+        self.assertGreater(score, 0.50)


Will this be correct every time? Don't you need to fix the seeds for reproducibility?

We now have a fixed seed which is set before the test testPipeline to ensure that we get similar values.
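The effect of fixing a seed can be shown with a small sketch. `noisy_score` is a hypothetical stand-in for a fit-and-score step that depends on random initialisation; the actual test may seed a different generator, so treat the choice of Python's `random` module as an assumption.

```python
import random


def noisy_score(seed=None):
    """Hypothetical stand-in for a fit-and-score step that uses randomness."""
    rng = random.Random(seed)  # dedicated generator; seed=None means unseeded
    return 0.5 + rng.random() / 2


# With a fixed seed, repeated runs give the same score,
# so a threshold assertion cannot pass on one run and fail on the next.
first = noisy_score(seed=0)
second = noisy_score(seed=0)
```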

menshikh-iv · 2017-06-20T17:19:27Z

Thank you @chinmayapancholi13 👍

chinmayapancholi13 added 7 commits June 9, 2017 02:25

added new file for LDASeq model's sklearn wrapper

73cd770

PEP8 changes

4744c7b

added 'transform' and 'partial_fit' methods

d79f125

added unit_tests for ldaseq model

07efa33

PEP8 changes

d73838e

PEP8 changes

6e57c5f

refactored code acc. to composite design pattern

c969c8b

menshikh-iv suggested changes Jun 13, 2017


This was referenced Jun 13, 2017

[MRG] Added sklearn wrapper for AuthorTopic model #1403

Merged

[WIP] Changes in sklearn wrappers for LDA and LSI models #1398

Merged

[WIP] Sklearn wrapper for RandomProjections Model #1395

Merged

chinmayapancholi13 added 6 commits June 14, 2017 00:47

refactored wrapper and tests

8b0cced

removed 'self.corpus' attribute

ea9922e

updated 'self.__model' to 'self.gensim_model'

8f88a10

updated 'fit' and 'transform' functions

4f33248

updated 'testTransform' test

8aa6898

updated 'testTransform' test

77a8672

menshikh-iv suggested changes Jun 16, 2017


chinmayapancholi13 added 6 commits June 16, 2017 00:02

added 'NotFittedError' in 'transform' function

ad895a2

added 'testPersistence' and 'testModelNotFitted' tests

6f9929a

added description for 'docs' in docstring of 'transform'

05b63e3

added 'testPipeline' test

3452e80

PEP8 change

492fbc6

replaced 'text_lda' variable with 'text_ldaseq'

dec60e1

menshikh-iv reviewed Jun 19, 2017


chinmayapancholi13 added 2 commits June 19, 2017 03:10

updated 'testPersistence' test

fd5fc90

set fixed seed in 'testPipeline' test

e041431

menshikh-iv approved these changes Jun 20, 2017


menshikh-iv merged commit 477a3a3 into piskvorky:develop Jun 20, 2017

