[WIP] Fix backward incompatibility due to random_state#1327
Conversation
random_staterandom_state
|
Please add unit tests |
|
@tmylk Yes. Working on adding tests. |
|
@tmylk @menshikh-iv I have added unit tests for ensuring that backward compatibility is not broken due to In the following 2 commits, I also had to add checks And the last commit is failing for Python 3.5 because of the test |
| if self.state is not None: | ||
| self.state.save(utils.smart_extension(fname, '.state'), *args, **kwargs) | ||
|
|
||
| # Save 'random_state' separately |
There was a problem hiding this comment.
what is the purpose of saving it separately?
There was a problem hiding this comment.
@tmylk I think It's better for case "file don't exist" (and more flexible)
| result.random_state = utils.unpickle(random_state_fname) | ||
| else: | ||
| result.id2word = None | ||
| logging.warning("random_state not stored on disk so using default value") |
There was a problem hiding this comment.
if random_state is not stored that means that it is a new version of the model and it is going to be loaded in the main pickle load. Please change the logic
There was a problem hiding this comment.
@tmylk Your suggestion is a bit different from the earlier discussion on the issue. Hence, I want to make sure I understand the desired solution before making the changes.
If I understand it correctly, what you are suggesting is :
- If there is indeed a file with the extension
.random_statepresent on disk, this means that the model was saved using a pre-0.13.2 version of Gensim. So we use this file to setresult.random_stateat the time of loading. - However, if there is no such file present on disk, then this means that the model was saved using a post-0.13.2 version of Gensim and thus
result.random_stategot set at the time of the main pickle load. So in this case, we don't need to do anything else.
But for models saved using a pre-0.13.2 version of Gensim, there was no .random_state file created at the time of saving the model. So while loading such a model from disk, where would the .random_state file come from in this case? Is the user responsible for creating this file explicitly in such a case? If this is true, then I believe we don't need to make any changes in the save function for LdaModel at all and we just need to change the load function.
Please correct me if I am wrong or missing something here. Otherwise, if this is indeed what we need, I could go ahead and make the appropriate changes.
There was a problem hiding this comment.
Please add a check that this indeed happened: thus result.random_state got set at the time of the main pickle load.
There was a problem hiding this comment.
@tmylk I believe I have understood the solution that we want. However, I have a minor doubt about where would the .random_state file come from? Is it that the user is responsible for creating it explicitly always and we (i.e. from within the save function) need not create it ever?
If this is true, then in case we are loading a pre-0.13.2 model and no .random_state file exists on disk, then should we set result.random_state using a default value like get_random_state(None) with a logger.warning?
There was a problem hiding this comment.
@tmylk Could you please respond to this query?
There was a problem hiding this comment.
Is it that the user is responsible for creating it explicitly always
I think the user should not think about additional files, he saves the whole model
If this is true, then in case we are loading a pre-0.13.2 model and no .random_state file exists on disk, then should we set result.random_state using a default value like get_random_state(None) with a logger.warning?
I think it's correct
There was a problem hiding this comment.
The simple way to do this is check that random_state was loaded, if not - you set result.random_state using a default value like get_random_state(None) with a logger.warning
There was a problem hiding this comment.
@menshikh-iv Then there is no .random_state file involved at all, correct? To summarize, the solution is :
- First, load the entire model.
- Check if
result.random_statewas set or not. For the newer (post 0.13.2) models, it would have been set through the main pickle load. For the older (pre 0.13.2) models,result.random_statewould not be set through the main pickle load so we setresult.random_statetoget_random_state(None).
And in this solution, we don't need to make any changes in the save function, just the load function.
| self.assertTrue(isinstance(i[1], six.string_types)) | ||
|
|
||
| def testRandomStateBackwardCompatibility(self): | ||
| # load a model saved using a pre-0.13.2 version of Gensim |
There was a problem hiding this comment.
this test is identical to the previous one. just one test is enough that checks all the fields
There was a problem hiding this comment.
Sure. I thought since these were two different issues so we'd want to put separate tests to verify both are resolved. I'll make the update according to your suggestion and remove the earlier test here.
There was a problem hiding this comment.
Better to avoid code duplication
|
could you please test that this exact test fails in the version of the code prior to your changes? |
|
@tmylk The last commit now shows that all checks have passed. I guess it's sorted for now then. |
|
To validate this as a fix we need to see what changed. This is how Test Driven Development works for any issue:
Could you please add these 2 commits to this PR? |
|
@tmylk Got it. Thanks a lot for the comprehensive feedback. :) Edit : I have added the commit having only unit tests and test data, which (as it should) is failing the test which checks |
|
@tmylk @menshikh-iv I have made changes to the code as well. One of the tests is failing for Python 2.7 in the last commit because of Edit : The last commit now passes all the tests. |
…ents1 Added comments explaining logic for changes in PR #1327
This PR fixes issue #1082.