Conversation
sacred/randomness.py
Outdated
| random.seed(seed) | ||
| if opt.has_numpy: | ||
| if opt.has_numpy and \ | ||
| version.parse(opt.np.__version__) >= version.parse('1.19'): |
There was a problem hiding this comment.
Why did you add this check? The seed should certainly be set for all versions of sacred, not only for the new one.
There was a problem hiding this comment.
Sry, wrong direction. np.random.seed is depreciated / legacy in 1.19.
sacred/randomness.py
Outdated
| ) | ||
| if opt.has_numpy: | ||
| return opt.np.random.RandomState(seed) | ||
| if version.parse(opt.np.__version__) >= version.parse('1.19'): |
There was a problem hiding this comment.
Can you run black to reformat the code? (Here: use double quotes)
Black fixes.
|
The checks (still on an older version of numpy) fail with |
|
I believe the failing checks are not caused by your changes. There is still one thing with numpy 1.19: currently, the default random number generator is not seeded by sacred, but it should be seeded for reproducibility. I don't see a way of setting a global seed for the default RNG in numpy 1.19. Also, I don't know if using |
|
As far as I understand the new api there is no longer a default rng. For the new api you need a rng and you only get it with |
That's right, but I think to use the new random API correctly, the code should generate one seed for the experiment and then use the seed sequence to directly generate the random generators like so: import numpy as np
# Create a new seed sequence. Store its entropy as the seed
seed_sequence = np.random.SeedSequence()
# In get_rng:
rng = np.random.default_rng(seed_sequence.spawn(1)[0])This creates a deterministic sequence of random number generators that is "good" (in the sense that numpy says good RNG states should look like, which is not guaranteed by random integers from SEEDRANGE) and that is not influenced by other libraries (like MongoDB #573 #783). We would have to rework the structure of the random code a little bit, but the code could need some polishing anyway. Another point: If we change @jnphilipp what do you think about this? |
|
As far as I understand the new API and have tested we don't need to call An option to switch is a good idea, we should also display a warning that this a legacy API, which support will end. |
|
I think we have to call import numpy as np
sq = np.random.SeedSequence()
# Without spawn: Creates RNG with same seed:
np.random.default_rng(sq).integers(100)
Out[5]: 90
np.random.default_rng(sq).integers(100)
Out[6]: 90
np.random.default_rng(sq).integers(100)
Out[7]: 90
np.random.default_rng(sq).integers(100)
Out[8]: 90
# With spawn: Creates RNGs with different seeds but in a reproducible way
np.random.default_rng(sq.spawn(1)[0]).integers(100)
Out[9]: 83
np.random.default_rng(sq.spawn(1)[0]).integers(100)
Out[10]: 22
np.random.default_rng(sq.spawn(1)[0]).integers(100)
Out[11]: 92
np.random.default_rng(sq.spawn(1)[0]).integers(100)
Out[12]: 43 |
|
Yeah I think you're right. |
|
Changed the |
|
Right, that would need a little more thinking I believe. The current processing does this:
With this processing, a captured function always gets the same sequence of seeds when called multiple times. For the new random API it could look like this:
The second variant (new API) is way cleaner than the old one, but I have no idea how to combine those two without overcomplicating everything. The only way I can think of right now would be to write a wrapper that behaves similarly to the For the option: I would add an option to |
|
@thequilo if you think this is good, I'll add the settings stuff. |
thequilo
left a comment
There was a problem hiding this comment.
Hmm, I'm not sure what to do exactly here. I initially thought about something like a RngSequence that directly generates rngs. We would then need one class that generates random.Random, one for random.RandomState and one for random.Generator. This would make the code that uses the rng straightforward, and it would be trivial to add a different random generator, e.g., some custom implementation. But it is not so easy to rewrite the code so that it doesn't break anything. The main problem I see here is that the SeedSequence doesn't actually generate a sequence of seeds (as I thought) and that it is not reproducible by its entropy, so generating seeds to pass to "_seed" and store in the config is not trivial with the SeedSequence.
I don't see much advantages of your code over the previous one and it has the side-effect that older experiments are not reproducible because random.Random uses a different rng than np.random.RandomState.
I don't want to bother you too much with this, so to quickly get a running version of sacred with support for the new random API, we could go back to the beginning and don't use the SeedSequence. I think I now better understand how the SeedSequence works. Your initial implementation should work fine and generate "good" starting points for the rngs, while in the beginning, I thought it wouldn't.
sacred/randomness.py
Outdated
| def __init__(self, seed=None, min=1, max=1e9): | ||
| self.min = min | ||
| self.max = max | ||
| self.rng = random.Random(seed) |
There was a problem hiding this comment.
This breaks previous experiments. Previously, np.random.RandomState was used to create seeds which uses a different random number generator than random.Random, so with this change older experiments would not be reproducible
sacred/randomness.py
Outdated
| return opt.np.random.RandomState(seed) | ||
| try: | ||
| seed_sequence = opt.np.random.SeedSequence(seed) | ||
| return opt.np.random.default_rng(seed_sequence.spawn(1)[0]) |
There was a problem hiding this comment.
Here, you can simply pass the seed_sequence to np.random.default_rng because it is only used for this one rng.
sacred/randomness.py
Outdated
| if opt.has_numpy: | ||
| opt.np.random.seed(seed) | ||
| try: | ||
| opt.np.random.seed(seed) |
There was a problem hiding this comment.
When the legacy random API gets removed.
|
I first started by writing something like a I noticed that I had another look at the I'll revert and add the settings stuff. |
sacred/randomness.py
Outdated
|
|
||
|
|
||
| def create_rnd(seed): | ||
| def create_rnd(seed, np_legacy=False): |
There was a problem hiding this comment.
You can directly access the setting from SETTINGS.CONFIG.NUMPY_RANDOM_LEGACY_API here. Is there a reason for the argument? If I saw this correctly, it is always set to SETTINGS.CONFIG.NUMPY_RANDOM_LEGACY_API. Or do you want to allow setting the API version independently for every captured function?
sacred/settings.py
Outdated
| # if true uses the numpy legacy API, i.e. _rnd in captured functions is | ||
| # a numpy.random.RandomState rather than numpy.random.Generator. | ||
| # numpy.random.RandomState became legacy with numpy v1.19. | ||
| "NUMPY_RANDOM_LEGACY_API": False, |
|
Okay, now I fully understand why you implemented it as it is. It makes sense that you didn't go for the I think now the only thing that's missing is a test case for the new setting. |
|
Something is wrong with the tensorflow 1.14 and 2 test cases. There isn't even a console output on azure. This is not caused by your changes, so I'd say this is fine. There is only one thing left that I'm unsure about: Currently, |
|
About the test in general: Python 3.8 and 3.9 are not tested and 3.5 has no longer support and 3.6 has its EOL in December. Also currently supported Tensorflow version are 1.15, 2.0, 2.1, 2.2, 2.3 and 2.4. I think we should change the tests accordingly. |
|
Yes, fixing these test issues is on my to-do list, but I didn't have time to fix them yet. They got neglected over the past year or so because nobody actively maintained sacred. If you want to, you can adjust the test to support the newer python versions and remove unsupported versions of tensorflow, but better in a separate PR. Removing Python 3.5 has the advantage that we can use f-strings. |
|
K, will do. Merge this first though? |
Update randomness to respect new numpy.random API.