-
Notifications
You must be signed in to change notification settings - Fork 396
Description
This issue is a follow-up of #245, #349, #508, #1296, #1381, #1405.
This exposition is based on discussions with NEST developers, especially @jougs and @hakonsbm.
Background
The current NEST random number architecture dates back to 2002 and is thoroughly outdated today. The arrival of carefully standardized RNGs with the C++11 standard, the essentially complete transition to 64-bit CPUs, the availability of dedicated cryptographic RNG hardware (AES) and the transition from 100s to 100s of 1000s of parallel processes create an entirely different landscape for random number generation today.
Key weaknesses include
- a difficult user interface for seeding RNGs, requiring users to supply multiple seeds
- a similar difficulty in selecting other than the default RNG
- use of the Knuth lagged Fibonacci generator as default generator
- difficulty to add modern generator libraries such as Random123.
General considerations
NEST
Use of random numbers
Random numbers are consumed in NEST
- during node creation to randomize node parameters;
- during connection creation
- to select connections to be created,
- to parameterize connections;
- during simulation
- to generate stochastic input (randomized spike trains, noisy currents),
- to support stochastic spike generation.
In all cases except certain sub-cases of 2.i, randomization is tied to a node, which in turn is assigned to a specific virtual process, either because the node is concerned (cases 1, 3) or because the node is the target of a connection (case 2). NEST design guarantees that nodes assigned to different VPs can be updated independent of each other, and the same holds for connections if created or parameterized from a target-node perspective. Therefore, for all these purposes, one random number stream per VP is required and sufficient to ensure that different VPs can operate independently.
The only exception are certain sub-cases of 2.i in which all VPs need to draw identical random number streams and discard all numbers that are irrelevant. This pertains to fixed_outdegree and fixed_total_number (initial step) connection patterns. For these purposes, a global random number stream is required.
Quantity of random numbers required
NEST therefore requires in total N_VP+1 random number streams. With a view to future exascale computers, we need to prepare for N_VP=O(10^7) random streams. We can estimate the number of random numbers required per stream as follows:
- Based on current practice, we assume
N = 1000neurons per stream; this results in a total ofO(10^11)neurons, on the scale of the human brain and thus a sensible upper limit. - Each neuron receives excitatory and inhibitory Poisson background input, i.e., 2 Poisson distributed random numbers per time step per neuron, each of which typically requires multiple uniform random numbers; we assume for simplicity 2 uniform per Poisson number, i.e., 4 plain random numbers per time step per neuron.
- A time step is 0.1 ms and a typical simulation duration 100 s simulated time for 10^6 time steps.
- In total, simulation will consume
~1000 x 4 x 10^6 = 4 x 10^10random numbers per stream (consumption during network construction will usually be negligible). - To allow for a margin of error, we assume O(10^11) random numbers consumed per stream during a single simulation.
Replicability guarantee
A NEST simulation compiled with the same compiler and library, on the same computer hardware and initialised with a fixed random seed, shall generate exactly the same output on each execution.
Exchangeable random number generator
The user shall in an easy way, in particular without recompilation, be able to perform the same simulation with different random number generators. This allows the user to test whether effects observed in simulations may be artefacts of a specific RNG type.
Parallel random number streams
Collision risk
L'Ecuyer et al (2017) provide a recent account of random number generation in highly parallel settings. The main approach to providing independent random number streams in parallel simulations is to use a high-quality random number generator with long period and seed it with a different seed for each stream. They then argue that if we have s streams of length l and a generator with period r, the probability that two streams seeded with different seeds will overlap is given by
p_overlap ~ s^2 l / r .
We have
s = N_VP = 10^7 ~ 2^23
l = 10^11 ~ 2^37
so assuming an rng period of r = 2^128 we arrive at
p_overlap ~ (2^23)^2 x 2^37 / 2^128 = 2^-45 ~ 3 x 10^-14
while for a period of r = 2^256 we obtain
p_overlap ~ 2^-173 ~ 10^-52 .
The probability of stream collisions is thus negligibly small, provided we use random number generators with a period of at least 2^128.
L'Ecuyer (2012, Ch 3.6) points out that certain random number generators classes will show too much regularity in their output if more than r^1/3 numbers are used (this relation is based on exercises in Knuth, TAOCP vol 2, see L'Ecuyer and Simard (2001)). While it is not certain that that analysis also applies to (short) subsequences of the period of an RNG, one might still re-consider the analysis above, but using l^3 ~ 2^111 instead of l when computing p_overlap. We then obtain
Period r | p_overlap
- | -
2^128 ~ 10^38|>> 1
2^256 ~ 10^77|2^-99 ~ 10^-30
2^512 ~ 10^154|2^-355 ~10^-107
Thus, for generators with periods of at least 2^256, we have negligible collision probability also under this tightened criterium.
Available seeds
Traditional seed() functions typically accept a 32-bit integer as input and initialize RNG state from this. This allows only for 4 x 10^9 different seeds, so if a simulation requires 10^7 streams, we can at most perform 400 different simulations. This is not acceptable and generators with reliable, better seed generators are required.
Available values
Many existing RNGs return 32-bit integers, which are converted into [0, 1)-uniform random numbers by simple division. This allows at most 2^32=4 x 10^9 different values even for 53-bit-mantissa doubles (see also boostorg/random#61). Given that our streams consume 10^11 numbers, we would expect every possible value to occur many times over during a simulation, which is not ideal. But keep in mind that while we will see the same value many times, it will occur in a different sequence of numbers every time provide the underlying generator has sufficiently long period.
Random number and deviate generator libraries
The following random number and deviate generator libraries have been considered primarily:
- The GNU Scientific Library Random Number Generation and Random Distribution modules (used in NEST 2)
- The Boost Library Random module
- The C++ standard library pseudo-random number generation component
GSL and Boost
The GSL random number facilities work strictly with 32 bit random numbers and seeds. This strongly limits there utility in view of the requirements listed above.
The Boost Library Random module uses only 32 bits in generating uniform-[0, 1)-distributed numbers, which form the basis of all random deviate generation. Many deviate generators also appear to rely heavily on constant or pre-computed tables (e.g. normal distribution), which seems little cache efficient and may even cause slow memory access if static data is local to a single CPU. Therefore, the Boost Random module also seems of limited value.
C++ Standard Library
The random number and distribution components of the C++ Standard Library are defined in great detail in §29.6 of the C++ Standard; all references to the standard in the following are to the (see C++17 Working Draft N4659, which is openly available).
Currently, three implementations of the C++ standard library are available
- GNU's libstdc++ (
random,bits/random.h,bits/random.tcc - LLVM's libc++ (
random) - Microsoft's STL (
random)
Where the standard leaves aspects to the implementations, checking these three implementations provides comprehensive answers for all practical purposes.
Advantages
- Available out of the box with all C++11 compiler installations
- Clean C++ interface
- Random number generators are precisely defined, including specific requirements on numbers generated for given seeds.
- Well-defined, standardised seed-sequence concept for generating complete state data for generators with large state parameterised by a sequence of 32-bit integers; based on seeding scheme of MT19937 (§29.6.7.1)
- Well-defined, standardised way to convert raw random numbers into uniform-[0, 1)-distributed random numbers in which all mantissa bits are randomised, i.e., 53 bits for
double(generate_canonical(), §29.6.7.2). This is independent of whether the underlying generator is a 32 or 64 bit generator. - Even though it does not appear to be required explicitly by the standard, all
*_distributionclasses in all three implementations inspected consume random numbers only viagenerate_canonical(), i.e., using 53 bits of randomness. - All generators support serialisation and de-serialisation via
operator<<()andoperator>>().
Disadvantages
- The details of numbers drawn from random distributions is left to the implementation (§29.6.8.1-3) as long as the numbers are distributed correctly. Different implementations (libstdc++, libc++) indeed generate different sequences of random numbers for some distributions.
- Fewer RNG types available than in GSL; not particularly problematic, since most of the GSL generators are of more historic interest and do not meet our period requirements.
Other generators
A number of other random number generator libraries are available. They usually provide a small number of raw random number generators and would require adaptation to be usable by random distribution generators. Many of these libraries do not seem to follow modern open source code development best practices (e.g. code in open repositories with systematic bug tracking, code review and automated testing).
- The Random123 library provides cryptographic(-inspired) generators with a C++11 interface and a liberal license.
- L'Ecuyer's generators such as MRG32k3a are free only for non-commercial use.
- Many of the generators recommended by @peteroupc have rather short periods for our purposes.
- The
xoshiroclass of generators recently proposed by Blackman and Vigna looks interesting, although the paper on the generator is currently only available as preprint and seems not to have been published in peer-reviewed form yet. - The PCG Generators have only periods up to
2^128, too short for our purposes.
Recommendations
Principles
- NEST 3 shall use the C++ standard library random number and distribution generators.
- NEST 3 shall in addition include Random123 generators and possibly others.
- Random number generators are wrapped in an abstract base class to support flexible exchange of the random number generator.
- Random distribution generators will be wrapped in a typedef, so that one can in principle replace a distribution generator at compile time.
- The user specifies only as single 32-bit
user_seed. - To seed RNGs for different streams, use
seed_seq(user_seed, f(stream_no)).
Comments
- Ad 1:
- From the generators available in C++
random, only the MT provides sufficient period and speed. It should be included in 32- and 64-bit variant. Theranluxgenerators are painfully slow. - Considering this comment from @peteroupc, we should consider wrapping
generate_canonical()in our base class to prevent 1 from being returned or even implement the more advanced solution proposed in ISO/IEC JTC1 SC22 WG21 P0952R0; see also P0952 R2 A new specification for std::generate_canonical cplusplus/papers#574.
- From the generators available in C++
- Ad 2:
- Random123 generators should be included to provide a completely different type of generators
- The Random123 license is compatible with inclusion, and code can be shipped with NEST.
- We should add
-marchflags as a CMake option to activate hardware support for AES-based generators. - Need to figure out how to include AES-based generators only where available (
-march). - The
xoshiroXXXwithXXX >= 256also seem interesting, but require wrapping.
- Ad 3:
- The base class needs to implement
constexpr result_type min()andmax(). Sinceresult_typecan differ for different generators (32 or 64 bit), we need to define these methods with return typeuint_fast64_tso they can hold all possible values, and protect against generators with even larger return type bystatic_assert(). Integer promotion rules make this approach safe in light of thegenerate_canonical()implementations in the three STL variants.
- The base class needs to implement
- Ad 4:
- Distribution generators need not be exchangeable at runtime, therefore, no base class is necessary.
- The fact that different STL implementations may generate different random number sequences is unfortunate, but the advantages for the STL (seeding, 53-bit randomness) weigh stronger. Since either GCC or Clang should be available on any system, validations between systems will be possible.
- Ad 5, 6:
- All seeds for individual streams are generated from the single 32-bit
user_seedprovided. - The
user_seedis never used "as is". stream_nois identical to thread number for per-thread streams, andnum_threads+1for the global stream.f(stream_no)is still to be determined. It needs should not return zero and must be guarantee to map10^7differentstream_novalues to different output values.- The C++
std::seed_seqmay have weaknesses. But using this architecture, we could later replace the standard-defined seeding rule by a different one.
- All seeds for individual streams are generated from the single 32-bit