Skip to content

Modernize random number generation for NEST 3 #1440

@heplesser

Description

@heplesser

This issue is a follow-up of #245, #349, #508, #1296, #1381, #1405.

This exposition is based on discussions with NEST developers, especially @jougs and @hakonsbm.

Background

The current NEST random number architecture dates back to 2002 and is thoroughly outdated today. The arrival of carefully standardized RNGs with the C++11 standard, the essentially complete transition to 64-bit CPUs, the availability of dedicated cryptographic RNG hardware (AES) and the transition from 100s to 100s of 1000s of parallel processes create an entirely different landscape for random number generation today.

Key weaknesses include

  1. a difficult user interface for seeding RNGs, requiring users to supply multiple seeds
  2. a similar difficulty in selecting other than the default RNG
  3. use of the Knuth lagged Fibonacci generator as default generator
  4. difficulty to add modern generator libraries such as Random123.

General considerations

NEST

Use of random numbers

Random numbers are consumed in NEST

  1. during node creation to randomize node parameters;
  2. during connection creation
    1. to select connections to be created,
    2. to parameterize connections;
  3. during simulation
    1. to generate stochastic input (randomized spike trains, noisy currents),
    2. to support stochastic spike generation.
      In all cases except certain sub-cases of 2.i, randomization is tied to a node, which in turn is assigned to a specific virtual process, either because the node is concerned (cases 1, 3) or because the node is the target of a connection (case 2). NEST design guarantees that nodes assigned to different VPs can be updated independent of each other, and the same holds for connections if created or parameterized from a target-node perspective. Therefore, for all these purposes, one random number stream per VP is required and sufficient to ensure that different VPs can operate independently.

The only exception are certain sub-cases of 2.i in which all VPs need to draw identical random number streams and discard all numbers that are irrelevant. This pertains to fixed_outdegree and fixed_total_number (initial step) connection patterns. For these purposes, a global random number stream is required.

Quantity of random numbers required

NEST therefore requires in total N_VP+1 random number streams. With a view to future exascale computers, we need to prepare for N_VP=O(10^7) random streams. We can estimate the number of random numbers required per stream as follows:

  • Based on current practice, we assume N = 1000 neurons per stream; this results in a total of O(10^11) neurons, on the scale of the human brain and thus a sensible upper limit.
  • Each neuron receives excitatory and inhibitory Poisson background input, i.e., 2 Poisson distributed random numbers per time step per neuron, each of which typically requires multiple uniform random numbers; we assume for simplicity 2 uniform per Poisson number, i.e., 4 plain random numbers per time step per neuron.
  • A time step is 0.1 ms and a typical simulation duration 100 s simulated time for 10^6 time steps.
  • In total, simulation will consume ~1000 x 4 x 10^6 = 4 x 10^10 random numbers per stream (consumption during network construction will usually be negligible).
  • To allow for a margin of error, we assume O(10^11) random numbers consumed per stream during a single simulation.

Replicability guarantee

A NEST simulation compiled with the same compiler and library, on the same computer hardware and initialised with a fixed random seed, shall generate exactly the same output on each execution.

Exchangeable random number generator

The user shall in an easy way, in particular without recompilation, be able to perform the same simulation with different random number generators. This allows the user to test whether effects observed in simulations may be artefacts of a specific RNG type.

Parallel random number streams

Collision risk

L'Ecuyer et al (2017) provide a recent account of random number generation in highly parallel settings. The main approach to providing independent random number streams in parallel simulations is to use a high-quality random number generator with long period and seed it with a different seed for each stream. They then argue that if we have s streams of length l and a generator with period r, the probability that two streams seeded with different seeds will overlap is given by

p_overlap ~ s^2 l / r .

We have

s = N_VP = 10^7 ~ 2^23
l = 10^11 ~ 2^37

so assuming an rng period of r = 2^128 we arrive at

p_overlap ~ (2^23)^2 x 2^37 / 2^128 = 2^-45 ~ 3 x 10^-14

while for a period of r = 2^256 we obtain

p_overlap ~ 2^-173 ~ 10^-52 .

The probability of stream collisions is thus negligibly small, provided we use random number generators with a period of at least 2^128.

L'Ecuyer (2012, Ch 3.6) points out that certain random number generators classes will show too much regularity in their output if more than r^1/3 numbers are used (this relation is based on exercises in Knuth, TAOCP vol 2, see L'Ecuyer and Simard (2001)). While it is not certain that that analysis also applies to (short) subsequences of the period of an RNG, one might still re-consider the analysis above, but using l^3 ~ 2^111 instead of l when computing p_overlap. We then obtain

Period r | p_overlap

  • | -
    2^128 ~ 10^38| >> 1
    2^256 ~ 10^77 | 2^-99 ~ 10^-30
    2^512 ~ 10^154 | 2^-355 ~10^-107

Thus, for generators with periods of at least 2^256, we have negligible collision probability also under this tightened criterium.

Available seeds

Traditional seed() functions typically accept a 32-bit integer as input and initialize RNG state from this. This allows only for 4 x 10^9 different seeds, so if a simulation requires 10^7 streams, we can at most perform 400 different simulations. This is not acceptable and generators with reliable, better seed generators are required.

Available values

Many existing RNGs return 32-bit integers, which are converted into [0, 1)-uniform random numbers by simple division. This allows at most 2^32=4 x 10^9 different values even for 53-bit-mantissa doubles (see also boostorg/random#61). Given that our streams consume 10^11 numbers, we would expect every possible value to occur many times over during a simulation, which is not ideal. But keep in mind that while we will see the same value many times, it will occur in a different sequence of numbers every time provide the underlying generator has sufficiently long period.

Random number and deviate generator libraries

The following random number and deviate generator libraries have been considered primarily:

GSL and Boost

The GSL random number facilities work strictly with 32 bit random numbers and seeds. This strongly limits there utility in view of the requirements listed above.

The Boost Library Random module uses only 32 bits in generating uniform-[0, 1)-distributed numbers, which form the basis of all random deviate generation. Many deviate generators also appear to rely heavily on constant or pre-computed tables (e.g. normal distribution), which seems little cache efficient and may even cause slow memory access if static data is local to a single CPU. Therefore, the Boost Random module also seems of limited value.

C++ Standard Library

The random number and distribution components of the C++ Standard Library are defined in great detail in §29.6 of the C++ Standard; all references to the standard in the following are to the (see C++17 Working Draft N4659, which is openly available).

Currently, three implementations of the C++ standard library are available

  • GNU's libstdc++ (random, bits/random.h, bits/random.tcc
  • LLVM's libc++ (random)
  • Microsoft's STL (random)
    Where the standard leaves aspects to the implementations, checking these three implementations provides comprehensive answers for all practical purposes.

Advantages

  • Available out of the box with all C++11 compiler installations
  • Clean C++ interface
  • Random number generators are precisely defined, including specific requirements on numbers generated for given seeds.
  • Well-defined, standardised seed-sequence concept for generating complete state data for generators with large state parameterised by a sequence of 32-bit integers; based on seeding scheme of MT19937 (§29.6.7.1)
  • Well-defined, standardised way to convert raw random numbers into uniform-[0, 1)-distributed random numbers in which all mantissa bits are randomised, i.e., 53 bits for double(generate_canonical(), §29.6.7.2). This is independent of whether the underlying generator is a 32 or 64 bit generator.
  • Even though it does not appear to be required explicitly by the standard, all *_distribution classes in all three implementations inspected consume random numbers only via generate_canonical(), i.e., using 53 bits of randomness.
  • All generators support serialisation and de-serialisation via operator<<() and operator>>().

Disadvantages

  • The details of numbers drawn from random distributions is left to the implementation (§29.6.8.1-3) as long as the numbers are distributed correctly. Different implementations (libstdc++, libc++) indeed generate different sequences of random numbers for some distributions.
  • Fewer RNG types available than in GSL; not particularly problematic, since most of the GSL generators are of more historic interest and do not meet our period requirements.

Other generators

A number of other random number generator libraries are available. They usually provide a small number of raw random number generators and would require adaptation to be usable by random distribution generators. Many of these libraries do not seem to follow modern open source code development best practices (e.g. code in open repositories with systematic bug tracking, code review and automated testing).

Recommendations

Principles

  1. NEST 3 shall use the C++ standard library random number and distribution generators.
  2. NEST 3 shall in addition include Random123 generators and possibly others.
  3. Random number generators are wrapped in an abstract base class to support flexible exchange of the random number generator.
  4. Random distribution generators will be wrapped in a typedef, so that one can in principle replace a distribution generator at compile time.
  5. The user specifies only as single 32-bit user_seed.
  6. To seed RNGs for different streams, use seed_seq(user_seed, f(stream_no)).

Comments

  • Ad 1:
  • Ad 2:
    • Random123 generators should be included to provide a completely different type of generators
    • The Random123 license is compatible with inclusion, and code can be shipped with NEST.
    • We should add -march flags as a CMake option to activate hardware support for AES-based generators.
    • Need to figure out how to include AES-based generators only where available (-march).
    • The xoshiroXXX with XXX >= 256 also seem interesting, but require wrapping.
  • Ad 3:
    • The base class needs to implement constexpr result_type min() and max(). Since result_type can differ for different generators (32 or 64 bit), we need to define these methods with return type uint_fast64_t so they can hold all possible values, and protect against generators with even larger return type by static_assert(). Integer promotion rules make this approach safe in light of the generate_canonical() implementations in the three STL variants.
  • Ad 4:
    • Distribution generators need not be exchangeable at runtime, therefore, no base class is necessary.
    • The fact that different STL implementations may generate different random number sequences is unfortunate, but the advantages for the STL (seeding, 53-bit randomness) weigh stronger. Since either GCC or Clang should be available on any system, validations between systems will be possible.
  • Ad 5, 6:
    • All seeds for individual streams are generated from the single 32-bit user_seed provided.
    • The user_seed is never used "as is".
    • stream_no is identical to thread number for per-thread streams, and num_threads+1 for the global stream.
    • f(stream_no) is still to be determined. It needs should not return zero and must be guarantee to map 10^7 different stream_no values to different output values.
    • The C++ std::seed_seq may have weaknesses. But using this architecture, we could later replace the standard-defined seeding rule by a different one.

Metadata

Metadata

Assignees

Labels

I: Behavior changesIntroduces changes that produce different results for some usersI: User InterfaceUsers may need to change their code due to changes in function callsS: HighShould be handled nextT: EnhancementNew functionality, model or documentation

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions