Contents

Introduction

This project is a Python C module implementation of a quick and dirty random number generator. The Python module is called quickndirty and provides the following interface:

  • seed(s) - sets the seed s to be used in random number generation process
  • random() - gives random U(0,1) numbers
  • randomseq(n) - returns a n-tuple with random U(0,1) numbers
  • randint(start, end) - returns random 32 bits integers uniformly distributed between start and end.
  • getrandbits(k) - gives k length random bit strings
  • getstate - gives the current state of the random number generator
  • setstate(state) - sets the state of the random number generator
  • jumpahead(n) - changes the current state of the random number generator (it iterates the generator n steps forward)

The idea is to generate U(0,1) numbers, random bit strings and random integers uniformly distributed, through a simple interface. I believe that the standard distributions and stochastic process are meant to be applications that use a random number generator and we can't create dependencies between random number generator and any other particular algorithm or implementation.

The module also has a class called Rng. The Rng's constructor accepts an optional parameter named seed and has the same interface of the quickndirty module, in fact, the module wraps a static instance of Rng class.

Why quickndirty?

The main motivation is: create a (pseudo) random number generator that is extremely quick and works fine for most problems. This generator has a period of 2^32 and there is no mod operation because it uses the 32-bits overflow. So, only one 32-bit integer multiplication is done to generate random numbers. I know that this implementation is not portable to 64-bit platforms, but it's good enough for 32-bits platforms.

Examples

Here we have some statistical tests (quantitative and qualitative), in order to show that quickndirty is well performed and reliable.

K-S test

K-S test is a well known goodness of fit test that compares the U(0,1) cumulative distribution with the empirical one generated with a pseudo random number generator, particularly, quickndirty. For our purposes, F_i is the theoretical cumulative distribution of a random variable U(0,1)

F_i = \int_{0}^{x_i}dx = x_i

where 0 \lt x_i \lt 1. The U_i is the simulated cumulative distribution of a uniformly distributed random variable

U_i = \frac{1}{N}\sum_{j=1}^{N}I_{X_j \leq x_i}

The subscript i represents any discrete bin.

We take the measure

KS = \max_i(|F_i - U_i|)

that we call K-S measure and as smaller (close to zero) it is, means that the random sample, that we have generated with quickndirty, has its origins in a uniform distribution.

import quickndirty as qnd
from scipy import zeros, arange, array

n = 100  # sample's size

rng = qnd.Rng()
random = rng.random
u = zeros(n+2)
u[n+1] = 1.0
for i in xrange(n):
    u[i+1] = random()

u.sort()

X = []
F = []
for x in arange(0, 1, 0.002):
    X.append(x)
    for i in xrange(len(u)):
        if x >= u[i] and x < u[i+1]:
            F.append(i/float(n))
            break
            
X = array(X)
F = array(F)
ks = max(abs(F-X))

print 'K-S Test =', ks

Chi-square test

Chi-square test is another goodness of fit test.

n = 100  # sample's size
k = 20   # number of bins

import random
import quickndirty as qnd
from scipy.stats import chisquare
from scipy import ones, zeros

rng = qnd.Rng()

p_obs = zeros(k)
for j in xrange(n):
    u = rng.random()
    for i in xrange(k):
        if u >= i/float(k) and u < (i+1)/float(k):
            p_obs[i] += 1/float(n)
            break

p_exp = ones(k)*k/float(n)
chisq, p_value = chisquare(p_obs, p_exp)
print "sample's chi-square value =", chisq
print "sample's p-value          =", p_value

Variance law

The variance law is not a strict law, in practice, it's a consequence of the Weak Law of Large Numbers that states that:

Var(X) = \frac{\sigma^2}{n}

where Var(X) is the empirical variance, \sigma^2 is the theoretical variance and n is the sample size. As the theoretical variance is constant we get the following scaling law:

Var(X) \propto n^{-1}

We have sampled n random integers between 0 and 127 to fulfill 128 bins. Each bin have an average value

X_i = \frac{C_i}{n}

where Ci is the number of items inside each bin. So Var(X) is the variance of the Xi sample.

The images below show the histograms where we can observe that the variance decreases as the sample size increases and the the last image shows that the variance scales with the sample size (n) following a power law with coefficient \alpha \approx -1.

#!/usr/bin/env python

import quickndirty as qnd
from pylab import *

rng = qnd.Rng()
x = []
y = []
for m in (100, 1000, 10000):
    k = 7
    n = (2**k) * m
    count = zeros(2**k)

    for i in xrange(n):
        w = rng.getrandbits(k) # generate integers from 0 to 2**k
        count[w] += 1
    count /= m
    
    sigma = var(count)
    x.append(n)
    y.append(sigma)

    clf()
    plot(count)
    xlim(0,2**k-1)
    ylim(0.7,1.3)
    title(r"n = %d, $\sigma^2$ = %f" % (n, sigma))
    savefig("histogram_%d" % n, dpi=50)

clf()
(b1, b0) = polyfit(log(x), log(y), 1)
z = exp(polyval([b1, b0], log(x)))

loglog(x, y, 'bo')
loglog(x, z, 'r--')

ylabel(r'$\sigma^2$')
xlabel('n')

title(r"$\alpha$ = %f" % b1)
savefig("variance", dpi=50)

Spatial correlations

#!/usr/bin/env python

import quickndirty as qnd
from pylab import *

rng = qnd.Rng()
random = rng.getrandbits

# parameters
p = 8
N = 100
T = 100

# computed parameters
q = p*p
y = zeros(q)
m = zeros((p,p))
n = q*N
l = int(log2(p))
f = 1.0/(n/q)

# computing loops
for t in xrange(T):
    for k in xrange(n):
        i = random(l)
        j = random(l)
        m[i][j] += f/T

# computing averages
k = 0
for i in xrange(p):
    for j in xrange(p):
        y[k] = m[i][j]
        k += 1

# creating graphs
axhline(y=1, lw=2, color='r')
x = arange(1,q+1)
plot(x,y,'b-o',lw=2)
xlim(1, q)
ylim(0.8, 1.2)

show()

Computing pi

#!/usr/bin/env python

import quickndirty as qnd
from pylab import *

rng = qnd.Rng()
random = rng.random

def computepi(n):
    a = 0.0
    for t in xrange(n):
        x = random()
        y = random()
        r = sqrt(1-x*x)
        if y < r:
            a += 1.0
    return a*4/n

y = []
for t in xrange(1000, 101000, 1000):
    y.append(computepi(t))

# creating graphs
axhline(y=pi, lw=2, color='r')
plot(y,'b-o',lw=2)
show()

Central Limit Theorem

#!/usr/bin/env python

import quickndirty as qnd
from pylab import *

rng = qnd.Rng()
random = rng.random

n = 500
bins = 20
N = n*bins

for k in (1,2,4,8):
    q = sqrt(1.0/3.0) * sqrt(k)
    h = zeros(N)
    for t in xrange(N):
        h[t] = sum( -1 + random()*2 for i in xrange(k) ) / q
    clf()
    hist(h, bins=bins, normed=1)
    xlim(-3,3)
    ylim(0,0.4)
    savefig('hist_%d' % k, dpi=50)

Project

This project is hosted at Google:

quickndirty

Download

quickndirty-1.02.tar.gz
quickndirty-1.01.tar.gz
quickndirty-1.00.tar.gz
Last modified February 26, 2010 9:11 pm / Skin by Kevin Hughes
MediaWiki