November 14, 2007

Random numbers (where it all started)

At first, Haskell seemed a reasonable language... until Monads appeared! Until then, purity was a clear plus of Haskell but... what about a random number generator (RNG)?

A pure function is deterministic: it depends only on its inputs and, as computers are deterministic as well, its output must be the same given the same input data.

Conversely, an RNG function (at least as it's known in the imperative world) must be non-deterministic and return a different outcome whenever it's called (without arguments).

Haskell can't give up on purity, so the solution is simple: instead of returning a value, an RNG function returns a computation.

A computation is like a machine with some pipes going in. By itself, the machine doesn't do anything. But, if you run it, providing fuel through the pipes, it will eventually produce something.

The machine is pure, while the fuel is impurity (or side-effects). The fuel comes from the environment under which the machine is run.

So, an RNG function is actually a machine which, when run under the special enviroment of the IO Monad, provides a number. More, you can build a special RNG machine (i.e. write an RNG computation) producing an infinite list of random numbers: just fill in sufficient fuel until more random numbers are needed!

In this way, Haskell let's you stay on the pure side when building the computation, letting you do all sort of things a computer theorist would do to a program (applying transformations, proving properties...), and getting you dirty when running your computation.

Sometimes it's hard to catch up with the meaning of Monads at first because the first Monad you encounter is the IO Monad, which is implicitly run by the Haskell run-time environment. So, you lose the concept that a Monad computation must be run to produce something useful.

For instance, the State Monad has runState function. But, as I said in one of my previous post, IO has its own run function: unsafePerformIO, from Foreign module.

So, let's see how to build an RNG computation:

randomList r = do g <- getStdGen
return \$ randomRs r g

randomList takes a pair representing a range of numbers and return a computation that, when run, provides an infinite list of numbers. Indeed, randomList's type is:

Num a => (a,a) -> IO [a]

Specifically, when you state:

g <- getStdGen

you are saying something like: "g is a pipe. When you run this computation, please provide a StdGen value here". randomRs is a plain pure function taking a StdGen (which is basically a seed with a next function iterated to provide an infinite list of random number from it) and the range to which random numbers must be belong to.

How do we run this machine? Simple: put unsafePerformIO in front of it.

randomList r = unsafePerformIO \$
do g <- getStdGen
return \$ randomRs r g

Now randomList produces an infinite list of random numbers because it'll run the computation inside it.

But, beware! randomList is now and impure function masked as a pure function. Haskell could tricked by this masking and make the wrong thing misled by the purity assumption.

So, I promised myself not to write another Monad tutorial, but I did, at last!

Geoff Wilson said...

Have a look towards the end of this article on how to use QuickCheck to generate a random sequence without using performUnsafeIO

Cristiano Paris said...

Thank you Geoff!

Anonymous said...

Most of the functions in System.Random are actually pure functions, it's just the global generator (getStdGen & company) that's bound to the IO monad.
Random functions in most procedural languages use a seed object as well: the difference is that in procedural languages the seed only has to be dealt with implicitly, but in Haskell you have to actually pass it round if you want to use it without unsafeIO.
Most Haskell monad tutorials focus on the bind operator, but I think the join function shows what a monad actually is in a more intuitive way:
join :: M (M a) -> M a