January 31, 2008

Python 2.5's iterators in Haskell (sort of)

DISCLAIMER! The code I present here is likely to be far from perfect and in fact I ask for comments from the community, specifically on how to get rid of the "RecPair" user data type.

Is it possible to mimic Python 2.5's iterators in Haskell? If you don't
know what iterators are (or you're used to 2.4's ones), have a look to PEP343.
By and large, iterators are Python's way to surrogate continuations, which
represent a more general concept usually found in FP languages.

Basically, I want to be able to write something like:


bar = do y <- yield 1
yield (y+2)


and have bar return immediately the argument in a yield call to the caller. Yet, the caller should look like:


foo k = do x <- send undefined -- this will be ignored
y <- send k
return $ [x,y]


First two considerations:


  • It natural to use the Cont monad when implementing the yield function.

  • As two successive calls to send may have different outcomes, some sort of state must be involved. For the basic implementation we resort to State monad.



When trying to solve the puzzle, I stumbled upon a beautiful and straightforward implementation of Python 2.4's iterators by David Menendez. The problem of extending his approach to 2.5 iterators turned out to be a bit more difficult. First, I'll show you the code:


module Main where

import Control.Monad.Cont
import Control.Monad.State

data RecPair a b = Nil RP (b,a -> RecPair a b)

yield x = Cont $ \k -> RP (x,k)

send x = do (RP (y,f)) <- get
put $ f x
return y

runYield it cl = evalState cl $ runCont it $ \_ -> Nil


Let's discuss the yield funtion's implementation. First, as we're working
in the Cont monad, we must return a Cont value which is basically a
function taking a continuation (what to do next) and build a pair (x,k).

x is the value passed to the yield function and then returned to the
caller.

What about k? In my mind, k is a function taking an argument (the
receiving variable of a '<-' statement in a do-block) and returning another pair (x',k'), which is the result of the next small step of the Cont computation, till the next yield call. In this way, a computation can't go on after a yield statement until you feed it with a value through a send call, which is the semantics of iterators. Naively, at start I wrote: yield x = Cont $ \k -> (x,k)

But ghc promptly issued the following error:


Occurs check: cannot construct the infinite type t = a -> (t1,t)


What's going on? Let's try and do some type analysis:


  1. The type of "\k -> (x,k)" is "t -> (b,t)" which is perfectly right, as long as "x" is in the scope of that definition and is of type "b".

  2. But, the argument of "Cont" must have type "(a -> r) -> r".

  3. Let's do the bindings: "t = a -> r" and "r = (b,t)".

  4. Substituting the latter in the former we obtain "t = a -> (b,t)".



This equation is clearly impossible to solve (no types exist to make the
equation true) and the Haskell's type system did his job.

How to escape the problem? The real point here is that the above type
equation can be expanded infinitely. There must be a way to stop the
expansion. What if "r" would be masked by an opaque type?

Enter RecPair. We turn the above equation into:


t = a -> RecPair a b


which is perfectly solvable provided there are instances of the type
"RecPair a b". In the above snippet you can find a convenient definition of
RecPair which is suitable to our goal. There are other, like:


data RecPair a b = Nil RP (a -> (b,RecPair a b))


which turns out to be less elegant when you modify yield and send
accordingly.

The other part of the code is just plain Haskell code. Notice that solution
I proposed doesn't mimic completely Python 2.5's iterators: for example,
send can't deal with more than one iterator at the time due to the use of the State monad.

But these are only technicalities which can be easily overcome, for instance,
using the IO Monad and a couple of IORef here and there.

0 comments: