October 22, 2007

Is Haskell really expressive?

My idea of an expressive program is one whose meaning (or at least the meaning of its functions when taken apart) is kind of intuitive.

Haskell is often remarkably pointed out as an expressive language. Nevertheless, I'd say that a language is simply a weapon in the hand of the developer and that expressiveness mostly depend solely on him.

Indeed, consider the following Haskell piece of code:

cartesian = (foldl (liftM2 $ flip (:)) [[]]).(map (\n -> [0..n-1]))

Apart from its name, would you tell that this function evaluates a multiple cartesian product? Actually you need to have an intermediate knowledge of Haskell and its libraries to catch the meaning of this function:
  • You should know that foldl is like a generic "summation"
  • You should know that liftM2 will lift a function taking two arguments into a Monad (the List Monad actually).
  • You should know that semantics of the List Monad has to do with non-determinism, i.e. passing two lists to a lifted function will result in applying that function to each and every pair built from them.

Writing programs this way is really concise but I sometimes feel like it's not really intuitive. My understanding of Haskell programs improves everyday and lots of Haskell snippets I've read in the past are now less obscure. But I think my most importat breakthrough in understanding the language will happen when I'm able to make the above snippet's meaning apparent even for the almost occasional (and non-Haskell) programmer.


Cale Gibbard said...

There's a better way to write that. The foldl is awkward (in fact, you should try to avoid foldl, as foldr is almost always what you want, and if you really do want foldl, you're more likely to want foldl', the strict variant of it.)

I'd have written that function as:

cartesian = sequence . map (\n -> [0..n-1]) . reverse


cartesian = mapM (\n -> [0..n-1]) . reverse

The reversal is arguably a bit unnatural though. I only included it because you'd strangely used foldl and flip in your original solution.

This is in the list monad. The function given to mapM sends each element of the input list to a list of choices for what should be put in that position, and gives a list of all possible ways of making those choices. Hopefully once you know that, the function is very readable, as well as being more concise.

As a side note, if you'd used foldr, you could have written it as (with only slightly different behaviour):

cartesian = foldr (liftM2 (:)) [[]] . map (\n -> [0..n-1])

or, to be a little more monadic:

cartesian = foldr (liftM2 (:)) (return []) . map (\n -> [0..n-1])

This is not *too* unnatural. The foldr (liftM2 (:)) (return []) is effectively replacing list structure with program structure, turning a list of computations into a computation which runs each in turn, returning a list of results from each. Each cons in the list of computations is replaced with the function producing the computation which runs its first parameter, then its second, and returns the cons of the two. The nil at the end is replaced with the computation which always returns nil. In the list monad, to "run" a computation is simply to pick an element from the list in all possible ways.

Here's another way to write the foldr:

sequence [] = return []
sequence (x:xs) = do {v <- x; vs <- sequence xs; return (v:vs)}.

However, this is the very definition of the library function called sequence, in Control.Monad. It takes a list of computations and turns it into a computation returning a list of results. In a sense, it's the primordial loop, from which most other sorts of loop can be built. In the list monad, it is a proper Cartesian product.

Cristiano Paris said...

Thank you Cale and sorry for the delay in my reply: I had to think about it since using sequence was surprising to me, at first (but natural afterwards).

Concerning foldl/foldr, I think you're talking about memory leaks.

In that case, I haven't touched the matter and, as soon as I do, I think your point will be clearer to me. Thank you anyway, I'll take the advice as a rule of thumb. I guess using foldr would have made the flip disappear.

Turning to sequence, I knew of it but I didn't see the connection to cartesian.

In fact, seeing how sequence is defined, it's not too far from the definition of cartesian.

I guess your comment goes in the same direction as my post. I agree using sequence is a "better" way to write it.

But does "better" mean "more expressive"?

1 - You have still to know that sequence simply lifts the cons operator (:) used in the original list (thus turning a list to a sequence).

2 - That a lifted cons combines the elements of its arguments, i.e. you have to know how the Monad List works, which is somewhat far from the common understanding of lists and how they are used in the non-Haskell world (lists are lists, not the basic component of indeterminism).

More, 1) simply pushes away the expressiveness issue, but doesn't solve it.

In fact, you must be able to infere, from the sole name of the sequence function, its meaning, which, in the List Monad case, is not that obvious.

Thank you for your post: I've learnt a lot today (so point 1 of the reasons why I opened this blog has been met!).

Cale Gibbard said...

Well, the foldr/foldl issue isn't just something about stack overflows.

Just for reference, here are the two relevant definitions:

foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

Now, foldl has the moderately desirable property of being tail recursive. This means that when it recurses, it need not push a new location to return to on the stack. This accounts for most of its popularity in strict functional languages.

One thing which is important to remember is that Haskell is lazily evaluated, which means that expressions are evaluated in an outermost-first fashion (and whenever a function parameter occurs multiple times in the body of a function, the results of evaluating it are shared).

With this in mind, foldr suddenly has a very desirable property. In a certain sense, it finishes immediately! In the case of a nonempty list, it applies f to x and (foldr f xs) and then f is evaluated. If f doesn't end up needing its second parameter, the recursive call to (foldr f xs) might never be used. So foldr interacts nicely with laziness.

Another nice property of foldr is that it can be thought of as directly replacing the list constructors with other values, (:) with f and [] with z, throughout the list. This makes it intuitive to work with. Have a look at the pictures on http://cale.yi.org/index.php/Fold_Diagrams for more along these lines.

On the other hand, foldl, being tail recursive, cannot possibly stop recursing until it finds the end of the list, which means it fails miserably on infinite lists.

Moreover, the outermost-first evaluation means that the (f z x) parameter to foldl won't be evaluated before the recursive call to foldl! This means that all foldl does is build up a gigantic expression (well, about the size of the input list), which can't evaluate until it has finished recursing. Sometimes that expression gets large enough to cause a stack overflow when it finally hits.

To alleviate this problem, there is a stricter version of foldl in the Data.List library called foldl'

foldl' f z [] = z
foldl' f z (x:xs) = let y = f z x in y `seq` foldl' f y xs

This is still tail recursive, but it forces y = f z x to be evaluated before proceeding with the recursive call.

Of course, there are still times where you'll only partially evaluate the value returned by foldl, and if your combining function is expensive, it may still be a good idea to use the plain lazy foldl, however, most of the time, if you really want foldl, you probably want the strict version.

Cale Gibbard said...

Regarding sequence, you don't *quite* have to know that it lifts (:). What you do have to know is that it turns a list of computations into a computation which runs each in turn and then returns a list of the results. Then you just have to know what "running" a list means in the list monad: it's just picking one of the elements. From those two, the fact that sequence is a Cartesian product falls right out.

Of course, it does involve knowing that much, but you can't expect to know a language without learning some definitions somewhere. ;)

In general, you do have to think a bit about what the functions in Control.Monad are doing whenever you approach a new monad, but they usually turn out to be something useful.

For example in the ((->) e) monad of functions from a fixed type e (whose instance can be imported from Control.Monad.Reader),

sequence [f, g, h] x = [f x, g x, h x]

In that monad, running a computation is just applying the function to the whole computation's parameter.

For another example from the list monad,

combinations xs = filterM (const [True, False]) xs

This obtains an answer about whether to keep each element of the list by running the list [True, False]. That is, it alternately decides to keep and to throw away each element. Hence, the result is the list of all 2^n possible sublists of the original.

Monadic code can be a little strange at first, but once you're used to the mechanisms involved, it can be highly expressive.

Cristiano Paris said...

Thank you Cale. Your post was really useful to me.