Programming and IT Nuggets: Terseness in Erlang

Paul Graham's release of Arc got me thinking about terseness as an end in itself. To me, terseness as a desirable quality is a measure of the signal-to-noise ratio of source code. How much of the source code represents the programmer's actual intent in addressing a problem, and how much is structure, scaffolding or other artifacts imposed by the language that do not express that intent?

I'll probably have more to say about this general subject. Now I'm going to change the subject to Damien Katz's "famous" What Sucks About Erlang blog post. Or at least, to his particular point about how much single assignment sucks because instead of doing this:

   f(X) ->
     X = foo(X),
     X = bar(X),
     baz(X).

He has to do this:

   f(X) ->
     X1 = foo(X),
     X2 = bar(X1),
     baz(X2).

Which makes it "hard" to add a function in the middle, because you have to renumber all those Xn variables to new variable names.

Well, as I've said before, the problem with toy examples is that it's easy to miss the point. Katz was so close to finding a real criticism here, but missed it. It's true that X1, X2, X3... in series like this is icky programming. It's also the sort of thing people do when they're new to Erlang and they're trying to write C or Ruby code in it.

Learning Erlang consists of two epiphanies and an increasing zen. The first epiphany (if it's your first FP language) is about functional programming. At some point, you "get" functional programming and things seem cool, exciting and straightforward that used to seem abstruse and weird.

The second epiphany is OTP. It's hard to explain why the combination of modules and design principles in OTP is so good without going down the path of writing a big application yourself. But once you've gone down those ratholes and started implementing your own ad-hoc server state management, request protocols and restart schemes, then you "get" OTP and you have that "aha!" moment about why supervisors work the way they do and why there are trees of them and what transient children are for and that kind of thing.

The increasing zen of Erlang, though, is the kind of insight that gradually makes things like serially numbered variables all but disappear from your code; that makes you intuitively put cases at just the right spot so you avoid having some branches with recursive calls and some without, or makes you realize why sometimes the best way to return a certain value is to call yourself again.

There is one pernicious pattern, though, that does require ugly serial binding, and it's the one that goes like this:

   f(X) ->
     {value, {X, X1}} = lists:keysearch(X, 1, ?valuetab),
     {atomic, X2} = bar(X1),
     {ok, X3} = baz(X2),
     X3.

This is not just some dumb coding, and it can't be as easily fixed as Katz's toy example. Katz can just do this:

   f(X) ->
     baz(
         bar(
             foo(X))).
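
For the plain case, where each function simply returns the next value, a fold over a list of funs is another way to avoid numbered variables, and adding a step in the middle means adding one list element. This is only a sketch: the module name pipe and the function pipe/2 are hypothetical names chosen for illustration, not part of any library.

```erlang
-module(pipe).
-export([pipe/2]).

%% Thread X through each fun in Funs, left to right.
%% pipe(X, [fun foo/1, fun bar/1, fun baz/1]) is equivalent to baz(bar(foo(X))).
pipe(X, Funs) ->
    lists:foldl(fun(F, Acc) -> F(Acc) end, X, Funs).
```

With this, Katz's function could be written as f(X) -> pipe:pipe(X, [fun foo/1, fun bar/1, fun baz/1]).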

But the pattern-matched examples can't be so easily fixed.

I don't think this is a "bad" characteristic of Erlang. What it reflects is that two powerful features--function composition, and pattern matching--are in tension. I don't think there's a way to "fix" that. But I think there might be a way to make it look better, at least.

One way of doing this is to write wrappers over the functions. How many times have you written some thin wrapper over one of the key-lookup style functions? I've done it a lot (and I might have a bit more to say about that in a future post, as well).
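
As a rough illustration of the kind of wrapper meant here, the sketch below hides the {value, {Key, Value}} shape behind a function that returns just the value. The module name kv and the function lookup/2 are hypothetical; only lists:keysearch/3 is a real library call.

```erlang
-module(kv).
-export([lookup/2]).

%% Return the value stored under Key in a list of {Key, Value} tuples.
%% Crashes with badmatch if the key is missing, like the inline match would.
lookup(Key, TupleList) ->
    {value, {Key, Value}} = lists:keysearch(Key, 1, TupleList),
    Value.
```

A wrapper like this composes directly, for example bar(kv:lookup(X, ?valuetab)), at the cost of writing one such function for every return shape.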

Since Erlang has this nifty way of doing pattern-matching, functions will tend to return structures like that all the time--structures that are not the most convenient form for passing to the next function.

What we want is a way to specify the pattern inline, have it extract a value, and use that as the value of the whole pattern match. Well, maybe we don't want that. But I think I do:

   f(X) ->
     {ok, ?_} = baz(
        {atomic, ?_} = bar(
           {value, {X, ?_}} = foo(X))).

I'm using ?_ in my example because it turns out that's one way of making it work, and my other ideas do not, without modifying the Erlang scanner. So what we want is to magically transform expressions like this:

   {ok, [?_|_]} = blah(X)

into this:

   begin {ok, [Something|_]} = blah(X), Something end

Luckily, Erlang has such magic, a parse_transform.

Here's a wrinkle. You've probably noticed that "Something" is not a very good variable name for us to be inserting wherever we want to use this feature. We could do this "gensym()" in a number of ways--if we do it at parse_transform time, we could insert it where we find our marker. We can even scan the parse tree to look for conflicts.

Unfortunately, we still need a marker of some kind--something that's not going to conflict with variables in the source. If we use a special atom or tuple, we run into the possibility of such a legitimate conflict. And honestly, if our "magic" item is anything more than one character (or, okay, two characters I guess), it defeats the point of the exercise. Nicely, however, we can use our ?_ macro to expand to whatever we want. Now, unfortunately, _ is the only punctuation character that can be used as a macro, and I could see all kinds of client code conflicting with it. But as long as what it represents can't conflict, that client code could redefine the marker to something else (?r for return?).

Luckily, we can name a variable something that we can (just about) guarantee will never conflict with client code--at least, to the extent that random nonces and GUIDs "never" conflict, which is frankly good enough. We can just use a variable with 128 bits of random goodness and define our _ macro to that. Then the parse transform can look for a match operation in the code tree that has that magic variable on the left-hand side. And when it does, it encloses it in the block and adds the variable expression to the end to make the block evaluate it.
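
A minimal sketch of such a transform follows. It is not the code from the experiments mentioned in this post. It assumes the client module contains -define(_, Pm_ret_6f1d2c0a). and -compile({parse_transform, pm_return}).; the marker name, the module name pm_return and the helper names are all made up for illustration. It also goes one step further than the description above: each occurrence of the marker is renamed to a fresh variable, because variables bound inside a begin ... end block stay visible afterwards, so nested uses of one shared name would be matched against each other.

```erlang
-module(pm_return).
-export([parse_transform/2]).

%% Hypothetical marker variable that ?_ is assumed to expand to.
-define(MARKER, 'Pm_ret_6f1d2c0a').

parse_transform(Forms, _Options) ->
    walk(Forms).

%% A match whose pattern mentions the marker becomes
%%   begin Pattern = Expr, Fresh end
%% where Fresh replaces the marker inside Pattern.
walk({match, Line, Pattern, Expr}) ->
    Expr1 = walk(Expr),
    case has_marker(Pattern) of
        true ->
            Fresh = fresh_var(),
            Match = {match, Line, rename(Pattern, Fresh), Expr1},
            {block, Line, [Match, {var, Line, Fresh}]};
        false ->
            {match, Line, walk(Pattern), Expr1}
    end;
%% Generic traversal: every other node is rebuilt from its transformed children.
walk(T) when is_tuple(T) -> list_to_tuple(walk(tuple_to_list(T)));
walk(L) when is_list(L) -> [walk(E) || E <- L];
walk(Other) -> Other.

has_marker({var, _, ?MARKER}) -> true;
has_marker(T) when is_tuple(T) -> has_marker(tuple_to_list(T));
has_marker(L) when is_list(L) -> lists:any(fun has_marker/1, L);
has_marker(_) -> false.

rename({var, Line, ?MARKER}, Fresh) -> {var, Line, Fresh};
rename(T, Fresh) when is_tuple(T) -> list_to_tuple(rename(tuple_to_list(T), Fresh));
rename(L, Fresh) when is_list(L) -> [rename(E, Fresh) || E <- L];
rename(Other, _Fresh) -> Other.

%% The '@' cannot appear in a variable name written in source code,
%% so the generated name cannot collide with client variables.
fresh_var() ->
    N = erlang:unique_integer([positive]),
    list_to_atom("PmRet@" ++ integer_to_list(N)).
```

The generic tuple-and-list traversal is a shortcut that treats any 4-tuple tagged match as a match expression; a more careful version would walk the abstract format form by form.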

This also makes our code terser. So yes, it brings us back to that subject. I went looking for some other examples of repetition, and found a few. One consists of the clauses in function definitions:

   mydo({foo, Foo}) -> Foo;
   mydo({bar, Bar}) -> Bar;
   mydo({baz, Baz}) -> Baz.

Couldn't this be written:

   mydo({foo, Foo}) -> Foo;
       ({bar, Bar}) -> Bar;
       ({baz, Baz}) -> Baz.

I would like to be able to do without the repeated function names; funs already do. But since that source does not parse into a valid Erlang form, there is nothing for a parse transform to work on, so I can't effect it that way.

Another common source of repetition is in self-recursion. This is especially obvious when the leaves of a case expression involve self-recursive calls:

   find_by_traversing_lookups(K, Keylist, Levels) ->
    case {value, {K, ?_}} = lists:keysearch(K, 1, Keylist) of
       1 -> Levels + 1;
       Next -> find_by_traversing_lookups(Next, Keylist, Levels + 1)
    end.

If you do that a lot (and you do), that's a lot of unnecessary text to scan. Since the only interesting part of a self-recursive call is the argument list, you can get rid of the visual clutter by replacing the function name in self-recursive calls with _:

   find_by_traversing_lookups(K, Keylist, Levels) ->
    case {value, {K, ?_}} = lists:keysearch(K, 1, Keylist) of
       1 -> Levels + 1;
       Next -> _(Next, Keylist, Levels + 1)
    end.

This actually can't conflict with other function calls because _ is never bound to a value and therefore can't be used to dispatch a call. Of course, another parse transform could claim the same notation--it's handy "syntax" because the form scans fine, unlike other characters (using <- came to mind but again, I would have had to modify the scanner).
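
A minimal sketch of how that rewrite could be done, assuming the client module enables it with -compile({parse_transform, self_call}). The module name self_call is hypothetical, and this is not the code from the experiments mentioned in this post. It relies on _(Args) parsing as a call whose function position is the variable _.

```erlang
-module(self_call).
-export([parse_transform/2]).

parse_transform(Forms, _Options) ->
    [form(F) || F <- Forms].

%% Only function definitions are rewritten; the enclosing function's
%% name is carried down into the traversal of its clauses.
form({function, Line, Name, Arity, Clauses}) ->
    {function, Line, Name, Arity, walk(Clauses, Name)};
form(Other) ->
    Other.

%% _(Args) becomes Name(Args).
walk({call, Line, {var, _, '_'}, Args}, Name) ->
    {call, Line, {atom, Line, Name}, walk(Args, Name)};
walk(T, Name) when is_tuple(T) -> list_to_tuple(walk(tuple_to_list(T), Name));
walk(L, Name) when is_list(L) -> [walk(E, Name) || E <- L];
walk(Other, _Name) -> Other.
```

This sketch does not check that the number of arguments matches the enclosing function's arity, and a _(...) inside a fun still refers to the enclosing named function.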

The code for these experiments is at Google Code. They're not much now, but I am already finding the pattern-matched returns handy.

Tuesday, March 25, 2008
