Sunday, February 27, 2011

call/cc! I yield! Bah, lazy-seq.

Modifications

  1. 03-01-2011: an additional scheme variant has been added.
  2. 03-02-2011: minor modifications

Imperative Iterators

The archetypal implementations of imperative iterators are Java's implementations of the Iterator interface. Everyone who has ever implemented an Iterator should acknowledge the pain of explicitly maintaining state between calls. That is the essential problem: the state has to be maintained explicitly. In particular, there are two different kinds of state: the iterator's "low level" state and the iteration state. In the iterator we have to record "what we have done so far", and we have to do so explicitly.

Then every time next() or hasNext() is called, we have to do some more computation, save our internal state and explicitly return control to our caller. This is the kind of Java-ish iterator one writes in Clojure to pre-order visit a tree-like structure.
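A sketch of that kind of code (the names and the [value left right] node representation here are mine, chosen for illustration):

(defn make-pre-order-iterator [tree]
  ;; the atom is the iterator's explicit "low level" state: a stack of
  ;; nodes still to be visited
  (let [stack (atom (if tree (list tree) '()))]
    {:has-next? (fn [] (boolean (seq @stack)))
     :next!     (fn []
                  ;; pop a node, push its children, return its value
                  (let [[value left right] (peek @stack)]
                    (swap! stack #(into (pop %) (remove nil? [right left])))
                    value))}))

;; draining it looks just like the Java idiom
(let [it (make-pre-order-iterator [3 [4 nil nil] [2 [1 nil nil] [0 nil [-1 nil nil]]]])]
  (loop [acc []]
    (if ((:has-next? it))
      (recur (conj acc ((:next! it))))
      acc)))
;; => [3 4 2 1 0 -1]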

I decided to use Clojure instead of Java because the example structure was easier to write, the example can be run directly in a REPL, I prefer Clojure to Java, and other irrelevant details. However, that is essentially the way iterators are written in object oriented languages (including nice ones, like Python) or in languages that have to interoperate with OO languages.

The principal issue is that the "value returning" logic has to be mingled with the iterator bookkeeping logic; I believe that this (although more versatile) is far more complicated than simply adding all the values to a (flat) structure and returning the structure. The reason why the flat structure is not the dominant approach is that it is less flexible and less efficient. We also notice that iterators can be infinite.

Ah, if you like OO mumbo-jumbo, this is of course the Iterator pattern.

Yield Statement

In this section we briefly examine Python's yield statement. The corresponding yield expression (which essentially allows limited coroutines) is not considered here. If a Python function contains the yield keyword, its semantics change.

The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function.

[...]

When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way.

[...]

Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached.

If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call.

Essentially instead of using the structure of the previous example, we could have written:

The idea is that when yield is encountered the generator "returns" the value, and when next() is subsequently invoked, it resumes execution from that spot. Essentially it is a neat syntactic facility that "saves" the continuation of the computation somewhere, so that it is ready to resume from that point.

call/cc

Ok. I said it. Continuation, that is. When the goal is to experiment with control, scheme comes to mind. This is our representation of the tree:

Now I provide an easy example of doing the preorder "functionally". The idea is to provide a function taking a callback that is called on each element of the tree when it is visited. The function also returns the return values of the callback (in reverse order); anyway, pass display as the callback and we have our familiar example:

Here we did not use continuations. In the following example, I use one continuation to save the state of the computation to resume from (after the yield) and another continuation to jump out (simulating Python's yield statement). I think this code could be written more clearly; I'm no expert with continuations (I usually find them too general), so suggestions are welcome.

When generator is called, it returns a function that, every time it is called, returns the next element. It behaves almost like an iterator and has the structure of Python generator functions. Boilerplate code could be avoided using a macro: essentially, lines 2-6 are boilerplate, as are lines 11-13 and 18. The difference with a "Java like" iterator is that there is no way to test whether there is a "next" element (no hasNext method: closures are objects with just one method! ;)) and that it returns null forever once exhausted. So the implicit protocol is "do not put null objects in the tree"; if you get null, you are done. Other strategies could be used: e.g., instead of returning the value, return a list containing the value; an empty list means the iteration is complete; a list containing an empty list would mean that the tree contained a null valued node.

We could have used continuation passing style, and that would perhaps have been more similar (even though far more complex to understand and write) to the corresponding Clojure way of doing it.

In the comments a different scheme continuation-based solution has been suggested by Marco. It is far more elegant than my original solution, but it is structurally different from the Python code of the sections above. In fact, it does not use an explicit stack to manage the tree traversal (as both the Python version and my scheme version do), but relies on plain recursive calls, essentially using the implicit call stack.

What about lazy-seq?

Essentially we said that iterators are wanted because they are more efficient and more general. Indeed, in languages such as Python, generators are an easier way to write iterators. What about Clojure? The example at the very beginning is not the way things should be done in Clojure. On the other hand, the callback-based scheme visitor would be a good API for Clojure as well. In fact, it is the very same pattern that we have with map (Haskell programmers here may want to extend the discussion to "mapping" over different algebraic types, such as Trees, and not only Lists).
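For instance, a Clojure function with that shape could look roughly like this (a sketch; the name and the [value left right] node representation are mine, for illustration only):

;; apply f to every node value, in pre-order
(defn visit-pre-order [f [value left right :as node]]
  (when node
    (f value)
    (visit-pre-order f left)
    (visit-pre-order f right)))

;; (visit-pre-order println [3 [4 nil nil] [2 [1 nil nil] [0 nil [-1 nil nil]]]])
;; prints 3, 4, 2, 1, 0 and -1; the callback plays the role of display
;; in the scheme version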

Imperative iterators are just a design pattern. Call/cc is a feature so powerful that you can use it to build almost every other control abstraction. Yield can be used to implement coroutines (which could also be implemented with call/cc). And what about lazy-seq? Lazy-seq is nothing like that. Many Lisp books show a simple example where macros are used to implement lazy evaluation (or at least lazy evaluation of lists, which is what matters most). lazy-seq basically does that. The difference is that other Lisps are usually not built around the idea of lazy sequences the way Clojure is.

Consequently, most libraries and APIs do not work (or do not play well) with lazy evaluation. They could be made to work, but it is simply outside Common Lisp pragmatics, and there is nothing wrong with that.

But back to lazy-seq... why is it here? Consider the code below.
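A sketch of code with that shape (my reconstruction, again assuming [value left right] nodes; the stack argument plays the same role as the explicit stack in the other versions):

(defn tree-seq-aux [stack]
  (lazy-seq
    (when (seq stack)
      (let [[[value left right] & more] stack
            children (remove nil? [left right])]
        ;; emit the current value now; the rest of the traversal is a
        ;; deferred call to tree-seq-aux, evaluated only on demand
        (cons value (tree-seq-aux (concat children more)))))))

(defn pre-order-seq [tree]
  (tree-seq-aux (list tree)))

;; (pre-order-seq [3 [4 nil nil] [2 [1 nil nil] [0 nil [-1 nil nil]]]])
;; => (3 4 2 1 0 -1)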

I want to point out the structure of the tree-seq-aux function. Indeed, the function is not recursive. It simply returns a lazy-seq. The second argument of the cons cell we put in the lazy-seq is another call to tree-seq-aux. However, this is not recursion: when the lazy-seq is evaluated, we have already returned from the previous call.

But things are even more interesting than this. That cons cell is somewhat similar to the iterator we saw at the very beginning. Its car (first) is the current value: it is what next() returns. And what is its cdr (rest)? Suppose that our iterator were not stateful. Suppose instead that calling next() returned both the current value and another iterator which will provide the successive values. Well... basically that is our cons cell.

Pair<Value, ConstIterator<Value>> v = it.next();

That is essentially our cons cell. But in fact we somehow have the same pattern we had in the continuation based example. We "save" what to do next (akin to the continuation we saved in order to resume) in the cdr of the cons cell; however, we do not have to evaluate it right away, because of the lazy evaluation. As I already said, if we had used continuation passing style, we would have placed the recursive call that we put in the cdr of the lazy-seq cons in the explicit continuation.
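A tiny made-up REPL example of that last point: the body of the inner lazy-seq below is not evaluated when we take the first element, only when the rest is actually asked for.

(def s (lazy-seq
         (cons 1 (lazy-seq
                   (println "computing the rest...")
                   (cons 2 nil)))))

(first s)   ;; => 1, nothing is printed: the cdr is still a suspended computation
(second s)  ;; => 2, and only now "computing the rest..." is printed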

The advantage is that it is easier to write, easier to understand, easier to use. Unfortunately, it is much more specific and less powerful. But I don't mind practical tools, when they have the elegance lazy-seq has.




9 comments:

Anonymous said...

Hi Enrico,

using call/cc you can traverse the tree without allocations.
Working example in R5RS:

(define node-value car)
(define node-left cadr)
(define node-right caddr)
(define children cdr)


(define *example* '(3 (4 () ()) (2 1 (0 () (-1 () ())))))


(define (generate tree)
  (let ((jump #f))
    (letrec ((generate-values
              (lambda ()
                (let loop ((tree tree))
                  (cond
                    ((null? tree) '())
                    ((pair? tree)
                     (loop (node-value tree))
                     (loop (node-left tree))
                     (loop (node-right tree)))
                    (else
                     (call/cc
                      (lambda (rest)
                        (set! generate-values
                              (lambda ()
                                (rest 'resume)))
                        (jump tree))))))
                (jump '()))))
      (lambda ()
        (call/cc
         (lambda (k)
           (set! jump k)
           (generate-values)))))))

--marco

PS: Is there a way to have good indentation in comments?

Unknown said...

WOW! Beautiful!
I had to follow the execution with the debugger to figure out how it did what it did.

If I understand correctly, the idea is that the "last" (source-wise) call/cc is always the one to return the value to the user. In the "main loop", when a value is encountered, we jump there to return the value. However, before that we set a "different" generate-values, so that on the next call we use (rest 'resume) instead.

(rest 'resume) essentially jumps "back" to the else clause, returning a bogus 'resume value. That returns, and then the (loop (node-left ...)) is executed.

I have to admit I could not have written that stuff. I have to practice with call/cc, I suppose. I haven't decided yet whether I dislike it because I don't know how to use it or simply because I prefer higher level constructs.

Anyway, about the comments, I'm afraid there is no way. It's blogger's fault.

I use gist:
https://gist.github.com/849109.js?file=tree-generator.scm

This is your example. I also included it (with proper credits) in the main body, because it was too nice to leave it buried here in the comments.

Anonymous said...

Thank you for including my code in your post.


I wouldn't say call/cc is a low-level construct. At least to my ears, "low level" sounds like "near to the metal", and that is not the case, so I would rather say that call/cc is an elementary construct.

This is nitpicking, I know.

While I agree that in a finished program, call/cc can be replaced by generators, exceptions and so on, it's invaluable for exploratory programming.

--marco

Unknown said...

> Thank you for including my code in your post.

Thank you for suggesting the code

> I wouldn't say call/cc is a low-level construct.
> At least at my ears, "low level" sounds like "near to the metal"
> and it's not the case, so I would rather say that call/cc
> is an elementar construct.

I understand what you mean. I would similarly say that, e.g., mu-minimization or Y are low level constructs, as functional programming languages essentially "build" over such concepts to provide higher level ones (e.g., if).

However, probably my point of view is a bit biased towards mathematical abstraction. That is what I meant by low-level. With call/cc (+ macros) you can build the other control abstractions. However, without macros it would be rather cumbersome to use "only" call/cc.

> This is nitpicking, I know.

I believe that different points of view help to clarify exposition.

> While I agree that in a finished program,
> call/cc can be replaced by generators, exceptions and so on,
> it's invaluable for exploratory programming.

Agreed. Perhaps even more to explore paradigms of computations. For example, Common Lisp does not have continuations in the sense scheme has (I know of some libraries, but AFAIR they have some limitations), but I would not say that CL is not as great for exploratory programming.

Ben W said...

You know, while I probably wouldn't have been able to write either call/cc version, I think Marco's is actually easier to read (though it may have been harder to write).

Unknown said...

@Ben I agree. Marco's version is surely more idiomatic and looks like scheme. Mine is a tentative implementation of something like Python's yield construct.

My code should be read as what the "right" macro would expand into. The idea is that in scheme we can implement, through macros, something that behaves like Python's yield statement. And /that/ should be easy to understand and use (as Python's yield statement is: it is widely used in the Python community, even by OO proponents and people with little functional programming background).

I think that yield has elegance because it brings powerful functionality with extreme ease of use. My code does not, because it is not "easy". I definitely have to write the macro to show what I meant. ;)

Anonymous said...

Maybe I misunderstand you, but I don't see why call/cc would be useful only if used inside a macro.

A variation on the previous example that lets you write a 'preorder' function that looks very similar to your python example:

https://gist.github.com/852929

Since it does not use an explicit stack, and since here 'yield' is not a statement, I would say that it's higher level than the python one.

--marco

Unknown said...

> Maybe I misunderstand you, but I
> don't see why call/cc would be
> useful only if used inside a
> macro.

I did not say that it would be useful *only* inside of a macro. However, I think that call/cc and macros are a powerful combination. But that does not imply that call/cc is useful only (or prevalently) inside a macro.

I believe that the combination is extremely powerful because macros enhance the language with new constructs (Hoyte basically states that most macros in fact create control structures; he is extreme as always, of course, but in a sense he has a point).

call/cc is a very flexible tool to build control abstractions. Consequently I believe the two play very well together.

Unknown said...

> A variation on previous example
> that let you write a 'preorder'
> function that looks very similar
> to your python example:

It does, indeed. Probably you have almost written what I had in mind (that is, a defgenerator where yield has the usual python meaning, which would be very easy to write given the tools you wrote).

> Since it does not use an explicit
> stack, and since here 'yield' is
> not a statement, I would say that
> it's higher level than the python
> one.

I think this is our old discussion on yield, which I suppose you do not like because it is less general than call/cc (and in fact plainly different, not merely less general).

I like yield because most programmers learn to use it (at least, most python programmers). That is: these programmers would not use call/cc, would not use more functional languages. Still, they can use something which is better than C++, more functional than Java and rather mind enlightening, if not as mind enlightening as scheme.

For example, I would not frame "high-level" vs. "low-level" as a matter of being a statement. Python has statements, which you may even consider a design error, by the way. Still, Python has statements. Being a statement or not is not a matter of being high or low level; it's a matter of how the language is designed. As I said before, it may be a language design error, but that's another level of discussion.

Besides, even though I used a yield statement and was interested in what is hidden under yield, yield in Python is an expression (and has been for some years now, I think).

A yield statement is just a statement containing a yield expression. And you can "jump back" to a yield expression, of course. With a value, I mean.

The rest of the discussion on being low-level or high-level is hard for me. Correct me if I'm wrong, but you associate the idea of high-level with being "better", perhaps more general and functional.

On the other hand, I associate the idea of being higher-level with the idea of being more similar to the way people think (which is probably a bad definition, as different people would disagree on what is high level depending on the way they think). My idea of high-level is strongly related to being easy to use and maintain and short to write, without being too short.




Let me try to be more precise... yield basically means (at least in its basic form): save the state, return this value, and be ready to restart from here.

Whatever is "after" the yield statement is in the future, from the point of view of the generator.

To me, this closely resembles the idea of continuation passing style (where you put the "future" in the continuation) combined with the idea of "save the future somewhere so you can grab it later".

My more pressing concern was drawing a parallel with lazy-seq. Essentially lazy-seq is not as general as yield (but it is as easy to use). In fact, you "return" a lazy-seq which contains the "saved state" (thanks to closures + being extremely stateless).

In lazy-seq you do not have the idea of "come back here". However, if "here" is just the beginning of a function with some parameters you can compute, then you can do it with lazy-seq.

So, my initial idea was to show how easily generator based code can be translated into Clojure using lazy-seq.

Then the idea of continuation passing + save the continuation made me think about call/cc. Of course the problem is that I'm far more comfortable with cps than with call/cc.

I tried to express with call/cc what I had in mind. And it was not a very good exposition of my thoughts (which are not "yield vs. call/cc", but rather "implement yield in terms of call/cc" -- and see if the code clarifies what I meant).