Friday, August 20, 2010

Internal vs External Iterators

In order to solve the problem of visiting each element of a collections, two solutions spring to mind: internal and external iterators. The main different is "who controls the iteration". In internal iterators the iteration is controlled by the iterator itself: the caller provides a callback function which is called on each element.
In this sense, the map function is essentially an internal iterator: iterates on each element of a collection and calls the function parameter on each element. Map is a primitive function in Python, scheme/lisp, Haskell and many other languages. All these languages have higher order functions and functions as first class citizens. In fact, this strategy could (and is) be employed with C/C++ as well.

On the other hand in External Iterators the caller controls the iteration. This is the case with Python for statement, for example.

In fact, object zealots usually regard the external iterators as more flexible, as they make it possible to compare two different structures or other similar stuff:

import itertools as it

for i, j in it.izip(left_collection, right_collection):
    if i != j:
        return False
else:
    return True

In fact, this could be rather easily done with higher order functions, as long as you have the right set. What is more difficult is to deal with control; you probably need call/cc and similar primitives for many things which are rather straightforward with external iterators.

They are the more widespread method for iteration in Java (Iterator class) and Python. On the other hand in Ruby and most functional languages it is very much more common to use internal iterators. Though nothing prevents from using internal iterators in Python, it is much more common to use external iterators.

In C++, using Boost, internal iterators become somewhat more common. This comes from a project of mine:
std::transform(items.begin(), items.end(), sizes.begin(),
                   bind(&relation::size,
                        bind(&iset::value_type::second, ::_1)));
    std::for_each(sizes.begin(), sizes.end(),
                  var(counter) += boost::lambda::_1);

Let alone the items.begin(), which [b]is[/b] an external iterator, although used in an internal iterator manner.

No comments: