Monday, June 7, 2010

A matter of homoiconicity

Somewhere around the late sixties the term “homoiconicity” has been introduced in the computer science community. In fact, the term was older and dates to the 1960, at least according to Wikipedia. Strangely enough, I did not find traces of the term homoiconic or homoiconicity in McIllroy [1]. I was not unable to recover the full text of the paper on TRAC [2]. However, widespread use of the term is due to McKay (along for the very idea of object oriented programming and a good deal of criticism on Lisp – perhaps I will look for some opinions of famous lispers on Smalltalk to make things even) in his Ph.D. dissertation.
The basic idea is simple: program is data. That is to say, the program is represented in the very same format as data. Thus, a program can manipulate programs (and even itself) in the same ways it uses for data. This is great and is the foundation for great and funny meta-programming.
Indeed, references on TRAC should make clear that a structured representation of programs is not needed. TRAC manipulated strings and programs where represented as strings. On the other hand, Lisp manipulates lists and a program is a list. It is quite rare in a discussion on programming languages with a lisper that he does not use the term “homoiconic” somewhere in the discussion, usually before becoming condescending (a proof of this is being developed actively – no trolls have been harmed during the experimentations).
What I find extremely interesting is that this great property was simply obvious in theoretical models. Universal Turing Machines and primitive functions need an enumeration of programs (functions, respectively) in order to manipulate such objects. If such enumeration was not possible, most computer science would not have been created.

The von Neumann architecture also relies heavily on instructions being represented as numbers and stored along with data. It’s a bit of a pity that this properties have not been naturally ported to programming languages (except from Lisp and some others).
Indeed, we all agree that homoiconicity is a great property in programming languages. In fact, I often find frustrating complete lack of it (e.g., Java is completely not homoiconic, since its main data structure is the Object but Java programs are not even near to being objects – ok, you can always call the compiler get an AST… but this is not what the whole thing is about). Other languages have limited support for it.
In Python functions and classes (and objects) are available when compiling a module. The whole decorator concept, for example, relies on the fact that functions are objects and can be passed as such at compile time. Very powerful meta-programming techniques in Python rely on “program” manipulation, where programs are represented as classes and functions (and not syntactically, like in Lisp). This makes some kind of manipulation very hard (without resorting to byte-code manipulation – and then the same objection with the java compiler/AST thing applies – or to basic string manipulation, which I would not consider as a symptom of homoiconicity, since strings are not “the main data” in Python). However, I would not rule Python out of the bunch of homoiconic languages, either. Perhaps “partially homoiconic” applies? Or perhaps do we need better built in tools to manipulate classes and functions in Python to qualify for the full homoiconic(tm) certification from old lisp hackers[3]?

References

  1. McIlroy, M. D. 1960. Macro instruction extensions of compiler languages. Commun. ACM 3, 4 (Apr. 1960), 214-220. DOI
  2. MOOERS, C.N. TRAC, a text handling language. Proc. ACM 20th Nat. Conf. Cleveland, Aug. 1965, pp. 229-246.
  3. Not meant to be offensive… perhaps old could be replaced with ancient?

No comments: