Tuesday, December 19, 2006

A Prolog XML printer in less than 100 lines.

Suppose we have an XML parser that converts an XML file into a Prolog term.
We may (quite) easily build one with DCG. I will be doing one soon, for values of 'soon' that depend on the status of my thesis.
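As a small taste of the DCG approach, here is a minimal sketch (not the parser promised above) that parses a single attribute such as src="ciao.jpg" into the key:value form described below. The names attribute//1, name_codes//1 and value_codes//1 are hypothetical, and the sketch assumes SWI-Prolog's code_type/2 and a list of character codes as input; entities, single quotes and whitespace are ignored.

```prolog
% attribute(-Attr)//: parse  key="value"  into the term  key:value.
% Hypothetical helper names; works on lists of character codes.
attribute(Key:Value) -->
    name_codes(KeyCodes),
    [61, 34],                        % the two characters = and "
    value_codes(ValueCodes),
    [34],                            % the closing "
    { atom_codes(Key, KeyCodes),
      atom_codes(Value, ValueCodes) }.

% One or more alphanumeric codes (a simplification of real XML names).
name_codes([C|Cs]) -->
    [C], { code_type(C, alpha) },
    name_codes_rest(Cs).

name_codes_rest([C|Cs]) -->
    [C], { code_type(C, alpha) }, !,
    name_codes_rest(Cs).
name_codes_rest([]) --> [].

% Everything up to, but not including, the next double quote.
value_codes([C|Cs]) -->
    [C], { C =\= 34 }, !,
    value_codes(Cs).
value_codes([]) --> [].
```

It could be tried with atom_codes('src="ciao.jpg"', Codes), phrase(attribute(A), Codes).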
We describe the way the prolog term is built:
Text is simply a Prolog atom. In a less naive implementation we may want to use the string datatype; however, and this shows the power of a declarative approach, things would not change a lot. Moreover, if we are using SWI-Prolog (a widespread open source implementation), it is able to use UCS (a superset of 16-bit Unicode) internally. We only have to make the parser smart enough to convert entities into the right Unicode characters and the whole thing will behave correctly.
Attributes are represented as key:value terms, where : is a binary infix functor. Again, this is quite a natural representation.
A tag is represented as a tag(Attributes, Children) term, where tag is the name of the tag, and Attributes and Children are (possibly empty) lists of attributes and 'children'. This representation is not very clever: better optimized programs can represent tags as element(tag, Attributes, Children). That is the way things are done in the official SWI-Prolog SGML/XML parser library, but this is a simple example.
A 'child' is either a tag or a text section.
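To make the difference between the two representations concrete, here is a minimal sketch that converts a term in the naive form into the element(Name, Attributes, Content) form, with attributes written as Name=Value. The predicate names to_element/2 and to_attribute/2 are hypothetical, and text is assumed to be atomic (an atom or a number).

```prolog
:- use_module(library(apply)).   % maplist/3

% to_element(+Term, -Element): hypothetical converter from the naive
% tag(Attributes, Children) form to element(Tag, Attributes, Content).
to_element(Text, Text) :-
    atomic(Text),                 % text sections are left untouched
    !.
to_element(Term, element(Name, Attributes, Content)) :-
    Term =.. [Name, Attributes0, Children],
    maplist(to_attribute, Attributes0, Attributes),
    maplist(to_element, Children, Content).

% key:value becomes key=value.
to_attribute(Key:Value, Key=Value).
```

For instance, p([], [ciao, img([src:'a.jpg'], [])]) would become element(p, [], [ciao, element(img, [src='a.jpg'], [])]). Having a fixed functor makes it possible to tell tags from text by pattern matching alone, without taking the term apart with =.. first.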
This (informal) description translates in a straightforward manner into a Prolog program. The last thing before presenting the source is an example term representing a very simple XHTML file.

html([], [head([], ['']), body([], [p([], ['ciao ciao', img([src:'ciao.jpg'], [])]) ])])

And now the (81 lines long) prolog source code. You can call it with pp_tag(Term), where Term is the term representing the XML file.

pp_tag(Tag) :-
        !,
        pp_tag(Tag, 0).
pp_tag(Tag, N) :-
        Tag =.. [Tag_Name, Attributes, []],
        !,
        s(N),
        write('<'),
        write(Tag_Name),
        pp_attributes(Attributes),
        write(' />').
pp_tag(Tag, N) :-
        Tag =.. [Tag_Name, Attributes, Children],
        !,
        s(N),
        write('<'),
        write(Tag_Name),
        pp_attributes(Attributes),
        write('>'),
        N1 is N+1,
        pp_children(Children, N1),
        nl,
        tab(N),
        write('</'),
        write(Tag_Name),
        write('>').
pp_text(Text, N) :-
        name(Text, Chars),
        pp_lines(Chars, N).
pp_line(Line, N) :-
        name(Atom_Line, Line),
        s(N),
        write(Atom_Line).
pp_lines(Lines, N) :-
        ((append(Line, [10|Rest], Lines), !),
            pp_line(Line, N),
            pp_lines(Rest, N)
        ;
            Line = Lines,
            pp_line(Line, N)
        ).
pp_children([], _N) :-
        !.
pp_children([X|Xs], N) :-
        !,
        pp_child(X, N),
        pp_children(Xs, N).
pp_child(Child, N) :-
        pp_tag(Child, N)
        ;
        pp_text(Child, N).
pp_attribute(Name:Value) :-
        !,
        write(Name),
        write('='),
        pp_quoted(Value).
pp_attributes([]) :-
        !.
pp_attributes([X|Xs]) :-
        write(' '),
        pp_attribute(X),
        pp_attributes(Xs).
pp_quoted(Term) :-
        !,
        write('"'),
        write(Term),
        write('"').
s(0) :-
        !.
s(N) :-
        nl,
        tab(N).

Of course this is a 'toy' implementation. You can find the real thing here.
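One reason it is a toy: text and attribute values are written verbatim, so a value containing & or < would produce malformed XML. A minimal sketch of an escaping step follows; escape_xml/2, escape_codes/2 and entity/2 are hypothetical names that are not part of the program above, and only the four most common characters are handled.

```prolog
:- use_module(library(lists)).   % append/3

% escape_xml(+Atom, -Escaped): hypothetical helper that replaces the
% characters & < > " with the corresponding predefined XML entities.
escape_xml(Atom, Escaped) :-
    atom_codes(Atom, Codes),
    escape_codes(Codes, EscapedCodes),
    atom_codes(Escaped, EscapedCodes).

escape_codes([], []).
escape_codes([C|Cs], Out) :-
    entity(C, Entity),
    !,
    atom_codes(Entity, EntityCodes),
    append(EntityCodes, Rest, Out),   % Out = entity codes followed by Rest
    escape_codes(Cs, Rest).
escape_codes([C|Cs], [C|Rest]) :-     % any other code is copied as is
    escape_codes(Cs, Rest).

entity(38, '&amp;').    % &
entity(60, '&lt;').     % <
entity(62, '&gt;').     % >
entity(34, '&quot;').   % "
```

pp_quoted/1 and pp_line/2 could then call escape_xml/2 on their argument before writing it.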
