I’ve been playing with parsers a lot recently in no small part because of requirements at my job. Anyhow since I’m already interested in declarative programming languages, it wasn’t much of a jump to get myself completely immersed in the parsing literature.

After trying to use a prolog DCG (Declarative Clause Grammar) for one of the projects at work I became really annoyed with some of the qualities of the implementation.

First off, I see no reason why the thing has to backtrack over completely flat predicates that are *OBVIOUSLY* deterministic. It is so easy to discover that this can be a case statement rather than a disjunction with choice points that I see no reason that the compiler can’t take care of this for me. I will plug my ears if anyone says that detecting modes is undecidable since I don’t care (I care much less about decidabilty then most comp-sci people I guess, I mean who wants to write in a decidable fragment anyhow?). We need compilers to take a more agoric approach to compilation. If something is undecidable we should just cut out a decidable chunk. The advantage to the agoric method of reducing the space is that we don’t have to come up with some arbitrary chunk that will look silly for future compiler methods, we can just implement some strategies and give up if they take too long. I’m planning on implementing this for Twine in the very near future, like maybe this evening.

The other thing that I’ve been looking at is the use of ropes instead of difference lists for DCGs. I think these have tremendous opportunities. One advantage is that we can exploit much more structure sharing than is possible with lists (which can only share tails). Another advantage is that when we have neighboring characters that share a single array storage they have an opportunity to be coalleced resulting in even greater reductions in space use. Ropes also concatenate very very rapidly which I think will be really useful for speeding up unification.

I’m hoping that with a bit of mode analysis, a bit of more sophisticated interpretation/compilation strategy (left-corner parsing perhaps?) and a bit of tabling we can come up with a DCG language that doesn’t suck. Certainly it shouldn’t suck as much as the current one I’m using if it is to be judged a success.