I’ve been playing with parsers a lot recently in no small part because of requirements at my job. Anyhow since I’m already interested in declarative programming languages, it wasn’t much of a jump to get myself completely immersed in the parsing literature.
After trying to use a prolog DCG (Declarative Clause Grammar) for one of the projects at work I became really annoyed with some of the qualities of the implementation.
First off, I see no reason why the thing has to backtrack over completely flat predicates that are *OBVIOUSLY* deterministic. It is so easy to discover that this can be a case statement rather than a disjunction with choice points that I see no reason that the compiler can’t take care of this for me. I will plug my ears if anyone says that detecting modes is undecidable since I don’t care (I care much less about decidabilty then most comp-sci people I guess, I mean who wants to write in a decidable fragment anyhow?). We need compilers to take a more agoric approach to compilation. If something is undecidable we should just cut out a decidable chunk. The advantage to the agoric method of reducing the space is that we don’t have to come up with some arbitrary chunk that will look silly for future compiler methods, we can just implement some strategies and give up if they take too long. I’m planning on implementing this for Twine in the very near future, like maybe this evening.
The other thing that I’ve been looking at is the use of ropes instead of difference lists for DCGs. I think these have tremendous opportunities. One advantage is that we can exploit much more structure sharing than is possible with lists (which can only share tails). Another advantage is that when we have neighboring characters that share a single array storage they have an opportunity to be coalleced resulting in even greater reductions in space use. Ropes also concatenate very very rapidly which I think will be really useful for speeding up unification.
I’m hoping that with a bit of mode analysis, a bit of more sophisticated interpretation/compilation strategy (left-corner parsing perhaps?) and a bit of tabling we can come up with a DCG language that doesn’t suck. Certainly it shouldn’t suck as much as the current one I’m using if it is to be judged a success.
November 14th, 2005 at 5:34 pm
(the following is basically illiterate rambling, prompted by the fact that I happen to think a lot about parsing lately)
have anyone given any thought into the relationship of various parsing-specific optimizations out there and general unification optimization?
for instance, this “rope” thing you mention sounds vaguely similar to the “tree-structured stack” abstract data-structure used by Tomita’s parsing algorithm (of course, when I read the actual linked article, I’ll descover that this is not at all the case, but anyway).
btw, is Twine available in any form?
November 17th, 2005 at 10:26 am
I had never heard of a tree-structured stack before so I can’t comment on its relationship to unification parallelisation.
Prolog like languages can utilize something called “paged binding arrays”. I’ve also developed a form of unification that uses optimistic concurrency. I have no idea how it performs as I haven’t had time for benchmarks, but I imagine it isn’t great.
As regards general methods, I’m not aware of them. I think in general you need to look closely at the graph structure of the relationships. I’ll post more on this when I get access (hopefully tomorrow?).
I haven’t released Twine yet, but I’ve already changed the name to Tmesis. At least for now. That is unless people object. It works fairly well but it is very prototypic. It doesn’t perform the sorts of determinism analysis that I would like it to. Perhapse I’ll get to it this weekend.