Computer Science& Logic& Maths03 Feb 2007 12:55 pm

I just got “ML for the working programmer” in the mail a few days ago,
and worked through it at a breakneck pace since receiving it. It
turns out that a lot of the stuff from the “Total Functional
Programming” book is also in this one. Paulson goes through the use
of structural recursion and extends the results by showing techniques
for proving a large class of programs to be terminating. Work with
co-algebras and bi-simulation didn’t quite make it in, except for a
brief mention about co-variant types leading to the possibility of a
type: D := D → D which is the type of programs in the untyped lambda
calculus, and hence liable to lead one into trouble.

I have to say that having finished the book, this is the single most
interesting programming book that I’ve read since “Paradigms of
Artificial Intelligence Programming” and “Structure and Interpretation
of Computer Programs”. In fact, I would rate this one above the other
two, though I think it might be because of my current interests.

The conjunction of “Total Functional Programming” and “ML for the
Working Programmer” have completely changed my world view of automated
deduction. Interestingly, my advisers work looks even more appealing
from this perspective.

The basic crux of the issue is this. Given that we restrict ourselves
to total functional programming, as described in the aforementioned
article, we will have the ability to prove theorems directly by
manipulation of the program source.

My adviser told me about this the first time that I met him. It at
first seemed like a curiosity. It has however, blossomed into a
completely different way of thinking about how proofs should be
performed.

I’ve spent the last few days converting proofs into program source,
which I then manipulate equationally. I have a few examples of
inductive proofs that can be written quite naturally as structurally
recursive functions.

datatype nat = Z | S of nat

fun lt n m =
    case m of
	Z => false
      | (S m') =>
	case n of
	    Z => true
	  | (S n') => lt n' m'

fun divide n d =
    case n of
	Z => (Z,Z)
      | (S n') =>
	let val (q',r') = divide n' d in
	    case lt (S r') d of
		true => (q',S r')
	      | false => (S q',Z)
	end

Both of these functions are direct “extractions” of a program from a
proofs that I wrote. The second one is a proof with two existential
variables. The two existential variables present themselves as return
values. In the case of “lt” it simply computes a truth value.

The datatype nat = Z | S of nat defines natural numbers with the
correspondence Z=0, S(Z)=1, S(S(Z))=2 etc…

The proof is a proposition that goes like this:

∀n,d:nat. d>0 ∃q,r:nat. n=dq+r ∧ 0≤r<d

You might notice I dropped the restriction on d in the actual source
code. That can be encoded with an auxiliary function in any case.

The proof proceeds inductively, basically you can turn the divide
function inside out to recover the inductive proof.

The base case is n=0 which gives 0=dq+r which gives r=0 and q=0 (since
d is greater than 0). The first case is exemplified in the case
structure above. (Z,Z) = (q,r) the two existential variables are
returned with a valid result for n=0.

The inductive case is this.

n: ∃q,r. n=dq+r ∧ 0≤r<d
(n+1): ∃q’,r’. n+1=dq’+r’ ∧ 0≤r’<d
if(r+1)<d
then n+1=dq+r+1

We see that this gives us that r’=r+1, q’=q giving us a solution to r’ and q’ from r and q (and hence solving the n case, gives us some information about n+1)

else n+1 = d(q+1)+r’

which we can rewrite as: n+1 = d(q+1)+r’ = dq+d+r’=dq+r+1 so d+r’=r+1,
but r+1 must be at least d otherwise we wouldn’t have failed the test
r+1<d, and r’ can be no less than zero, so we must choose r’=0
this results in (finally!)

n+1 = d(q+1) with r’=0,q’=q+1

We have a proof! For the extracted program you can look at the source
and see that in fact it mirrors the proof directly, short of a few
simplifications done by hand. Making use of the r,q in the proof of
r’ and q’ is the same as my use of r and q in my proof of the n+1 case.

I’m looking forward to a deeper understanding of the correspondences
between proofs and programs with my new found capacity to produce actual functional program proofs by extraction manually. I’m also
curious to try my hand at generating termination proofs directly, and
also at making a syntax that more directly supports total functional
programming.

Computer Science& Logic& Maths& Uncategorised26 Jan 2007 03:29 pm

Recently on Lambda the Ultimate there was a post called “Total Functional Programming”. The title didn’t catch my eye particularly, but I tend to read the abstract of any paper that is posted there that doesn’t sound terribly boring. I’ve found that this is a fairly good strategy since I tend to get very few false positives this way and I’m too busy for false negatives.

The paper touches directly and indirectly on subjects I’ve posted about here before. The idea is basically to eschew the current dogma that programming languages should be Turing-complete, and run with the alternative to the end of supplying
“Total Functional Programming”.

At first glance this might seem to be a paper aimed at “hair-shirt” wearing functional programming weenies. My reading was somewhat different.

Most hobbiest mathematicians have some passing familiarity with “Turing’s Halting Problem” and the related “Goedel’s Incompleteness Theorem”. What this paper is trying to do is say “So what!”. The basic idea is that restriction to a syntacticly restricted language can restrict the semantics in such a way that these problems do not apply. The resulting (*huge*) class of programs can then solve almost everything that we think is important.

The “Bazooka” of the paper is a line describing how every program for which we have a complexity upper bound, is in fact inside the restricted language. This realisation is profound. It means that all of the algorithms that anyone has bothered to find upper bounds for (a huge class of programs mind you!) is accessible to this technique.

The paper even deals with infinite data (co-data) in the framework and mechanisms to properly account for streams and other (possibly) infinite structures.

The idea of working in syntactically restricted languages that are sub-Turing (for any number of reasons) is not new. In fact I think I’ve mentioned DataLog (wikipedia entry) previously. In the case of DataLog however, there are serious deficiencies in the expressive power.

Definitely the best paper I’ve read this year. I’m not sure how accessible it is to lay-people, but it should be accessible to anyone with a CS B.S. or people who have some familiarity with functional programming.

Computer Science& Logic& Maths13 Jan 2007 01:00 am

I just spent a couple of very cold weeks in Anchorage Alaska. I went out with a bunch of friends to a bar on the night prior to leaving, where I met up with an old friend (and listened to two other friends play music). I started describing the lambda calculus to him since it is related to my work and he was curious what I was up to. In any case I found myself unable to produce addition of the church numerals off the top of my head, which I found pretty disturbing since it shows that I probably didn’t quite understand it in the first place. Therefor I sat down this morning to see if I couldn’t reconstruct it from first principles.

It turns out that it is relatively easy. First, let us start with a bit of notation, and a description of the lambda calculus. Since wikipedia describes the lambda calculus in detail, I’ll just show how one procedes with calculations. As examples let us start with the church numerals.

1 = (λf x.f x)
2 = (λf x.f f x)
3 = (λf x.f f f x)

n = (λf x.fn x)

where fn means “f f … f” n times.

The idea is that each numeral is represented by a lambda term. The lambda term describes the arguments (f and x in this case) and the “body” in which we substitute the arguments when the are passed in. An example of an “application” of a lambda term to an argument would be:

1 g = (λf x . f x) g => (λx . g x)

We pass in g, and replace every occurance of f in the body with g, and delete f from the list of variables. we can continue to apply this term to another term

(λx . g x) y => g y

Which leads us to conclude that

1 g y => g y

Ok, so now that you’ve seen a few calculations, let us try to construct addition. The first thing I tried to do is to construct the function +1. Clearly, we want +1 n => (n+1). n+1 is (λf x. f(n+1) x) which is also (λf x. f fn x). Since n = (λf x. fn x) we need to somehow add another f. The trick is to get the arguments to use the same symbols, and to remove the lambda abstraction. We can do this by applying the church numeral to the symbols we want to use, in this case f and x.

(λf x. fn x) f x => fn x

So now we know how to get rid of the lambda abstraction. Now we can add an f on the front, which will get us closer to n+1.

f (λf x. fn x) f x => f fn x

We are almost there. Now we need to have the λf x part at the beggining so that we look like a church numeral.

(λf x. f (λf x. fn x) f x) => (λf x. f fn x) = (λf x. f(n+1) x)

Looking great so far! Now we just need to be able to take the whole (λf x. fn x) form as an argument. This turns out to be really easy, we just put a variable in it’s place and add it to the front of the list of lambda arguments.

(λa f x. f a f x) (λf x . fn x) => (λf x. f (λf x. fn x) f x) => (λf x. f fn x)

This shows that (λa f x. f a f x) is the +1 function!

+1 = (λa f x. f a f x)

Ok, so now that we have +1 can we get a function that takes an n and returns +n? This would get us a long way towards addition. We can call this function n->+n.

Ok, so we can guess what +n will look like easily. It’s going to look just like +1 except with n leading f’s.

+n = (λa f x. fn a f x)

So we should try to figure out how to take the fn out of a church numeral, and place it there. Well, we should apply the same trick of applying the church numeral to f and x so that we can extract the body.

(λf x . fn x) f x => fn x

ok, but really we want a to follow, based on +n, so instead of using x, let us apply the form to a.

(λf x . fn x) f a => fn a

Great! Look how close we are. now we just need to abstract the a,f,x and place f x following it.
(λa f x. (λf x . fn x) f a f x) => (λa f x. fn a f x)

Ok, so now that we know how to take an n, and able to take the entire church numeral n as an a beta reduce it to n+1, let us abstract the n as an argument

(λb a f x. b f a f x)

Let us verify quickly that this function works.

(λb a f x. b f a f x) n =>
(λa f x. n f a f x) =>
(λa f x. (λf x. fn x) f a f x) =>
(λa f x. fn a f x)

Now, we can verify that this works on m, to obtain n+m as the +n function should:

(λa f x. fn a f x) m =>
(λf x. fn (λf x. fm x) f x) =>
(λf x. fn fm x) = (λf x. f(n+m) x) = n+m

Hooray! Not only did we find n->+n, but we have obtained the function “addition” for free. Since, once we have the +n function we can apply it to m. So we now have the function:

add = (λb a f x. b f a f x)

I haven’t tried multiplication and exponentiation yet, but you are welcome to try!

Computer Science& Logic& Maths26 Nov 2006 06:04 am

Probably one of the best papers I’ve read on the relationship between CBN, CBV and the Curry-Howard correspondance is the paper Call-by-value is dual to call-by-name by Wadler. The calculus he develops for describing the relationship shows an obvious schematic duality that is very visually appealing.

After reading the paper that I mentioned earlier on Socially Responsive, Environmentally Friendly Logic (which shall henceforth be called SREF Logic), it struck me that it would be interesting to see what a CPS (Continuation-passing Style) like construction looks like in SREF logic, so I went back to the Wadler paper to see if I could figure out how to mimic the technique for multi-player logic. It looks like the formulation by Wadler comes out directly by thinking about logic as a two player game! I’m excited to see what happens with n-player logic.

This has been a big diversion from what I’m actually suppose to be working on but I didn’t want to forget about it :).

Computer Science& Logic& Maths25 Nov 2006 06:02 am

Once again the internet comes to the rescue with a Systematic Search for Lambda Expressions. This is the answer to yesterdays question of whether we can iterate over isomorphic proofs exhaustively in order to extract all programs of a specification which in this case is realised as a “type”. Hooray for computer science!

Computer Science& Logic24 Nov 2006 03:23 pm

I have a bunch of ideas that I don’t have time to follow up on but I’d like to retain in some fashion so here goes.

1.

A long time ago I had a question about the shortest assembly program (parameterised by the instruction set of course!) that could encode the generators of the quaternions under multiplication. It is a very easy thing to program this, but as I was doing it, I found myself being highly doubtful that I was choosing the “right” encoding. In fact I almost certainly wasn’t. Would there not be some very tight encoding in terms of “normal” assembly language operators? This has been a persistent question in my mind that comes up frequently in different and often more general forms. If you want to make a fast compiler, how do you know what instructions to output? If you have a highlevel specification, how can you avoid the ridiculously high overhead usually associated with extremely high level languages?

Today I found this fabulous paper (from 1987!) that deals with some of the basic issues involved in finding the shortest program for a given problem. It’s called “Superoptimizer — A look at the smallest program”. I’d link to it, but I got it off ACM so if you are one of the few people that has a subscription you’ll know where to find it and otherwise I don’t want to give the money grubbers the link-love.

Some other thoughts that are in this region of my thought-space. Can you take a given specification with a proof and systematically transform it to enumerate the space of satisfying programs? If not, why not? Even if you can’t, might it not be interesting to just perform large searches of this space? Are there methods of using the transformation operators such that programs are only decreasing or minimally increasing? If so then perhaps we can call the search good at some size threshold since very large programs are unlikely to be good because of locality issues. Also Automatic Design of Algorithms Through Evolution is in a very similar vein.

2.

Concurrency is a nasty problem. It doesn’t have a nice formalism that everyone in the CS world can agree on. There must be like 500 different formalisms. All of them better (easier, not neccessarily faster) than the ones that we actually use (locking, threads, condition variables) but none of them stand out as “the right thing”.

I recently found a paper called:
Socially Responsive and Environmentally Friendly Logic
. I love the title :) But aside from that, the formalism is very nice. It is something that I’ve contemplated a bit but never had the drive to actually try to work out formally. The basic idea comes from the knowledge that Classical Logic and Intuisionistic Logic can be viewed as 2 player games. This game is pretty simple. If I have a proof phi then to win I have to prove it. If I have a proof ¬φ, then to win my oponent has to fail to prove it. If I have φ ∧ ψ then my partern gets to pick a formula and I have to prove it. If I have ∀x then my oponent gets to pick any stand-in for x that he would like. You can probably guess the rest (or look it up). This alternate logic breaks the essential two person nature of the logic. One interesting practical feature of negation in the traditional logics, is that they give rise to Continuations in the Curry-Howard Correspondance. So what does this give rise to in the N-player games defined by Abramsky? I’m not sure, but I suspect it might give process migration! Something worth looking in to.

Computer Science& Logic19 Nov 2006 05:19 am

So after a bit of research it turns out that the big reason that we don’t have a meta-compiler compiler generator project because sophisticated partial evaluators are pretty much never self applicable. As far as I can tell from the literature the Futamura projections in the context of logic programming have only ever been applied in practice on “off-line” partial evaluators. Off-line partial evaluators tend to be much less sophisticated since they do most of their reasoning “off-line”, that is, without attempting to unfold.

Apparently part of the reason for this is the use of extra-logical features in the sophisticated partial evaluators, in order to make them fast enough that they can reasonably be applied. It is hard to make performant prolog without using cut. Once cut is used however, all bets are off, since it is nearly impossible to reason about effectively.

I started writing my meta-compiler without using cut, but restricting myself to soft-cut, because of the purely declarative reading that can be given to soft-cut. If I’m careful perhaps this will allow me to make my meta-compiler self-applicable. Since I don’t really understand all of the details of why on-line partial evaluators are not generally self applicable, we’ll have to see.

Computer Science& Logic12 Nov 2006 05:51 am

Legend has it that a million years ago Plotkin was talking to his professor Popplestone, who said that unification (finding the most general unifier or the MGU) might have an interesting dual, and that Plotkin should find it. It turns out that the dual *is* interesting and it is known as the Least General Generalisation (LGG). Plotkin apparently described both the LGG for terms, and for clauses. I say apparently because I can’t find his paper on-line. The LGG for clauses is more complicated so we’ll get back to it after we look at the LGG of terms.

We can see how the MGU is related to the LGG by looking at a couple of examples and the above image. We use the prolog convention that function symbols start with lower case, and variables start with uppercase. The image above is organised as a DAG (Directed Acyclic Graph). DAGs are a very important structure in mathematics since DAGs are lattices. Essentially what we have done is drawn an (incomplete) Hasse diagram for the lattice defined.

Each of the arrows in our diagram represents a possible substitution. For instance ‘p(X)’ can have X substituted for either ’s’ or ‘g(r,Y)’. Anything that can be reached from a series of downward traversals is “unifiable” and the substitution of the unification is the aggregate composition of substitutions we encountered on the path. So for instance ‘p(X)’ will unify with ‘p(g(r,s))’ under the substitution X=g(r,Y) and Y=s. Any two terms which are not connected by a path are not unifiable.

The least general generalisation is also apparent in our picture. Any two terms will have a common parent in the Hasse diagram. The least, (in the sense of distance from the top) parent of any two terms in our diagram is the LGG. So actually the connection between the two is fairly straightforward (using 20/20 hindsight).

Computer Science& Logic11 Nov 2006 05:50 pm

Code transformation or meta-compilation as it is sometimes called (which is the general notion of techniques including Partial Evaluation, Supercompilation, Deforestation, or my advisors Distillation), is a powerful technique in computer programming. The benefits (and drawbacks) are almost certainly not sufficiently studied.

I was just conversing with my room-mate Tom about meta-compilation and I made the supposition that Meta-compilers are somewhat like the technology of the lathe. There are a huge number of technologies that require a lathe in order to be produced effeciently. A lathe can be viewed as a major nexis in the dependency graph of machining technology. A lathe is an almost a completely fixed precondition for the mill. The Mill is the crux of modern machining. It allows you to construct almost any currently available machined part. Without the mill we really wouldn’t have the industrial age at all. Do such things exist in computer programming?

Metacompiler technology is incredibly powerful. It is a technique that usually is concidered to be a superset of a partial-evaluator. It is a compiler technique that starts in the source language and ends in the source language rather than some target language as does a standard compiler. While this might at first sound trivial or irrelevant a few examples can convince one that it actually a very useful tool. (2*2) can be coded in most languages, but really it is just the literal 4. Partial-evaluation will reduce this computation at compile-time elminating the cost from the final executable. The power doesn’t stop there though. One particularly convincing example that I found was the partial evaluation of fairly simple grammar recogniser (parser) which reduced a problem directly from an NDFA to a DFA. Which is basically the compilation process used for regexps.

The Futamura projections give us some idea of just how powerful the technique is. If we have a metacompiler, we can metacompile an intepreter with respect to a program writen in the source language of the interpreter to arive at an executable in the language of the metacompiler. In fact, if we metacompile the metacompiler with respect to the interpreter and we can generate a compiler!

So I have a *lot* of questions about metacompilation. It sounds almost too good to be true (but there are good reasons to believe that it isn’t). Some of them are very technical which I will probably save for tomorrow’s post. The following question though is more philosophical, and practical (can those two happen at the same time?)

Why aren’t supercompilers/partial evaluators used as general compilation systems? If you can write a supercompiler in some high level, nice language like OCaml and then all you have to do is write an interpreter for your language of choice in order to produce a compiler, then why isn’t this done?

This seems like the holy grail of leveraging, or code re-use. You could write one really good compiler for a good language for specifying languages (Which ML was originally designed for, and of which OCaml is a decendant). One really good metacompiler. At this point every other language (front end, in the terminology of GCC) is simply the act of writing an interpreter. Writing an interpreter is *radically* simpler than making a sophisticated compiler. It is basically equivelent to a specification for the language. The process of language design can hardly be facilitated more than this since interpreters are pretty much the minimal requirement for specifing the operational semantics of a language!

My question is why isn’t this general procedure really carried out in practice? Are metacompilers not good enough in practice to produce high quality performant programs? Has it just not been tried? If not, I’d like to see some effort expended on this, since it seems like a crutial technology that could really be leveraged far more than any of the “shared VM” projects like C# with minimal cost to language implementors.

Gaeilge& Languages& Politics& Public Policy30 Jul 2006 06:36 am

It has often been said that Irish is a language with little use, and no place in the new Ireland and should no longer be a cumpulsory subject taught in schools. This notion needs to be unclothed as a complicated psychological condition and not a well reasoned argument.

School is cumpulsory. Numerous subjects in school are cumpulsory. English is cumpulsory. Maths are cumpulsory. While it is clearly desirable to have students who are enthusiastically engaged with subjects such that they study them even in their spare time, it is unreasonble to assume that that state should support a public school curriculum in which no subjects are cumpulsory and a basic level of knowledge is unecessary in any subject. I’m sure that most reasonable people will agree that some cumpulsory subjects are necessary and
desirable for the education of the public.

This naturally leads as to ask which cumpulsory subjects these should
be? Bilingualism has enormous cognitive benefits as has been demonstrated by numerous studies. It is hard to overstate the pedagogic value of bilingualism. The positive effects bleed into other subjects including performance in mathematics, critical thinking and of course learning subsequent languages.

The indirect socialogical benefits are also numerous. Nothing can unmask subjectivity and perspective like the study of another language. Additional languages provide access to conversations that could never have been had otherwise. If the language has a literary tradition (as does Irish) they also give access to a body of literature that might otherwise be inaccessable.

While the strong Sapir-Whorf hypothesis is not widely believed, it would be hard to suggest that some form of the weak Sapir-Whorf hypothesis is untrue even for universalists (I’ll have to make this a whole other post sometime).

Bilingualism should be considered a minimal basic requirement of education. What should this baseline be? I contend that Irish is the natural choice.

Irish is already fairly well known by most people in Ireland. While competency in speaking the language is low, people I have met tend to know quite a lot more than a ‘copla focal’. It would require very little effort for many Irish to gain speaking competency.

It is often claimed, half in jest, that either Polish or Mandarin should be the cumpulsory language since more people in Ireland use them daily then use the Irish language. For true bilingualism to occur it is neccessary that at least one of the additional languages taught, be taught universally. It is imperative to expect that one can use the langauge, at least at a basic level, and be understood. It is unreasonable to think that the language that will have these qualities is either Mandarin or Polish. It can’t be possible to make either of
them the single baseline cumpulsory language in school.

While the teaching (and speaking) of these two languages should be encouraged they can not be a replacement for Irish. Irish itself is the language with the greatest likelyhood of generating the socialogical consesus necessary to be a universally spoken language on the Island.

Irish, with its poetic character and its extensive and historically important literature should be seen as one of the pillars of pedagogy in public education in Ireland.

I, as an immigrant, have little comprehension of the complex emotional relationship that natives of the island have with the Irish language, besides the obvious understanding that it is indeed very complex. I have witnessed more than once, the same person in the same conversation both arguing for, and against, the importance of saving the Irish language. With such internal conflict amoung many Irish people, it is no wonder that there is a public conflict concerning the language.

There are a couple of vectors that I think that may have contributed to this complicated situation. One of which the cumpulsory teaching of the Irish language. The fact that after 12 years of study of the language in school, only a tiny percentage of people are able to speak the language with any fluency has to have had a negative effect. I’ve often heard people say that the language is impossible to learn. This is far from true, but it may indeed seem that way due to the tremendously poor methods employed in the teaching of the language.

I have heard from many sources that Irish is taught as a dead language; without a focus on conversation, and forcing memorization without teaching rules of grammar. This is a confluence of the absolute *WORST* ways to teach a language. Is it any wonder that people who take 4 years of French in the Irish school system feel far more comfortable in French? There is a large body of study on the proper methodology for teaching languages to obtain fluency. These methods need to be employed and the old techniques should be removed IMMEDIATELY before they can inflict psychological damage on yet another generation.

The Irish government is currently cash rich, and of all possible uses of money, nothing has a greater benefit for society and the long term economic health of a country than proper education. It has a nearly incalculable benefit. With this in mind my program for obtaining true bilingualism on the island would be the following.

Money should be made available to all teachers and child minders to spend signifcant periods in the Gaeltacht in the interest in assuring a high standard of fluency for those that are responsible for the education of our children.

Money should be made available to entirely subsize child care given through the Irish language. This would remove the hurdle for those who are usually at the greatest economic disadvantage in obtaining the Irish language. The money spent on child care would serve the dual perpose of education and social welfare while not increasing costs to the state significantly for either goal.

Money should be made available for adult education in the language. It is more expensive to obtain Irish language instruction in the Republic than it is in Northern Ireland. RTE should give their Irish language materials away for free (charging only for the cost of media). The BBC has better free language learning materials for Irish than does any entity in the Republic. I find this a highly embarassing situation, and I hope that others do as well.

The teaching of Irish in school should be modernised with a two pronged approach focusing both on conversational Irish, and on grammar and the study of literature.

I think that if these policies were to be put in place we could see wide spread fluency in the langnuage and most importantly a bilingual island within a generation.

I don’t intend that the Irish people be forced to give up their characteristic melange of national pride and self deprecation. I would never suggest such a thing. However, this should not keep us from implementing the most effective education system that we can.

« Previous PageNext Page »