(Note that I was on vacation from 01/07 to 01/13, then had a lot of work related to TAing CS229T from 01/14 to 01/18, and was on vacation again from 01/19 to 01/21. So I’ve only actually missed a few real work days of logs.)
What I did today:
I’m now starting a new project, working with Noah Goodman and Andreas Stuhlmüller. The goal of the project is to figure out when it is possible to avoid computing some portion of a program — instead replacing it by a default value — and obtain a result that is close to the “true” output of the program. (Note that the default value could be a probability distribution instead of a single number.) We call this “very lazy evaluation” because it is a more aggressive form of lazy evaluation that only asks for the output to be approximately the same instead of exactly the same.
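To make the idea concrete, here is a minimal sketch (all names and numbers are hypothetical, not from the project itself) of replacing an expensive subcomputation with a default value and checking that the result stays close to the true output:

```python
import math

def expensive_subterm(x):
    # Stand-in for a costly subcomputation inside a larger program.
    return sum(math.sin(x + i) for i in range(1000)) / 1000

def program(x, very_lazy=False):
    # "Very lazy" evaluation: skip the expensive subterm entirely and
    # substitute a default value (here 0.0, its rough long-run average).
    correction = 0.0 if very_lazy else expensive_subterm(x)
    return 10 * x + correction

exact = program(2.0)
approx = program(2.0, very_lazy=True)
print(abs(exact - approx))  # small relative to the output scale of ~20
```

The interesting question, of course, is deciding automatically *which* subcomputations can be skipped this way and *what* default (possibly a distribution rather than a single number) to substitute.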
Something that is intimately tied to this goal is the ability to approximately evaluate the impact of changing an input on the output of a program (we would ideally like to do this without actually running the entire program, because that may be expensive). Today I worked on something related to this latter goal.
Let’s forget about the structure of the program for a second and just assume that we have two collections of variables, $x$ and $y$. We are actually dealing with randomized programs, and furthermore we allow for the possibility of conditioning on boolean functions of the output. Dealing with all of this is a bit complicated, but we can abstract it away and just say that we have some probability distribution over $x$ and $y$ with (unnormalized) probability density $F(x, y)$. Let’s assume further that $x$ and $y$ have some substructure such that we can write $F(x, y)$ as $f(x)\,g(y)\,h(x, y)$. We can of course always do this by letting $f = g = 1$ and $h = F$; what we really want is for $f$, $g$, and $h$ to be “nice”, in a sense which we will define more formally in a bit.
The first thing I did today was to realize that the factorization of $F$ into $f$, $g$, and $h$ is a useful way to think of decomposing a collection of variables into two sets $x$ and $y$.
The second thing I did was to think about the following question:
Suppose that what we really care about is the marginal distribution over $x$ (i.e., $F(x) := \sum_y F(x, y)$), and that:

$$f(x) = \sum_i a_i \, \mathbb{I}[x \in A_i], \qquad g(y) = \sum_j b_j \, \mathbb{I}[y \in B_j], \qquad h(x, y) \in \{0, 1\},$$

where $\mathbb{I}$ denotes the indicator function for an event and the $A_i$ and $B_j$ partition the possible values of $x$ and $y$, respectively. Note that these equations say that $f$ and $g$ are piecewise constant and that $h$ only takes on values of $0$ or $1$. This last constraint isn’t that unreasonable; it is basically asking that $h$ simply be a function that says whether a given $x$ and $y$ “fit together” (for instance, think of $x$ and $y$ as subtrees of a syntactic parse, with $h$ checking whether combining $x$ and $y$ results in something grammatical). In practice this is actually a bit too stringent, but we’ll work with it for now.
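A tiny finite instance may help fix intuition. Below is a sketch (cells, weights, and the compatibility rule are all arbitrary choices for illustration) of the piecewise-constant structure, with $f$ constant on cells $A_i$, $g$ constant on cells $B_j$, and $h$ a 0/1 compatibility check:

```python
# Cells A_i partition the x-values; cells B_j partition the y-values.
A = {0: {"x1", "x2"}, 1: {"x3"}}   # A_0, A_1
B = {0: {"y1"}, 1: {"y2", "y3"}}   # B_0, B_1
a = {0: 2.0, 1: 0.5}               # f(x) = a_i for x in A_i
b = {0: 1.0, 1: 3.0}               # g(y) = b_j for y in B_j

def f(x):
    return next(a[i] for i, cell in A.items() if x in cell)

def g(y):
    return next(b[j] for j, cell in B.items() if y in cell)

def h(x, y):
    # 0/1 compatibility: say x3 doesn't "fit together" with y2
    # (an arbitrary choice for the example).
    return 0 if (x, y) == ("x3", "y2") else 1

def F(x, y):
    return f(x) * g(y) * h(x, y)

print(F("x1", "y2"))  # 2.0 * 3.0 * 1 = 6.0
print(F("x3", "y2"))  # killed by the compatibility constraint: 0.0
```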
The question I wanted to answer is how to compute (or approximate) $F(x)$ efficiently given the above data. Algebraically, it is pretty easy to compute $F(x)$ for $x \in A_i$:

$$F(x) = \sum_y f(x)\,g(y)\,h(x, y) = a_i \sum_j b_j \, N_{ij}(x),$$

where $N_{ij}(x) = |\{y \in B_j : h(x, y) = 1\}|$. Importantly, this sum is also computationally easy to compute and represent as long as $N_{ij}(x)$ is a function of $i$ and $j$ alone; or, in other words, $N_{ij}(x)$ takes on the same value for all $x \in A_i$. This is true in the following two cases:
- For a given $A_i$ and $B_j$, either $A_i \times B_j$ is contained in $\{(x, y) : h(x, y) = 1\}$ or it is disjoint from it. Intuitively, this means that the $A_i$ and $B_j$ contain all information necessary to figure out whether $x$ and $y$ are compatible (for instance, in parsing, knowing the heads of the two subtrees is typically enough).
- A bit more generally, for any $x \in A_i$, the intersection of $\{y : h(x, y) = 1\}$ with $B_j$ is the same set of elements (so that, in particular, the count of compatible $y$’s in $B_j$ does not depend on which $x \in A_i$ we pick). This would correspond, for instance, to subtrees where the head is not known but partial information is known (for instance, we might know that the head of $y$ is a verb while the compatibility constraint requires it to be a transitive verb; in this case, if $B_j$ were the set of all trees whose head was a verb, we would still be fine, because we could just count how many of those verbs were transitive).
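Under the first case above, the marginal computation collapses to cell-level bookkeeping: each $N_{ij}$ is either $|B_j|$ or $0$. A sketch on a toy instance (all names and numbers hypothetical), checked against the brute-force definition:

```python
# Toy setup: cells A_i, B_j with weights a_i, b_j, and a 0/1 compatibility
# table at the *cell* level (case 1: h is constant on each A_i x B_j).
A = {0: ["x1", "x2"], 1: ["x3"]}
B = {0: ["y1"], 1: ["y2", "y3"]}
a = {0: 2.0, 1: 0.5}
b = {0: 1.0, 1: 3.0}
compat = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # value of h on A_i x B_j

def marginal_on_cell(i):
    # For any x in A_i: F(x) = a_i * sum_j b_j * N_ij, where N_ij is |B_j|
    # if the cells are compatible and 0 otherwise.
    return a[i] * sum(b[j] * (len(B[j]) if compat[i, j] else 0) for j in B)

def brute_force(i):
    # Directly sum F(x, y) = f(x) g(y) h(x, y) over every concrete y.
    return sum(a[i] * b[j] * compat[i, j] for j in B for _ in B[j])

for i in A:
    assert marginal_on_cell(i) == brute_force(i)
print(marginal_on_cell(0))  # a_0 * (b_0*|B_0| + b_1*|B_1|) = 2*(1 + 6) = 14.0
```

The point is that the cost depends on the number of cells, not on the number of concrete values — which is what makes the representation attractive when the underlying spaces (e.g. sets of parse trees) are huge.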
Note that both of the examples I gave were from parsing; that’s because parsing was the motivating problem I had in mind while developing the above. The next step is to make sure that this intuition carries over to other problems as well. In general, I think of the $A_i$ and $B_j$ as abstract representations of some set of possibilities, where we assign constant probability to every concretization of a given abstract representation.
Other things I did:
- hold CS229T office hours
- attend the NLP lunch
- think about linear programming duality with Roy (I still haven’t quite grokked this)
- publicize the Aaron Swartz Memorial Hackathon