What I did today:
1. Considered what happens if you have a polytope of distributions and take their marginals. It turns out you still get a polytope, although so far I’m not sure it has a nice representation. However, you can still at least formulate the maximum-entropy marginal distribution nicely by considering the program

maximize H(M(p))
subject to p ∈ P,

where P is the polytope, M is the marginalization operator, and H is the entropy. Since marginalization is linear, this program is concave and hence tractable (and if the marginal M(p) is measurable with respect to a small σ-algebra, then the program will also be tractable).
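A toy sanity check of the linearity point (the two-vertex polytope and all numbers below are made up for illustration): take the polytope to be the convex hull of two joint distributions over a pair of binary variables, and maximize the entropy of the marginal over the mixture weight. Since marginalization is linear, the marginal of a mixture is the mixture of the marginals, and the objective is concave in the weight.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Two vertices of a (one-dimensional) polytope of joint distributions
# over binary variables (X, Y); rows index X, columns index Y.
P1 = np.array([[0.7, 0.1],
               [0.1, 0.1]])
P2 = np.array([[0.05, 0.05],
               [0.45, 0.45]])

def entropy(q):
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def neg_marginal_entropy(lam):
    # Marginalization is linear, so the marginal of the mixture
    # is the mixture of the marginals.
    p = lam * P1 + (1 - lam) * P2  # a point in the polytope
    q = p.sum(axis=1)              # marginal over X
    return -entropy(q)

res = minimize_scalar(neg_marginal_entropy, bounds=(0.0, 1.0), method="bounded")
print(f"lambda* = {res.x:.3f}, max marginal entropy = {-res.fun:.4f} nats")
```

Here the X-marginals of the vertices are [0.8, 0.2] and [0.1, 0.9], so the entropy-maximizing mixture makes the marginal uniform (weight 4/7) and achieves log 2 nats.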
2. I also tried to apply the polytopal perspective to the problem that Noah was interested in (choosing the optimal set of variables to use when doing inference about a subset of variables in a graphical model). In this case I think I run into the same problems as when I condition (the worst case is much too pessimistic), although Carlos Guestrin seems to get away with worst-case analysis in one of his papers, so maybe this is okay.
3. Read through some more papers. Highlights:
-“Smooth Interpretation” [Chaudhuri and Solar-Lezama, 2010]. This is another cool paper; it analyzes the output distribution of a program when the input is a Gaussian random variable. They use tools from program analysis to do so, and even use some of the pruning techniques that I’ve been thinking about, although they don’t seem to prove any bounds, which makes me hopeful because I think I have ways to prove bounds in at least some cases. They reference a large literature on verification of probabilistic programs, which I should probably read.
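A toy illustration of the smoothing idea (this is just Monte Carlo, not their program-analysis machinery; the branch program below is made up): push a Gaussian input through a discontinuous program and take the expectation of the output. For a single branch on the sign of the input, the smoothed value has a closed form, the Gaussian CDF.

```python
import numpy as np
from math import erf, sqrt

def program(x):
    # A discontinuous program: a single branch on the input.
    return 1.0 if x > 0 else 0.0

def smoothed(mu, sigma, n=100_000, seed=0):
    # "Smooth" the program by feeding it a Gaussian input and
    # estimating the expected output by Monte Carlo.
    rng = np.random.default_rng(seed)
    xs = rng.normal(mu, sigma, size=n)
    return float(np.mean([program(x) for x in xs]))

mu, sigma = 0.3, 1.0
mc = smoothed(mu, sigma)
exact = 0.5 * (1 + erf(mu / (sigma * sqrt(2))))  # P(x > 0) = Phi(mu/sigma)
print(f"Monte Carlo: {mc:.3f}, closed form: {exact:.3f}")
```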
-“Node Splitting: A Scheme for Generating Upper Bounds in Bayesian Networks” [Choi, Chavira, and Darwiche, 2007]. These guys come up with a pretty interesting idea for computing the maximum-probability assignment of a collection of random variables in a Bayes net (conditioned on some values). The idea is to split a node into two copies, essentially removing some dependencies; this yields an optimization problem whose value is an upper bound on that of the original problem. Whenever the two copies are consistent with each other, the bound is exact, so they can basically do branch-and-bound to compute a provably optimal solution (by getting an upper bound where all the split variables are consistent). The algorithm is exponential only in the number of splits they perform, so for networks that a small number of splits can reduce to low tree-width, this algorithm should work quite well.
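A minimal sketch of why splitting gives an upper bound (the tiny network and its numbers are made up, and this is brute force rather than their algorithm): in a network A → B, A → C, replace A with two independent copies, one feeding B and one feeding C. Maximizing over the copies independently can only increase the max, and if the two copies' argmaxes disagree, the bound is loose and you would branch.

```python
import itertools
import numpy as np

# A tiny Bayes net A -> B, A -> C with binary variables.
pA = np.array([0.6, 0.4])
pB_given_A = np.array([[0.9, 0.1],   # rows: a, cols: b
                       [0.2, 0.8]])
pC_given_A = np.array([[0.3, 0.7],
                       [0.1, 0.9]])

def score(a, b, c):
    return pA[a] * pB_given_A[a, b] * pC_given_A[a, c]

# Exact MAP value by brute force.
exact = max(score(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))

# "Split" A into copies a1 (parent of B) and a2 (parent of C):
# max_a f(a) g(a) <= (max_a1 f(a1)) * (max_a2 g(a2)),
# so optimizing the copies independently upper-bounds the MAP value.
best1 = max(pA[a1] * pB_given_A[a1, b] for a1 in [0, 1] for b in [0, 1])
best2 = max(pC_given_A[a2, c] for a2 in [0, 1] for c in [0, 1])
bound = best1 * best2

print(f"exact MAP = {exact:.4f}, split upper bound = {bound:.4f}")
```

With these numbers the copies disagree (a1 = 0 maximizes the B factor, a2 = 1 the C factor), so the bound (0.486) strictly exceeds the true MAP value (0.378); forcing the copies to agree would recover exactness.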
4. I also revisited the relevance criterion from the Guestrin paper from yesterday. Basically, they compute an upper bound on the partial derivative of the probability assigned to a target variable with respect to a message; they seem to take this upper bound over all possible assignments of the other messages in the network, which intuitively should give a very conservative estimate, so I’m a bit confused as to why the algorithm works well anyway.
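A toy picture of why a uniform bound over all other messages can be conservative (the gating function below is made up, not their actual bound): if the sensitivity of the belief to one message is strongly gated by another message, the worst case over all assignments can be far larger than the typical case.

```python
import numpy as np

# Hypothetical belief map: sensitivity to m1 is gated by m2 in [0, 1].
def belief(m1, m2):
    return m1 * m2**4

m2_grid = np.linspace(0, 1, 101)
eps = 1e-6
# Finite-difference |d belief / d m1| at m1 = 0.5, for each value of m2.
sens = np.abs((belief(0.5 + eps, m2_grid) - belief(0.5, m2_grid)) / eps)

# Worst case over all m2 is 1.0, but the typical sensitivity is tiny.
print(f"worst-case bound over m2: {sens.max():.3f}")
print(f"median actual sensitivity: {np.median(sens):.4f}")
```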
5. Several meetings: interesting NLP lunch talk by Sonal, meeting with Percy and Sida about 229T, and meeting with Emma and Jacob T. before they leave for Spring break.
6. Made Qiaochu try to teach me Lasserre duality, although I’m still a bit confused about what’s going on.