# Log 04-03-2013 to 04-07-2013

This week I went to a workshop on mathematical logic to work on an extension of this problem, which basically aims to define a probabilistic version of first order logic that gets around Godel’s incompleteness theorem (we aren’t quite there yet, but we did get around Tarki’s undefinability of truth).

The slightly more practical goal is to define a notion of “logical uncertainty”, where we can in a principled way assign probabilities to statements that are a logical consequence of our current information, but may be computationally intractable to deduce.

While at the workshop I attempted to solve several problems along these lines, although I don’t have any complete solutions to any of them:

1. Suppose that we have a known, one-to-one function $f : X \to Y$ and are trying to learn a distribution $\latex \pi$ over X. We observe $f(x)$ for several independent samples $x \sim \pi$. We can’t efficiently invert $\pi$, but we have a “heuristic” $g : Y \to 2^X$ such that $x \in g(f(x))$. How should we use $g$ to learn $\pi$?

If we let $S = g(y)$, then the naieve answer of conditioning on $x \in S$ doesn’t work, because we really need to condition on the heuristic g returning the answer S; in particular, imagine that S is always either a one-element set or all of X (e.g., S either successfully inverts f or fails to). Then conditioning on x lying in S would cause us to become increasingly convinced that $\pi$ only places mass on elements of x where f is easy to invert, which is wrong. So the question is, what (computationally efficient) thing can we do instead?

Note that this is somewhat related to the difference between logic and probability — in logic, if we have a proof, it doesn’t matter how we came across it, the proof is valid; but in probability, we actually have to take into account how we obtained our evidence, or we might end up with biased results.

2. If we have $k$ (possibly dependent) predictors, we would intuitively like to predict with the most confident of them at any given point in time; but unfortunately, this can be seen to have bad properties; so is there something like this that we can do instead?

3. In the paper linked to above, we obtained a probability operator $P : L \to [0,1]$ that assigns probabilities to statements in a formal language and has the properties that $P(a) = P(a \ and \ b) + P(a \ and \ not(b))$, plus associativity, commutativity, and assigning probability $1$ to tautologies and $0$ to contradictions. It turns out that any such probability operator can actually be represented in terms of a distribution over models, where $P(a)$ is the mass of models in which $a$ is true.

The question is: can we obtain $P$ as the Nash equilibrium of some game?