Oops, looks like I was lame again. One excuse is that I was sick for a bit. Anyway, here’s what I’ve done since last time:
1. Realized that the loss function was converging to a crappy local optimum, so I modified the algorithm with an “EM-style” trick that makes each local step at least convex. This substantially improved the learning curves.
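For the record, here’s the flavor of that EM-style trick on a toy problem (this is an illustrative sketch with made-up data, not my actual model): the E-step fixes soft assignments, and then the M-step is a convex (here, closed-form) subproblem, so each local step can’t make the bound worse.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two well-separated components.
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 0.5, 200)])

mu = np.array([-1.0, 1.0])   # initial component means
sigma = 1.0                  # fixed shared std, which keeps the M-step simple

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    # (equal priors assumed).
    logp = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: with responsibilities fixed, the objective is convex in mu
    # and the minimizer is just a responsibility-weighted mean.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print(mu)  # means end up near the true centers, -2 and 3
```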
2. Examined the individual test cases we were getting wrong (ranked by how much each contributed to the gradient). Realized that part of the problem is that our model was unable to express certain sentences, such as “people in boulder”, which require a single word to lexicalize to multiple predicates.
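The triage itself is simple; as a sketch (hypothetical logistic model and made-up data, not my actual setup), you compute each example’s per-example gradient and sort by its norm so the worst offenders come up first:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))          # 8 examples, 3 features
y = rng.integers(0, 2, size=8)       # binary labels
w = rng.normal(size=3)               # current model weights

# Per-example gradient of the logistic loss: (sigmoid(x.w) - y) * x
p = 1.0 / (1.0 + np.exp(-X @ w))
per_example_grad = (p - y)[:, None] * X
contribution = np.linalg.norm(per_example_grad, axis=1)

# Example indices, largest gradient contribution first.
worst_first = np.argsort(-contribution)
print(worst_first)
```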
3. Realized that my cost function was completely flat (i.e., gradient = 0 and Hessian = 0) in some regimes, and that this was causing L-BFGS to get stuck and stop moving: it was sitting on a plateau that looked like an optimum, not at the actual global optimum. I fixed this and things improved a lot.
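A toy reproduction of the plateau problem (again, an illustrative function, not my real cost): f is flat for |x| ≥ 1, so from a starting point on the plateau L-BFGS sees a zero gradient and declares convergence immediately, even though the global minimum is at 0. Reshaping the flat region so it slopes toward the minimum fixes it.

```python
import numpy as np
from scipy.optimize import minimize

def plateau_f(x):
    # min(1, x^2): flat plateau of value 1 for |x| >= 1.
    return float(np.minimum(1.0, x[0] ** 2))

def plateau_grad(x):
    return np.array([2.0 * x[0] if abs(x[0]) < 1.0 else 0.0])

# Start on the plateau: gradient is exactly zero, so L-BFGS stops at x = 3.
stuck = minimize(plateau_f, x0=[3.0], jac=plateau_grad, method="L-BFGS-B")

# The fix: remove the flat region so the gradient points toward the minimum.
fixed = minimize(lambda x: x[0] ** 2, x0=[3.0],
                 jac=lambda x: np.array([2.0 * x[0]]), method="L-BFGS-B")

print(stuck.x[0], fixed.x[0])  # stuck stays at 3.0; fixed reaches ~0
```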
4. Also switched to an unconstrained formulation (still optimized with L-BFGS) whose objective is actually convex (unlike the previous one, which only had convex level sets), and surprisingly found that this didn’t mess up the results much (as opposed to earlier, when some variables would become very large if left unconstrained and wreck the model).
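The sanity check here is easy to sketch (illustrative ridge-style objective and random data, not my model): optimize the same convex objective with box constraints and without, and confirm the solutions agree because the bounds never bind at the optimum.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 4))
b = rng.normal(size=20)
lam = 0.1  # regularization keeps weights small and the objective strongly convex

def loss(w):
    r = A @ w - b
    return 0.5 * r @ r + 0.5 * lam * w @ w

def grad(w):
    return A.T @ (A @ w - b) + lam * w

w0 = np.zeros(4)
constrained = minimize(loss, w0, jac=grad, method="L-BFGS-B",
                       bounds=[(-10.0, 10.0)] * 4)
unconstrained = minimize(loss, w0, jac=grad, method="L-BFGS-B")

# Should be tiny: the box constraints are inactive at the optimum,
# so dropping them barely changes the solution.
print(np.max(np.abs(constrained.x - unconstrained.x)))
```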
5. Removed a bunch of examples that the current model is unable to get, to investigate whether this improves things.
6. Augmented the model so it can get a substantial portion of those examples, then re-ran the algorithm on the full training set. I now get 65% training-set accuracy and 50% hold-out accuracy, which is a substantial improvement over before! There were still a few issues with what I was doing, so I’m now re-running with a slightly different model and hoping for even higher accuracy. After that I’ll examine what I’m still getting wrong and figure out the right next step.
In the meantime, I’ll also be doing a bit of prep work for CS299T, the class that I’m TA’ing next quarter.