I’ve been a bit lame about logging stuff for the past few days (well, I was only really here on 12-21 and 12-29, to be fair; the rest was either vacation or very short days).
Either way, here’s what’s happened since 12-20:
-implemented cold restarts
-do a line search with partial re-evaluation of beam between old point and new point in L-BFGS (to avoid overfitting to the surrogate loss)
-printed utterances that become wrong after updates [although I haven’t used this extra output yet]
-wrote script to generate graphs to display learning curves
-printed separate train/test statistics
-printed out how much each example contributes to the gradient
-used this output to find a bug in my hashing
-currently waiting for new runs (without the hashing bug) to finish, which will take a couple hours
Oh and here are some learning curves (some of the info in the titles is inaccurate due to a bug in the line search that I had for a while that caused it to essentially not occur): aggregates
The “before” curve is training set accuracy before the surrogate loss (i.e. beam) gets updated, the “after” curve is training set accuracy after it gets updated, and “test” is test set accuracy.