This was a thread on Twitter but I wanted to share it here for a different audience.
Today’s thought (actually from 24 June, but here you get free shipping) comes from reading Deborah Mayo’s book SIST (Statistical Inference as Severe Testing).
The likelihood principle (strictly, the law of likelihood: the data x favour H1 over H2 iff P(x|H1) > P(x|H2)) allows comparisons between pre-specified hypotheses, but not inference over the whole space of possible hypotheses and explanations.
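To make that comparison concrete, here is a minimal sketch with made-up numbers (my toy example, not Mayo’s): seven heads in ten flips, and two pre-specified hypotheses about the heads probability.

```python
import math

# Toy setup (hypothetical numbers): x = 7 heads in 10 flips,
# two pre-specified hypotheses about the heads probability theta.
def binom_lik(theta, heads=7, n=10):
    """Likelihood P(x | theta) under a binomial model."""
    return math.comb(n, heads) * theta**heads * (1 - theta)**(n - heads)

lik_h1 = binom_lik(0.7)  # H1: theta = 0.7
lik_h2 = binom_lik(0.5)  # H2: theta = 0.5

# P(x|H1) > P(x|H2), so the data favour H1 over H2 -- but this says
# nothing about hypotheses we never wrote down (a different model
# family, a biased flipping mechanism, ...).
print(lik_h1, lik_h2, lik_h1 > lik_h2)
```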
You can restrict the model and get the parameter values that maximise the likelihood, and that can be fine. But maximising over models is much harder, and often that’s what we have to do. Maximise likelihood across model space and you can end up with a massively over-fitted, complex model.
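Here is a minimal sketch of that failure mode (again my own toy example): for polynomial regression, the in-sample maximised log-likelihood never decreases as the degree grows, so raw likelihood maximisation over model space always prefers the most complex model on offer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # truth + noise

# Maximised Gaussian log-likelihood for a degree-d polynomial fit:
# plugging in the MLE of the noise variance gives
# loglik = -n/2 * (log(2*pi*sigma2_hat) + 1).
def max_loglik(degree):
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    sigma2 = np.mean(resid**2)  # MLE of the noise variance
    return -0.5 * x.size * (np.log(2 * np.pi * sigma2) + 1)

for d in (1, 3, 6, 9, 12):
    print(d, round(max_loglik(d), 1))
# The in-sample log-likelihood only ever increases with degree, so
# "maximise likelihood over all models" picks the most complex one.
```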
Then, what can we do about the bias-variance tradeoff?
(1) we introduce humans making a variety of “razor” judgements, but humans come with cognitive biases
(2) we do cross-validation, but that needs lots of data
(3) we do penalized likelihood, but that operates on restricted regions of model space (there’s a sketch of (2) and (3) after this list)
(4) we do some greedy ML algorithm like #DeepLearning, but there’s no guarantee of a global maximum
(5) we (try to) include priors for our model preferences in a #Bayesian model
There’s also (1a), where we have the humans, but they are constrained to work within a pre-specified analysis plan of a sort that goes beyond what is usually written for this kind of non-experimental work. That could be good, though it’s hard work.
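For concreteness, here is the promised sketch of (2) and (3), on the same toy polynomial setup as above. Leave-one-out cross-validation and AIC are stand-ins for the broader families of methods; none of this is from Mayo’s book.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

def fit_mse(xtr, ytr, xte, yte, degree):
    """Fit a polynomial on (xtr, ytr), score squared error on (xte, yte)."""
    coef = np.polyfit(xtr, ytr, degree)
    return np.mean((yte - np.polyval(coef, xte))**2)

def loo_cv(degree):
    """Option (2): leave-one-out cross-validation error."""
    errs = []
    for i in range(x.size):
        mask = np.arange(x.size) != i
        errs.append(fit_mse(x[mask], y[mask], x[[i]], y[[i]], degree))
    return np.mean(errs)

def aic(degree):
    """Option (3): penalized likelihood via AIC = -2*loglik + 2*k."""
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    sigma2 = np.mean(resid**2)
    loglik = -0.5 * x.size * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + 2 * (degree + 2)  # coefficients plus sigma^2

for d in (1, 3, 5, 9):
    print(d, round(loo_cv(d), 3), round(aic(d), 1))
# Unlike raw likelihood, both criteria bottom out at a moderate degree.
```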
I like (5) (who’da seen that coming?) and I haven’t thought about this in a systematic way before. Thanks Prof Mayo! I don’t often get new insights in data analysis nowadays.
Regularizing priors encode our preferences for parsimony. Gaussian process priors can encode our preference for smoothness. What else is out there? Please let me know if there are related concepts for expressing preferences in model space via priors or Bayesian algorithms.
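As a concrete instance of the first sentence, here is a minimal sketch (my own toy example): a zero-mean Gaussian prior on regression coefficients, whose MAP estimate is ridge regression. The prior is literally a preference for small coefficients, written down as a distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

X = np.vander(x, 10)      # degree-9 polynomial design matrix
tau2, sigma2 = 0.5, 0.09  # prior variance; noise variance assumed known

# MAP under y ~ N(X w, sigma2 I) with prior w ~ N(0, tau2 I):
# w_map = (X'X + (sigma2/tau2) I)^{-1} X'y, i.e. ridge regression.
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
w_mle = np.linalg.lstsq(X, y, rcond=None)[0]  # likelihood-only fit

print(np.round(w_mle, 1))  # large, oscillating coefficients
print(np.round(w_map, 1))  # shrunk towards zero by the prior
```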
I’d also like to note the link to Peter Lipton’s suggestion of Bayesian priors for loveliness of explanation, which is not a developed idea and may never be, but it aims at the same goal as this.