SIST thought 1: likelihood and priors for model space

This was a thread on Twitter but I wanted to share it here for a different audience.

Today’s thought (actually 24 June but here you get free shipping) from reading Deborah Mayo’s book SIST (Statistical Inference as Severe Testing).

The likelihood principle (the data x favour H1 over H2 iff P(x|H1) > P(x|H2)) allows comparisons of pre-specified hypotheses, but not inference over the whole space of possible hypotheses and explanations.
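To make that concrete, here is a toy sketch (my own illustration, not from the book): seven heads in ten tosses, compared under two pre-specified hypotheses about a coin’s bias. The comparison works fine, but it says nothing about hypotheses we never wrote down.

```python
from math import comb

def binom_lik(p, k=7, n=10):
    """Likelihood of k heads in n tosses under coin bias p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Two pre-specified hypotheses about the coin's bias:
lik_h1 = binom_lik(0.7)  # H1: p = 0.7
lik_h2 = binom_lik(0.5)  # H2: p = 0.5

# The likelihood principle says the data favour H1 over H2 here,
# but it is silent about the rest of hypothesis space.
assert lik_h1 > lik_h2
```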

You can restrict the model and get parameter values that maximise likelihood, and that could be fine. But searching over the models themselves is much harder, and often, that’s what we have to do. You can maximise likelihood and get a massively over-fitted, complex model.
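A quick numpy sketch of that overfitting point (my own example, not the book’s): for nested polynomial models under Gaussian noise, higher likelihood on the training data is the same as lower residual sum of squares, and that only ever improves as the degree grows, no matter how wiggly the fit becomes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # noisy smooth truth

def rss(degree):
    """Residual sum of squares of a polynomial fit of given degree.
    Under Gaussian noise, lower RSS means higher training likelihood."""
    coefs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coefs, x) - y) ** 2))

# Training likelihood rewards complexity without limit:
assert rss(10) < rss(3) < rss(1)
```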

Then, what can we do about the bias-variance tradeoff?
(1) we introduce humans doing a variety of “razor” judgements, but with cognitive biases
(2) we do cross-validation, but that needs lots of data
(3) we do penalized likelihood but that operates on restricted regions of model space
(4) we do some greedy ML algorithm like #DeepLearning, but there’s no guarantee of a global maximum
(5) we (try to) include priors for our model preferences in a #Bayesian model
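As a small illustration of option (2) above (again my own toy setup, not from the book): held-out error from k-fold cross-validation, unlike training likelihood, punishes the over-flexible model, though it needs enough data to split.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

def cv_error(degree, k=5):
    """Mean held-out squared error of a polynomial fit, via k-fold CV."""
    idx = rng.permutation(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coefs = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coefs, x[fold]) - y[fold]) ** 2))
    return float(np.mean(errs))

# Unlike training RSS, held-out error penalises the wildly flexible fit:
assert cv_error(3) < cv_error(15)
```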

There’s also (1a), where we have the humans but they are constrained to work within a pre-specified analysis plan, one more detailed than is usually written for this kind of non-experimental work. That could be good, though hard work.

I like (5) (who’da seen that coming?) and I haven’t thought about this in a systematic way before. Thanks Prof Mayo! I don’t often get new insights in data analysis nowadays.

Regularizing priors encode our preferences for parsimony. Gaussian process priors can encode our preference for smoothness. What else is out there? Please let me know if there are related concepts for expressing preferences in model space via priors or Bayesian algorithms.
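As a minimal sketch of the regularizing-prior idea (my own example; the data and the ratio `lam` are made up for illustration): a Gaussian prior on regression coefficients turns maximum likelihood into ridge regression, and the MAP estimate is shrunk towards zero relative to the unregularized fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.0]  # only two coefficients really matter
y = X @ beta_true + rng.normal(0, 0.5, n)

def map_estimate(lam):
    """MAP under y ~ N(Xb, s^2 I) with prior b ~ N(0, tau^2 I),
    where lam = s^2 / tau^2. lam = 0 recovers plain max likelihood (OLS)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols = map_estimate(0.0)
ridge = map_estimate(5.0)

# The Gaussian prior encodes a preference for parsimony:
assert np.linalg.norm(ridge) < np.linalg.norm(ols)
```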

I’d also like to note the link to Peter Lipton’s suggestion of Bayesian priors for loveliness of explanation, which is not a developed idea and might never be, but aims at the same goal as this.



    1. I think there are some ways in which machine learning people might search over models (rather than just parameters) in a way that does not give equal weight to every possibility, on the basis of prior knowledge or opinion, and I think that is a kind of prior, but because the link to the posterior probability is not evident, it’s hard to call it that. Certainly, there’s a lot of work on Bayesian neural networks at present that might change the scene.
