Participants walked and used tech, each in their own way (with some inspiration from Hunter at the outset and from visitors like Valentina d’Efilippo in workshops), to collect data on their environs. Light, noise, images and pollution all featured in continuous data collection, as did concepts that have to be counted by humans, such as numbers of people, signage, and security and surveillance infrastructure.

The book says (p.12) that the aim is “to expose designers or any participants to data gathering processes …” and so the tech has to be accessible in terms of skills and know-how. Coding and PCs / laptops are kept to a minimum and admitted grudgingly.

Everyone should do this sort of thing. In particular, collecting data in the real world, with all the problems, compromises and leaps of faith that entails, is a valuable part of learning about analysis and quantitative thinking. It should be part of all stats teaching at an early stage, and in high school maths curricula too. *cf* “You Should Get Out More”.

It is an opportunity for participants to learn about data literacy, understand the fallibility of sensors, technology and processes, the ambiguity of results, and principles like ‘correlation does not mean causation’, in the hope they are better equipped to deal with data in other aspects of their lives.

When I say they used “tech”, the project was all about affordable DIY tools like Arduino. There is an intersection between creativity, collection as education, and citizen science here.

It might also be the case that certain types of data are not available for a specific aspect or place, or in high enough fidelity. In which case, it is time to roll up our sleeves and gather the data ourselves.

[*cf* “We Were There When They Made Dear Data” and “Explanation And Inference With House Sparrows”]

The second half of the book collects the participants’ impressive and varied outputs. Each participant chose how to create the output from their investigation, using all that nice kit that one finds in an art school: laser cutters, 3D printers, robotic routers, embroidery machines, and so on. But there were also hands-on processes that do collection and output together, like taking charcoal rubbings of pavements. It seems important to me to spend some time as lo-tech as possible, so you counteract the urge to be drawn deeper and deeper into tinkering with the tech to get it to do exactly what you had in mind.

Personally, I’d like to see more lo-tech, hands-on versions of stats / data science outputs. If we made flip books (p. 73) rather than GIFs of animated graphs, we might have more impact, as well as learning from the unavoidable confrontation with choices in curating data into experience.

It’s also vital to talk through what choices you are making with others in the same position:

Workshops have been a huge positive of this project as a framework for quickly learning new skills, evaluating techniques, gathering snapshots of data and prototyping visualisations. […] Initially workshops were technology driven but have evolved to explore other methods, moving from quantitative to qualitative techniques and analysis, gaining insight by examining the characteristics and nuances of collections, recognising the importance of discussion in learning and raising issues like data and visual literacy.

It’s not just about economics but also the connections to statistics, machine learning, and other fields of application where we can all learn new tricks from one another (I can tell you that my invited speaker will be talking about Approximate Bayesian Computation…), and along the way, enjoy some Italian hospitality. Come and join us! And also, submit an abstract! Here’s the details:

The workshop will be held in

Socio-economic systems evolve not only over time, but also over space. Thus, a proper understanding of social systems’ dynamics, evolution, and sustainability requires locating individual behaviors and their interactions within the spatial dimension.

Spatially referenced and big space-time data are now increasingly available, offering new opportunities for understanding the functioning of social and natural phenomena, and widening the information set for fine-tuning policy actions towards societal Grand Challenges.

The size, complexity, and multifaceted nature of time-space data require new methods and techniques able to learn from data and provide policy makers with clearer policy guidelines.

Reducing data complexity to manageable information, improving data visualization for detecting unpredictable patterns, generalizing causal inference when spatial interactions are pervasive, are just some of the many instances arising from the current intersection between the technology of data collection, the development of advanced computational approaches, and the quest for more informative policy guidance.

IWcee19 aims at exploring these intriguing subjects by offering the opportunity for interested scientists, practitioners and policy makers to gather, discuss, and present solutions to current and foreseeable new issues related to spatial economics and econometrics.

Papers on the following topics are highly welcome:

- Spatial economics and econometrics

- Regional and urban studies for sustainable growth

- Space-time statistics and geo-statistics

- Mapping and spatial data visualization

- Spatial spillovers and interactions

- New geo-referenced spatial data sources, including those from social media and the Internet

- Causal statistical modeling when space matters

- Agent-based modelling in spatial contexts

Nonetheless, papers on more general economics and econometrics topics will be considered.

IWcee19 will publish the best papers in a Special Issue of “**The International Journal of Computational Economics and Econometrics**” (**IJCEE**) (http://www.inderscience.com/ijcee). IJCEE is indexed in Scopus (Elsevier), the Emerging Sources Citation Index (Clarivate Analytics), the Chartered Association of Business Schools (CABS) Academic Journal Guide, and RePEc.

Please submit your extended abstract (max 1000 words) at: iwcee@ircres.cnr.it

If you have any queries, please contact:

Marco De Biase, iwcee@ircres.cnr.it

You can find the

Thank you for your kind attention.

SCIENTIFIC COMMITTEE

Getting out from behind the computer and talking to people could be the single most valuable skill to add to a data science person’s armoury. Anyone can write code; any fool can install trendy R/Python packages and press Go.

Here’s an anecdote from *Past, Present and Future of Statistical Science* (which I previously reviewed in full here):

Dennis Cook described (p. 98) a yearly cycle of experimental design, field work, data collection and then, only then, analysis. He was involved in every step.

Starting in the late winter, we would prepare the fertilizer combinations to be tested … and lay out the experimental designs on paper. … plots would be planted in the spring … and tended throughout the summer … harvested in the fall, followed by threshing and weighing the wheat. Most of the winter was spent constructing analysis of variance tables with the aid of large desktop Monroe calculators and drawing conclusions prior to the next cycle of experimentation.

I’m sure there is something to be said for this experience, especially early in one’s career. He later (p. 106) got involved in developing capture-recapture methods by dangling from a helicopter and shooting paintballs at wild horses. Now that’s something most statisticians don’t get to try out very often.

Serendipity, the happy accident, and subconscious problem-solving all feature here too. Many a theoretical or modelling problem in my career has been solved by putting my boots on and going out into the country for a few hours.

A parting tweet for you to ponder:

I designed this taster webinar because there are many ways to get in-depth, technical knowledge of the subject, but that is a commitment of time and money. I think there are lots of people out there who want to just learn a little more first, before taking that decision. If you are managing a data science team, or considering Bayesian methods as a new skill, this probably means you!

The Bayes Taster costs a mere 5 pounds. Like, a coffee and a slice of cake. Go on, you will, you will.

Then, I am going to give three taster webinars looking at particular software options. I’m not considering point ‘n’ click interfaces or preset commands inside other software, even though some of these are pretty good. I’m looking at the serious options — probabilistic programming languages — which let you code up your own model in your own way. They are harder to learn, but they keep on being useful, however complex your analysis gets. All of these are open-source and free.

First up is a one-hour taster looking at BUGS and JAGS: the original probabilistic programming languages, or at least model scripting languages. They’re going strong, and a lot of people are still happily using WinBUGS as a standalone package. There are interfaces from R, Python and even Stata. That’s on 1 March and costs £30.

On 4 March, you get another one-hour taster on Stan, a newer probabilistic programming language that uses an improved algorithm and solves a load of BUGS/JAGS problems. However, it doesn’t beat BUGS/JAGS at everything, and I’ll look at that too.

Finally, on 8 March, another one-hour taster looking at PyMC3, a Python package for Bayesian inference that contains the best bits of the others we’ve looked at so far.

You can also get a ticket to all of them for £65, which basically means you get one of the software webinars for free. At the end of this series, you’ll know how you could be using Bayesian modelling, how difficult it might be in your context, and what software option you would be most comfortable with.

And, while you’re here, there are more BayesCamp sessions coming up on things like data visualisation, Stata, clinical audit, meta-analysis… take a look at the courses page on the website.


Each webinar builds on a topic from my book. In each, there will be:

- some demonstration of thinking through, sketching the options and critiquing them
- some time set aside for you, the participants, to try something out on paper and then present back / discuss with the others
- some time at the end for Q&A

They take place 1500-1600 UK time, will be delivered through GoToWebinar, and cost GBP 50 each, or GBP 300 for a block booking (get one free). There will be recordings available for participants afterwards. If you want to ask any questions about these or other BayesCamp courses, email me at robert@bayescamp.com


Special mention for Gwilym Lockwood:

**Methodviz** of the year goes to John Zech for “What are radiological deep learning models actually learning,” an investigation into what one particular deep learning model did when it was charged with predicting respiratory diseases by looking at chest X-rays. If you are excited about deep learning, AI and such in the medical world, *stop!* and read this. There’s a helluva punch line; I’ll leave it to John.

Here are three approaches to showing uncertainty around a single curve in two-dimensional space.

There are two variables, and implicitly a third, which might be time; the curve moves from bottom left (where, initially, there is no uncertainty) to top right (and the uncertainty increases along the way). The data for this image are artificial, but I was thinking of hurricane tracks as well as any time series forecast where uncertainty is so important and increases further into the future.

The three approaches are, in essence:

- Show one shape with an X% chance that the “truth” will turn out to lie outside it. By “truth”, I mean a population parameter, if you are doing inferential statistics, or future data if you are doing prediction. It could be something else too, like a rank, model, or missing value, but the same principle applies. Take this idea of the chance of the truth lying in some location and imagine it as a surface, rising where the chance is high and dropping where it is low. This surface rises up to the summit, which is our best guess. We could add the best guess to the image, which is the central line in this image, but could also be a point. Then, our shape is a contour line: the surface is at the same height all around that contour. By the way, that surface is something that we really do deal with statistically. It is a posterior probability density function if you are doing Bayesian statistics, or a likelihood function otherwise. But there are ways of getting these uncertainty measures without full-on likelihood or Bayes. The bootstrap gives you this by just picking the central X% of resampled statistics. Inferential shortcut (asymptotic) formulas rely on approximations. Approximate Bayesian Computation (ABC) generates phony data according to different parameter values, and compares it to what you actually observed.
- Show several of these contours, like a topographic map. You might prefer to colour in the area according to the height of the surface instead, in which case it might be clearer to do some kind of smoothing; I’ll have another blog post soon on the subject of smoothing in data visualisation.
- Instead of showing the values (height) of the probability / likelihood, draw values at random from that function and show them. If you are bootstrapping or doing Bayesian stats by simulation, then this is simple because you can just draw the bootstrap stats or the retained simulations. The posterior / likelihood then acts as a *data generating process*, a crucially important mental tool for good data analysis, but something we’ll have to leave for another time.
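
The three options above can all be sketched from one set of simulated draws. Here is a minimal Python illustration, assuming (hypothetically) that you already have 1000 draws of a single quantity from a bootstrap or posterior simulation; `central_interval` is my own helper name, not from any library:

```python
import random
import statistics

random.seed(42)

# Stand-in for 1000 posterior or bootstrap draws of one quantity,
# simulated here around a "best guess" of 10 (made-up numbers).
draws = sorted(random.gauss(10, 2) for _ in range(1000))

def central_interval(sorted_draws, level):
    """Option 1: the central `level` fraction of the draws."""
    n = len(sorted_draws)
    lo = int(n * (1 - level) / 2)
    hi = int(n * (1 + level) / 2) - 1
    return sorted_draws[lo], sorted_draws[hi]

best_guess = statistics.median(draws)

# Option 1: a single interval at one chosen level.
print("95%:", central_interval(draws, 0.95))

# Option 2: nested intervals, like contour lines at several heights.
for level in (0.5, 0.8, 0.95):
    print(f"{level:.0%}:", central_interval(draws, level))

# Option 3: show a thinned sample of the draws themselves
# (plotted faint and semi-transparent in a real graphic).
sample_of_draws = random.sample(draws, 20)
```

The same three functions of the draws feed an error bar, a set of nested bands, or a scatter of faint dots, as discussed below.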

If you are showing one value at a time, the classic error bar is the result of option 1. Tukey and others have proposed versions with multiple levels of uncertainty, relating to option 2; you could try gradations of colour or line width too. Option 3 would involve a scatter of dots, preferably semi-transparent and/or jittered.

More exotic tweaks to this general idea are also out there — I included examples of a Bank of England fan chart and a funnel plot for comparing clusters of data (hospital mortality, in my example) in the book — and if something like that is the accepted, understood and expected approach in your line of work, then you should go with it. I had to choose what to include to keep the book from getting too long and expensive, and some fun approaches to uncertainty got, unfortunately, spiked, such as visually weighted regression.

Why hurricanes? Lots of interesting dataviz work has been done on them (at least, American hurricanes, because that’s where the dataviz muscle is) in recent years by journalists. Most recently, Alberto Cairo has led an effort to improve them. He says that option 1 from above is poorly understood and introduces a false dichotomy: if you live inside the cone, you’re gonna get whacked, and if you live outside, you’re totally safe. Also, people mistake the size of the cone for the size of the hurricane itself. Option 2 helps a bit, but not totally. Option 3 is good but in some settings (such as weather forecasting), not all the lines carry the same weight (some forecast models are known to be more reliable and sophisticated than others) — how do you show that?

When you are choosing how to visualise uncertainty, there are some important considerations. Here are some that come to mind:

- What is the statistical literacy of your audience? If it’s a mix, you probably need more than one image. Provide something they know how to use, rather than something you’re convinced they’ll love once they’ve learned how to use it (more Bill Gates than Steve Jobs).
- What summary statistic is of interest, if you are doing inferential statistics? Not the statistic you can easily get, or the one with a handy formula for standard errors, but the one your audience needs for decision making.
- If you are going to show contours, error bars, or some other depiction of a given level of (un)certainty at X%, find out what X is meaningful to your audience. For example, if it is a business decision that depends on your information, then ask what level of uncertainty (risk of being wrong) would change the decision, then draw that level.
- Is sampling error (having a sample, not the whole population) the only source of uncertainty? If not, if your estimates are also affected by things like missing data, confounding / endogeneity, or response bias, then consider a Bayesian approach, where you can incorporate all those sources of uncertainty into one posterior probability surface.
- Is it enough to see the uncertainty around each estimate / statistic / prediction on its own, or does your audience need to see how they interact? Sometimes, over-estimating A implies under-estimating B, and in these cases, you need to think about not just the variance (spread) of the uncertainty of A and B individually, but also the covariance between them.

- Is the uncertainty likely to be asymmetric? Imagine you are estimating a small percentage. You shouldn’t use a shortcut formula that will return an interval extending into negative values: you will get laughed at. In cases like these, you can sometimes transform the data / stats before calculation and back-transform the interval afterwards, which induces a sensible asymmetry, or you could swap over to the bootstrap.
- What if you are worried that some outlier or clustering in the data is going to spoil the shortcut formula? You can switch to using a formula robust to outliers, like the Huber-White sandwich estimator, or robust to clusters, like the Huber-Rogers clustered sandwich estimator. (Feeling hungry?)
- Consider whether interactivity or animation could help your audience to understand how uncertainty could have come about, and how it could be affecting your results.
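
To make the asymmetry point concrete, here is a sketch of the transform-and-back-transform trick for a small percentage, using the delta-method standard error on the logit scale; `logit_interval` is my own illustrative helper, not a library function:

```python
import math

def logit_interval(successes, n, z=1.96):
    """Hypothetical helper: approximate 95% interval for a proportion,
    computed on the logit scale and back-transformed, so it can never
    escape (0, 1) and comes out sensibly asymmetric."""
    p = successes / n
    centre = math.log(p / (1 - p))
    # Delta-method shortcut for the standard error of the logit.
    se = math.sqrt(1 / successes + 1 / (n - successes))
    inv = lambda x: 1 / (1 + math.exp(-x))
    return inv(centre - z * se), inv(centre + z * se)

# 3 successes out of 100: a naive symmetric interval would dip
# below zero; this one stays inside (0, 1).
lo, hi = logit_interval(3, 100)
```

The back-transformed interval stretches further above 3% than below it, which is exactly the shape the data demand.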

It’s great to be able to tailor to the audience by giving them a range of images, assumptions, models, etc, to mull over. Moritz Stefaner argued for *worlds, not stories*, and he is right to do so. I want to sound a note of caution though. This is an old statistical concept, multiplicity. You might know it as researcher degrees of freedom or the garden of forking paths.

By letting the audience explore and draw their own conclusions, you co-opt them as fellow researchers, and they become capable of the sins of researchers, especially as they have had no training in how to go about their investigations. If they keep looking long enough, and making enough comparisons, they will find something that looks like it is outside the bounds of uncertainty: two error bars that don’t overlap, or a hospital outside the funnel, or a brief period where a time series departs from the predicted fan of uncertainty. These aberrations could give an important insight, or they could just be noise; the more you look, the more likely you are to spot a pattern, but there’s no guarantee that it is “true”. In fact, the risk of error generally goes up as you keep looking.
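
The arithmetic behind “the more you look” is stark. A quick sketch, assuming (unrealistically, since real looks are often correlated) independent comparisons, each with a 5% false-positive rate on its own:

```python
# Chance of at least one spurious "signal" across k independent looks,
# when each look alone has only a 5% false-positive rate.
for k in (1, 5, 20, 100):
    p_any = 1 - 0.95 ** k
    print(f"{k:3d} looks: {p_any:.0%} chance of at least one false alarm")
```

By twenty comparisons the reader has better-than-even odds of “finding” something that is pure noise.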

So, the X% you set for your dataviz becomes corrupted, depending on how hard the reader looks at it. There is no very clear answer on how to tackle this; you just have to help the reader learn how to read your dataviz. And that brings me to the final point, the exemplarium. This is where you lead the reader into the visual package that you created by talking them through a single example. It happens in US Gun Deaths, and in many of the images in *London: the Information Capital*. This is how you give the reader a key or legend, plus advice on how not to get carried away, plus a sense of what uncertainty means in this context. I think it’s the only way to provide an inroad when dataviz gets complex, without swamping the reader.

One of the most common concerns that I hear from dataviz people is that they need to visualise not just a best estimate about the behaviour of their data, but also the uncertainty around that estimate. Sometimes, the estimate is a statistic, like the risk of side effects of a particular drug, or the percentage of voters intending to back a given candidate. Sometimes, it is a prediction of future data, and sometimes it is a more esoteric parameter in a statistical model. The objective is always the same: if they just show a best estimate, some readers may conclude that it is known with 100% certainty, and generally that’s not the case.

I want to describe a very simple and flexible technique for quantifying uncertainty called the bootstrap. This tries to tackle the problem that your data are often just a sample from a bigger population, and so that sample could yield an under- or over-estimate just by chance. We can’t tell if the sample’s estimate is off the true value, because we don’t know the true value, but (and I found this incredible when I first learnt it) statistical theory allows us to work out how likely we are to be off by a certain distance. That lets us put bounds on the uncertainty.

Now, it is worth saying here, before we go on, that this is not the only type of uncertainty you might come across. The poll of voters is uncertain because you didn’t ask every voter, just a sample, and we can quantify that as I’m describing here, but it’s also likely to be uncertain because the voters who agreed to answer your questions are not like the ones who did not agree. That latter source of uncertainty calls for other methods.

The underlying task is to work out what the estimates would look like if you had obtained a different sample from the same population. Sometimes, there are mathematical shortcut formulas that give you this — the familiar standard error, for example — immediately, by just plugging the right stats into a formula. But, there are some difficulties. For one, the datavizzer needs to know about these formulas, which one applies to their purposes, and to be confident in obtaining them from some analytical software or programming them. The second problem is that these formulas are sometimes approximations, which might be fine or might be off, and it takes experience and skill to know the difference. The third is that there are several useful stats, like the median, for which no decent shortcut formula exists, only rough approximations. The fourth problem is that shortcut formulas (I take this term from the Locks) mask the thought process and logic behind quantifying uncertainty, while the bootstrap opens it up to examination and critical thought.
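
For the record, the familiar shortcut in question is tiny: the standard error of a mean is just the sample standard deviation over the square root of n. A sketch with made-up numbers:

```python
import math
import statistics

data = [4.1, 5.6, 3.8, 6.2, 5.0, 4.7, 5.9, 4.4]  # made-up observations

# The familiar shortcut formula: se(mean) = s / sqrt(n).
# No comparably tidy formula exists for, say, the median.
se_mean = statistics.stdev(data) / math.sqrt(len(data))
```

Plug in and go; but as the paragraph above says, knowing which formula applies, and when the approximation is safe, is the hard part.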

The American Statistical Association’s GAISE guidelines for teaching stats now recommend starting with the bootstrap and related methods before you bring in shortcut formulas. So, if you didn’t study stats, yet want to visualise uncertainty from sampling, read on.

If you do dataviz, and you come from a non-statistical background, you will probably find bootstrapping useful. Here it is in a nutshell. If we had lots of samples (of the same size, picked the same way) from the same population, then it would be simple. We could get an estimate from each of the samples and look at how variable those estimates are. Of course, that would also be pointless because we could just put all the samples together to make a megasample. Real life isn’t like that. The next best thing to having another sample from the same population is having a pseudo-sample by picking from our existing data. Say you have 100 observations in your sample. Pick one at random, record it, and put it back — repeat one hundred times. Some observations will get picked more than once, some not at all. You will have a new sample that behaves like it came from the whole population.
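
That recipe (pick one at random, record it, put it back, repeat) is a one-liner in most languages. A minimal Python sketch with made-up data:

```python
import random

random.seed(1)
sample = [random.gauss(50, 10) for _ in range(100)]  # made-up "real" data

# One bootstrap pseudo-sample: draw 100 observations at random,
# WITH replacement, from the original 100.
pseudo_sample = random.choices(sample, k=len(sample))

# Some observations appear more than once, some not at all.
distinct_picked = len(set(pseudo_sample))
```

Typically only around two-thirds of the original observations get picked at least once; the rest sit that round out.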

Sounds too easy to be true, huh? Most people think that when they first hear about it. Yet its mathematical behaviour was established back in 1979 by Brad Efron.

Now, work out the estimate of interest from that pseudo-sample, and do this a lot: as the computer’s doing it for you, no sweat, you can generate 1000 pseudo-samples and their estimates of interest. Look at the distribution of those bootstrap estimates. The average of them should be similar to your original estimate, but you can shift them up or down to match (a *bias-corrected* bootstrap). How far away from the original do they stretch? Suppose you pick the central 95% of the bootstrap estimates; that gives you a 95% bootstrap confidence interval. You can draw that as an error bar, or an ellipse, or a shaded region around a line. Or, you could draw the bootstrap estimates themselves, all 1000 of them, and just make them very faint and semi-transparent. There are other, more experimental approaches too.
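
Putting the pieces together, here is a sketch of a 95% percentile bootstrap interval for a median, again with made-up data:

```python
import random
import statistics

random.seed(2)
sample = [random.gauss(50, 10) for _ in range(100)]  # made-up data

# 1000 bootstrap estimates of the median, sorted.
boot_medians = sorted(
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(1000)
)

# Central 95%: drop the lowest 2.5% and highest 2.5% of estimates.
ci_low, ci_high = boot_medians[25], boot_medians[974]
```

Those two numbers become your error bar; the full `boot_medians` list, drawn faintly, is the scatter-of-estimates alternative.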

You can apply the bootstrap to a lot of different statistics and a lot of different data, but use some common sense. If you are interested in the maximum value in a population, then your sample is always going to be a poor estimate. Bootstrapping will not help; it will just reproduce the highest few values in your sample. If your data are very unrepresentative of the population for some reason, bootstrapping won’t help. If you only have a handful of observations, bootstrapping isn’t going to fill in more details than you already have. But, in that way, it can be more honest than the shortcut formulas.
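
You can see the maximum-value failure mode for yourself: bootstrap the sample maximum and every resampled maximum is just a value already sitting in the sample, never anything beyond it.

```python
import random

random.seed(3)
sample = [random.gauss(0, 1) for _ in range(100)]  # made-up data

# Bootstrap the MAXIMUM: each pseudo-sample's max can only ever be
# a value that already appears in the sample, never anything beyond it.
boot_maxes = {max(random.choices(sample, k=len(sample))) for _ in range(1000)}
```

The set of bootstrap maxima is a subset of the sample itself, so it says nothing about how much larger the true population maximum might be.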

If you want to read more about bootstrapping, you’ll need some algebra at the ready. There are two key books, one by bootstrap-meister Brad Efron with Stanford colleague Rob Tibshirani, and the other by Davison and Hinkley. They are similarly accessible. I own a copy of Davison and Hinkley, for what it’s worth.

You could do bootstrapping in pretty much any software you like, as long as you know how to pick one observation out of your data at random. You could do it in a spreadsheet, though you should be aware of the heightened risk of programming errors. I wrote a simple R function for bootstraps a while back, for my students when I was teaching intro stats at St George’s Medical School & Kingston Uni. If you use R, check that out.


Firstly, the **Bayesian Taster** webinar. This lasts for one hour and costs only ten of those British pounds (currently, 13 USD or 11 EUR). There’s no maths and no coding; this is for complete beginners and is all about common sense. If you get the fundamental concepts right, you won’t get tripped up as it gets more complex later on. We will think about defining analytical problems in probability terms, what probability can be used for, and how practically to go about getting answers (fitting probability models to your data). We’ll look at a range of real-life problems with data and models that are too hard with old-fashioned stats / machine learning, but readily solved with Bayes. I’ll describe the spectrum of available software. This happens on 12 October, around lunch time for Africa and Europe, then later around lunch time for eastern Americas, or breakfast time for western Americas. You can book here (Afro-Euro edition) or here (Americas edition).

Secondly, a half-day online workshop called *Packages for Bayesian Analysis in R*, which is ideal for anyone with some R familiarity, who knows in essence what Bayesian analysis is about, but wants to find out about the options for actually doing it. We will look at a range of packages, from those that are easier to learn but restrict you to a collection of preset models, through to probabilistic programming that allows full flexibility. There will be plenty of mini exercises for you to try out on your own computers to get a feel for it as we go along. This will happen on 26 October, 1300–1700 UK time. You can book for this workshop here.

When I decided to start my own business doing training in Bayesian methods, I read a lot of other people’s introductions to the subject. I wanted to see how others approached the subject, and I wanted to steal the best ideas. I looked at books, videos, websites, blogs… and I’m still going, because they keep coming out and some are buried away in obscure places. Although there are some absolutely outstanding exemplars, the very beginning never quite satisfies me. I mean, the way that the idea of Bayesian statistics is introduced to the reader / listener / whatever.

In this post, I’ll set out what I like and don’t like about introductions to Bayes, and I’ll explain how I do it as I go along.

First, I have to be clear about my intended audience; teaching a room of doctors would be different to a room of maths grad students. Not necessarily better or worse, easier or harder, just different. I aim at people who think about problems quantitatively, but not mathematicians or statisticians. I want to help everyone else who falls between the cracks. They might be healthcare professionals, marketing analysts or machine learning folk who want to get stronger at stats.

So, for starters, my introduction is not very mathematical. It’s not that I don’t appreciate the importance of mathematical ability if you want to be a theoretical statistician, it’s just that my audience don’t intend to become statisticians. Plenty of visual aids help here, and I think that the flipchart or whiteboard is a much more useful tool than slides, because it is interactive and allows students to come up and try out things (for instance, after a small group activity). I like to prepare several pages on the flipchart ahead of time so we can just skip through from one concept to another while it’s fresh in their minds without them being distracted, thinking stuff like, “I wonder if that pen is going to hold out… the ink is looking thin.” An example of this is showing a regression line in variable space (X on the horizontal axis and Y on the vertical), then flipping to parameter space (beta1 on the horizontal and beta0 on the vertical) to show it as a point.

Later, when we get into software, there is huge value in demonstrating how to code something up and look at the results via a projector. A Jupyter notebook is a great way of doing this because you can quickly go back and tweak something and see its effect, although I feel uneasy about getting my learners to spend time gaining familiarity with a tool that few of them will use in earnest. It’s not considered cool, but I still think WinBUGS is a neat way of walking through reading in the data, checking the model, running a few hundred warmup iterations, then going for it. Of course, this is another tool that learners probably won’t use in years to come, but there’s no reason why you can’t do that same sequence of steps in R+Stan, for example.

Bayes’ theorem at the outset. Why do people do this? It shows us two things: the historical connection (who cares?) and the principle of reversing conditional probability by multiplying likelihood and prior. But that’s not what we do in practice; we simulate, so why not show students that? They can also work out quickly that the formula you would really use is not that simple; the denominator is a normalising constant that has to be integrated. It also confuses them that this theorem can be used for probability-reversing reasons that are not “Bayesian”, like the classic examples with medical test results. I prefer to introduce the value and flexibility of multiplying conditional probabilities together with a practical example and an explanation more like a particle filter (although I wouldn’t use that term), because that’s closer to the simulation that follows.
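
To show what “we simulate” can mean at its most bare-bones, here is a rejection-sampling sketch (the simplest relative of the ABC family): draw a candidate from the prior, generate phony data with it, and keep the candidate only if the phony data match what was seen. The numbers are entirely made up for illustration:

```python
import random

random.seed(4)
observed = 7    # made-up example: 7 successes seen in 20 trials
n_trials = 20

# Simulation in place of the theorem: draw a candidate rate from a
# flat prior, generate phony data with it, and keep the candidate
# only if the phony data match what was actually seen.
kept = []
for _ in range(100_000):
    rate = random.random()
    phony = sum(random.random() < rate for _ in range(n_trials))
    if phony == observed:
        kept.append(rate)

# `kept` is now a sample from the posterior for the rate.
posterior_mean = sum(kept) / len(kept)
```

No theorem on the board, yet the kept candidates are a genuine posterior sample, and students can see exactly why: the implausible rates simply fail to reproduce the data.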

Philosophical distinctions about the meaning of probability or randomness (but see below). This is important for clever students but even then will only interest them once they are getting comfortable with the analysis. We should make our learners good at applying the methods first, then they can reflect on theory and finally history.

History. Nobody gives a monkey’s whether Jaynes and Keynes were the inspiration for a Gilbert and Sullivan operetta, or Peirce abducted Neyman’s cat.

Contrived examples. Oh, I like old, well-worn datasets, like irises or the Titanic (more on that another time), but tossing a coin ten times? Come on. (As a self-defensive footnote, I have used coin-tossing with success when introducing the concept of hypothesis testing, but that’s a very different goal and hence a different metaphor for the mental processes and quantification at work. I got that coin-tossing exercise from Beth Chance and Nathan Tintle at ICOTS9.)

A lot of maths: theorem-proof-lemma format, for example, or matrix algebra when the learners would get the idea faster from talking about what happens to one observation, one parameter, one iteration at a time. Mathematicians have a habit of setting out the most general possible exposition at the beginning, but you can’t fully grasp it until later, when the fine details have sunk in. I think it’s better to have carefully chosen examples that illustrate one principle at a time, then gradually accumulate them: debugging where someone lost the thread is much simpler. And we don’t need to see the proof unless we are studying how to prove similar things in future. You, the teacher, need to do the maths, but keep it out of sight.

Analytical solutions and conjugacy. Yawn. It’s not the 50s. Don’t waste your learners’ time.

1-d density plots of prior, likelihood and posterior that hardly overlap. You all know the sort. They are mathematically correct but unrealistic. That prior is a BAD prior, and your students can see that. The likelihood is out in a low-prior region, and that should give you pause for thought in real life. Don’t drag your students out of real-life critical thinking and into some abstract ritual!

Simulation at the outset, shortcut formulas later (a la GAISE). Note that calling asymptotics “shortcut formulas” gives students the right attitude to conceptualising their analyses in a grounded and critical way; it’s not intended to disparage the value of solid statistical theory.

Prior and posterior predictive checking, where you use your model before it sees any data, and after it has “learnt” from the data, to generate new phony data. Take a look at the phony data and see if they look anything like the real ones. Where the prior does not include the data, you’ve got problems. Likewise, when the posterior doesn’t look like the data in some way. These are intuitive ways of doing an open-ended check on your model.
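A prior predictive check can be sketched in a few lines. Here is a minimal version for a Poisson model with a uniform prior on the rate; the counts are invented and the Poisson sampler is Knuth’s simple method:

```python
import math
import random

random.seed(2)

def poisson(mu):
    # Knuth's method for Poisson random variates
    L = math.exp(-mu)
    k, p = 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

real_data = [3, 5, 4, 6, 2, 5, 7, 4]      # invented counts
real_mean = sum(real_data) / len(real_data)

# Prior predictive: draw a rate from the prior, simulate a phony dataset,
# and record a summary statistic (here, the mean)
prior_pred_means = []
for _ in range(1000):
    mu = random.uniform(0.5, 10)           # the prior
    phony = [poisson(mu) for _ in real_data]
    prior_pred_means.append(sum(phony) / len(phony))

# If the real mean sits in the far tails of the phony means, worry
frac_below = sum(m < real_mean for m in prior_pred_means) / len(prior_pred_means)
print(frac_below)
```

A posterior predictive check is the same loop, only with mu taken from posterior draws instead of the prior.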

A focus on computation, even if it’s not specific to Bayes (floating point accuracy, digital rounding error, or setting RNG seeds are all good examples).
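Two minutes with an interpreter covers the floating point and seed points, for example:

```python
import random

# Floating point: 0.1 + 0.2 is not stored exactly, so never test equality
print(0.1 + 0.2 == 0.3)                  # False
print(abs((0.1 + 0.2) - 0.3) < 1e-12)    # True

# Setting the RNG seed makes a simulation reproducible
random.seed(42)
a = [random.random() for _ in range(3)]
random.seed(42)
b = [random.random() for _ in range(3)]
print(a == b)                            # True
```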

Approximate Bayesian Computation (ABC) as an inroad to thinking about: (1) simulation as a way of combining probability densities and/or likelihoods, (2) letting the computer try different values of the parameter and seeing how well they match the data, and (3) the need for a sensible prior to guide the computer away from no-hope regions, as well as problems like no overlap in logistic regression. However, this will work well for people who have already thought about random number generators, less well for those who haven’t. Because we are going to simulate, we need to introduce RNGs and distributions anyway, and throwing ABC on top of that might just be too much.
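To make (1) to (3) concrete, the simplest rejection-ABC loop for a Poisson rate fits in a dozen lines. A sketch with invented data, Knuth’s Poisson sampler, and an arbitrary tolerance of 0.5 on the mean:

```python
import math
import random

random.seed(3)

def poisson(mu):
    # Knuth's method for Poisson random variates
    L = math.exp(-mu)
    k, p = 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

observed = [4, 6, 3, 5, 5, 4, 7, 2]        # invented counts
obs_mean = sum(observed) / len(observed)

accepted = []
for _ in range(20000):
    mu = random.uniform(0.5, 10)            # try a value from the prior
    sim = [poisson(mu) for _ in observed]   # simulate data under that value
    if abs(sum(sim) / len(sim) - obs_mean) < 0.5:   # close enough? keep it
        accepted.append(mu)

# The accepted values approximate the posterior for mu
print(sum(accepted) / len(accepted))
```

The three teaching points fall out directly: the simulation does the combining, the computer does the trying, and the prior decides which values ever get tried.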

x ~ norm(10,3) notation. Rasmus Bååth and I both like this for introductory teaching and neither of us knows what to call it. Let’s go with “tilde probability notation”. It’s much easier to write once you know a few common distributions, and you can pile them up thus:

mu ~ unif(0.5, 10)
y[i] ~ poisson(mu)

That’s a little univariate Poisson model. But it easily extends into models that are quite painful to read in algebra.

mu ~ unif(0.5, 10)
sigma ~ unif(0.1, 5)
y[i] ~ poisson(mu)
x_mean[i] = ((y[i] > 5) * 5) + ((y[i] <= 5) * y[i])
x_measure_error[i] = ((y[i] > 5) * 0.001) + ((y[i] <= 5) * sigma)
x[i] ~ round(norm(x_mean[i], x_measure_error[i]))

That’s a model for Poisson-distributed data that are censored and heaped at 5 and also have some measurement error below that point. Pretty advanced, pretty quickly. Also, once you are familiar with this notation, you can use it in Stan or BUGS or JAGS.
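One way to convince learners that the tilde lines really do define a model is to run them forwards as a simulation. A sketch of the generative side (Knuth’s Poisson sampler; the dataset size of 10 is arbitrary):

```python
import math
import random

random.seed(4)

def poisson(mu):
    # Knuth's method for Poisson random variates
    L = math.exp(-mu)
    k, p = 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

# mu ~ unif(0.5, 10); sigma ~ unif(0.1, 5)
mu = random.uniform(0.5, 10)
sigma = random.uniform(0.1, 5)

def simulate_one():
    y = poisson(mu)                  # y[i] ~ poisson(mu)
    if y > 5:
        x_mean, x_err = 5, 0.001     # censored and heaped at 5
    else:
        x_mean, x_err = y, sigma     # measurement error below 5
    return round(random.gauss(x_mean, x_err))   # x[i] ~ round(norm(...))

data = [simulate_one() for _ in range(10)]
print(data)
```

Each line of the tilde model maps onto one line of code, which is much of the appeal of the notation.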

Emphasising the communication advantage, for example, “our analysis shows that there’s an 81% chance that return on investment will be over $1m within 5 years” (check out Frank Harrell’s blog for some medical examples). Who doesn’t love that? Perverts, that’s who. Or ultra-frequentists, though I’m not sure they exist any more. So that just leaves perverts.
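Statements like that fall straight out of posterior draws as a proportion. A sketch with hypothetical draws standing in for the output of a real model:

```python
import random

random.seed(5)

# Hypothetical posterior draws of 5-year return on investment, in $m,
# standing in for what a real model would produce
roi_draws = [random.gauss(1.4, 0.45) for _ in range(10000)]

# The headline number is just the proportion of draws over the threshold
prob = sum(d > 1.0 for d in roi_draws) / len(roi_draws)
print(f"P(ROI > $1m in 5 years) = {prob:.0%}")
```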

Emphasising flexibility — we can get beyond simple models quickly and without a lot of jiggery-pokery, unlike frequentists, who have to juggle REML and E-M and profile likelihoods and goodness knows what all.

Grounding everything we demonstrate in real-life research needs, like the communication and flexibility above.

Showing data space and parameter space, and flipping between them.

Getting quickly into models that are complex enough to be of real-life value. Students know when they are being shown some dumbed-down stuff that they could never use in vocational settings. This is a challenge of course, but you’re not being paid to dodge challenges.

Emphasising ways of thinking about data, models, truth, etc (a la McElreath). It’s extremely important, to be sure, but I don’t know if it helps to hear it early on. I am too immersed in the subject to be able to judge.

Bayes as the only true approach to probability for adherents of religions who contend that everything is predetermined or decided by a supernatural force (this would include most Muslims and Calvinists). In essence, if mortal humans know nothing for certain, including the immutability of parameters or the infinite replicability of your experiments, then it follows that data, latent variables, parameters and hyperparameters all move in ways you cannot fully understand, and so are subject to the same mathematics of probability. Is this a helpful assertion? Or not. I tend to think it best not to get involved in such matters, especially as I don’t believe it.

Emphasising networks early on, like David Barber’s (otherwise great) book does. I suppose some people work with those models and that decision-theoretic application, and need it. I just don’t know how it fits into everyone else’s learning curve.

Now that I am free to design my own educational products in Bayes, I find that actually, it always has to be tailored to the audience, unless it’s a quick overview. So, I set myself up to provide face-to-face training and coaching. I might have the odd quick overview as an online course, but the really in-depth stuff has to involve discussion, reflection and interaction, not just with me but with the other learners too. Of course, that means it’s not fair on people in far-flung places, but I can’t reach everyone.

I do training (a group of learners with clear learning outcomes at the outset) and coaching (one person and me, talking about their career and goals, where I mostly ask questions and there are no outcomes at the outset).

I think any training session should avoid getting bogged down in long sessions of chalk ‘n’ talk, but inevitably there has to be some of that. So, I keep them to 30 minutes if I can, and alternate them with small group activities; you could have several over the course of the day. I think it’s a good idea to put the groups together so that they are diverse, and that requires finding out a little about learners’ work experience, qualifications, etc before we begin. That mimics the diverse composition of data science teams in this day and age, and I tell people that from the outset so that they know they should respect and listen to one another in the group to get the most out of the learning experience.

What about this maths avoidance? It strikes me as odd that there are many introductory textbooks and courses for statistics that play down the maths, on the basis that the learners are going to operate a computer and construct a model in code, not by matrix algebra and calculus. Yet, this doesn’t happen for Bayesian statistics. This is perhaps down to two historical factors. Firstly, Bayes was an advanced subject so only people who already had degrees in statistics or mathematics would encounter it. Secondly, Bayesians spent decades being mocked and sidelined, and responded by foregrounding their mathematical rigour in the hope of beating their critics. But nowadays, we want all sorts of people who analyse data to think about using Bayes from the beginning of their careers, so we should offer them the same option. If you want the maths, there are plenty of options for you, but I would like to offer something a little different, a little more inclusive.
