Tag Archives: Bayesian

Dataviz of the week, 12/4/17

This week, a chart with some Bayesian polemic behind it. Alexander Etz put this on Twitter:

[Image: Etz's chart, showing the posterior for the true effect with and without the adjustment for reporting bias]

He is working on an R package to provide easy Bayesian adjustments for reporting bias with a method by Guan & Vandekerckhove. Imagine a study reporting three p-values, all just under the threshold of significance, and with small-ish sample sizes. Sound suspicious?

[Image: Taggart: "There's been a murder"]

Sounds like someone’s been sniffing around after any pattern they could find. Trouble is, if they don’t tell you about the other results they threw away (reporting bias), you don’t know whether to believe them or not. Or there are a thousand similar studies but this is the (un)lucky one and this author didn’t do anything wrong in their own study (publication bias).

Well, you have to make some assumptions to do the adjustment, but at least, being Bayesian, you don't have to assume a single number for the bias: you can have a distribution. Here, the orange distribution is the posterior for the true effect once the bias has been added (in this case, p>0.05 has a 0% chance of getting published, which is not unrealistic in some circles!). This is standard probabilistic stuff but it doesn't get done because the programming seems so daunting to a lot of people. The more easy-to-use tools – with nice, helpful visualisations – the better.
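For the record, the idea behind the adjustment (my sketch of it, not necessarily Guan and Vandekerckhove's exact formulation) is a selection-adjusted likelihood: if a result only gets reported when it passes some rule, the likelihood of the reported data has to be renormalised by the probability of passing that rule,

\[ p(d \mid \theta, \text{reported}) = \frac{\Pr(\text{report} \mid d)\, p(d \mid \theta)}{\int \Pr(\text{report} \mid d')\, p(d' \mid \theta)\, \mathrm{d}d'} , \]

and the posterior is proportional to the prior times this adjusted likelihood. With the hard rule that only p<0.05 gets reported, the denominator is the probability of a significant result given the true effect (essentially power), which is small when the true effect is small, so the adjustment makes small true effects relatively more plausible and pulls the posterior towards zero.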


False hope (of methodological improvement)

I had a paper out in January with my social worker colleague Rick Hood, called “Complex systems, explanation and policy: implications of the crisis of replication for public health research”. The journal page is here or you can grab the post-print here. It’s a bit of a manifesto for our research standpoint, and starts to realise a long-held ambition of mine to make statistical thinking (or lack thereof) and philosophy of science talk to one another more.

We start from two problems: the crisis of replication and complex systems. Here’s a fine history of the crisis of replication from Stan chief of staff Andrew Gelman. By complex systems, we mean some dynamic system which is adaptive (responds to inputs) and non-linear (a small input might mean a big change in output, but hey! because it’s adaptive, you can’t guarantee it will keep doing that). In fact, we are in particular interested in systems that contain intelligent agents, because this really messes up research. They know they are being observed and can play games, take short-term hits for long-term goals, etc. Health services, society, large organisations, all fit into this mould.

There have been some excellent writers who have tackled these problems before, and we bring in Platt, Gigerenzer, Leamer, Pawson and Manski. I am tempted to give nutshells of their lives' work but you can get it all in the paper. Sadly, although they devoted a lot of energy and great ideas to making science work better, they are almost unknown among scientists of all stripes. Reviewers said they enjoyed reading the paper and found it fresh, but felt that scientists knew about these problems already and knew how to tackle them. You have to play along and be respectful to reviewers, but we thought this was wishful thinking. Bad science is everywhere and a lot of it involves that deadly combination of our two problems; public health (the focus of our paper) is more susceptible than most fields because of the complex system (government, health providers, society…) and often multi-faceted interventions requiring social traction to be effective. At the same time it draws on a medical research model that originated in randomised controlled trials, rats and Petri dishes. The reviewers and we disagree on just how far most of that research has evolved from its origins.

My experience of health and social care research in a realist-minded faculty is that the more realist and mixed-method the research gets, and the more nuanced and insightful the conclusions are, the less it is attended to by the very people who should learn from it. Simple statistics and what Manski called “incredible certitude” are much more beguiling. If you do this, that will follow. Believe me.

Then, we bring in a new influence that we think will help a lot with this situation: Peter Lipton. He was a philosopher at Cambridge and his principal contribution to his field was the concept of “inference to the best explanation” (also the title of his excellent book which I picked up somewhat by mistake in Senate House Library one day in 2009, kick starting all of this), which takes Peirce’s abductive reasoning and firms it up into something almost concrete enough to actually guide science and reasoning. The fundamental problem with all these incredible certitude studies is that they achieve statistical inference and take that to be the same thing as explanatory inference. As you know, we are primed to detect signals, and evolved to favour false positives, and have to quantify and use statistics to help us get past that. The same problem arises in explanation, but without the quantification to help.


Cheese consumption vs deaths by entanglement in bedsheets: you believe it more than the famous Nicolas Cage films vs swimming-pool drownings correlation because it comes closer to having a plausible explanation. From Tyler Vigen's Spurious Correlations.

A good explanation makes a statistical inference more credible. It's what you're hoping to achieve at the end of Platt's "strong inference" process of repeated inductive–deductive loops. This is Lipton's point: as humans, when we try to learn about the world around us, we don't just accept the likeliest explanation, as statistics provides, but we want it to be "lovely" too. As I used to enjoy having on my university profile of relevant skills (on a few occasions a keen new person in comms would ask me to take it down, but I just ignored them until they left for a job elsewhere that didn't involve pain-in-the-backside academics):

[Screenshot: the relevant-skills entry from my university profile]

By loveliness, Lipton meant that it gives you explanatory bang for your buck: it should be simple and it should ideally explain other things beyond the original data. So, Newton's gravitation is lovely because it has one simple formula and that works for both apples and planets. Relativity seems too hard to comprehend to be lovely, but as the phenomena explained by it stack up, it becomes a winner. Wave-particle duality likewise. In each case, they are accepted not for their success in statistical inference but in explanatory inference. It's not just laws of physics but more sublunar cause and effect too: if you impose a sugar tax, will people be healthier in years to come? That's the extended worked example we use in the paper.

Now, there are problems with explanations:

  • we don’t know how to search systematically for them
  • we don’t know where they come from; they generally just “come to mind”
  • we don’t know how to evaluate alternatives and choose among them
  • we usually stop thinking about explanation as soon as we hit a reasonably good candidate, but the more you think, the more refinements you come up with
  • we seem to give too much weight to loveliness compared to likelihood

and with loveliness itself too. Firstly, it’s somewhat subjective; consider JFK’s assassination. If you are already interested in conspiracy theories and think that government spooks do all sorts of skullduggery, then the candidate explanation that the CIA did it is lovely – it fits with other explanations you’ve accepted, and perhaps explains other things – they did it because he was about to reveal the aliens at Roswell. If you don’t go for that stuff then it won’t be lovely because there are no such prior beliefs to be co-explained. In neither the CIA candidate nor the Oswald candidate explanation are there enough data to allow likelihood to get in there and help. It would be great if we could meaningfully quantify loveliness and build it into the whole statistical process which was supposed to help us get over our leopard-detecting bias for false positives, but that seems very hard. Lipton, in fact, wrote about this and suggested that it might be possible via Bayesian priors. I’ll come back to this.

So, here are a couple of examples of loosely formed explanations that got shot from the hip after careful and exacting statistical work.

Long ago, when I was a youngster paying my dues by doing data entry for the UK clinical audit of lung cancer services, we put out a press release about differences by sex in the incidence of different types of tumour. I’m not really sure why, because that’s an epidemiological question and not one for audit, but there ya go. It got picked up in various places and our boss was going to be interviewed on Channel 5 breakfast news. We excitedly tuned in. “Why is there a difference?” asked the interviewer. They had heard the statistical inference and now they wanted an explanation.

Of course, we didn't know. We just knew it was there, p=whatever. But it is human nature to seek out explanation and to speculate on it. The boss had clearly thought about it already: "It may be the feminine way in which women hold their cigarettes and take small puffs". Whaaat? Where did that come from? I'd like to say that, before dawn in my shared apartment in Harringay, I buried my face in my hands, but that requires some understanding of these problems which I didn't acquire until much later, so I probably just frowned slightly at the stereotype. I would have thought, as earnest young scientists do, that any speculation on why was not our business. Now, I realise two things: the scientist should propose explanations lest someone less informed do so, and they should talk about them, so that the daft ones can get polished up, not stored until they are released pristine and unconstrained by consensus onto national television. It would be nice if, as our paper suggests, these explanations got pre-specified like statistical hypotheses should be, so that the study could be protected against the explanatory form of multiple testing.

Then here’s a clip from the International New York Times (erstwhile weekly paper edition in the UK) dated on my 40th birthday (I never stop looking for good stuff to share with you, readers).

It's all going well until the researcher starts straying from 'we found an association' into 'this is why it happens'. "There are more than a thousand compounds in coffee. There are a few candidates, but I don't know which is responsible." The opposite problem happens here: by presupposing that there must be a chemical you can attribute effects to (because that's what he was shown in med school), we can attribute them to an unknown one, and thus, by begging the question, back up the statistical inference with a spurious explanatory one. Here, there is a lack of explanation, and that should make us rightly suspicious of the conclusion.
[Clipping: "Coffee protects the liver", International New York Times, 22 October 2014]

On these foundations, we tentatively propose some steps researchers could take to improve things:

  • mixed-methods research, because the qual informs the explanation empirically
  • Leamer’s fragility analysis
  • pre-specify a mapping of statistical inference to explanation
  • have an analysis monitoring committee, like trials have a data monitoring committee
  • more use of microsimulation / agent-based modelling
  • more use of realist evaluation

Further in the future, we need:

  • methodological work on Bayesian hyperpriors for loveliness
  • better education, specifically dropping the statistical cookbook and following the ASA GAISE guidelines

This is strong medicine; funders and consumers of research will not give a damn for this time-consuming expense, bosses and collaborators will tell concerned researchers not to bother, and some of it could be so hard as to be practically impossible. In particular, Bayesian hyperpriors for loveliness are in the realm of methodological fancy, although some aspects exist, notably bet-on-sparsity, and I'll return to that in a future post. But setting that to one side, if researchers do things like our recommendations, then over time we will all learn how to do this sort of thing well, and science will get better.

Right?

Wrong. None of this will happen any time soon. And this is, ironically, for the same reason that the problems arise in the first place: science happens in a complex system, and an intervention like ours can have an adverse effect, or no effect at all. Researchers respond to several conflicting forces and the psychosocial drivers of behaviour, stronger than appealing to their good nature, remain unchanged. They still scoff at navel-gazing philosophical writers and lump us into that category, they still get told to publish or perish, and they still get rewarded for publication and impact, regardless of the durability of their work. So if I was to talk about the future in the same form that Gelman wrote about the past, it would be a more pessimistic vision. Deep breath:


A storm hits the city and the lights go out before I can prepare

This crisis is known to scientists in only a few areas, where the problem is particularly egregious (which is not to say it won't one day be revealed to have been bigger elsewhere, like public health, but in these areas it is both quite bad and quite obvious): social and behavioural psychology most notably, although brain imaging and genetics have their own problems and believe they have fixed them by looking for really small p-values (this, lest you be mistaken, will not help). Most other fields of quantitative scientific endeavour don't even realise they are about to get hit. I recall being introduced to a doctor by a former student when we bumped into each other in a cafe:
“Robert’s a statistician.”
“Oh, good, we need people like you to get the p-values going in the right direction.”
Now, I know that was a light-hearted remark, but it shows the first thing that comes to mind with statistics. They have no idea what’s coming.

The whole of downtown looks dark like no one lives there

Statistical practice is so often one of mechanistic calculation. You can use recipes to crank the handle and then classify as significant (go to Stockholm, collect Nobel prize) or non-significant (go to Jobcentre, collect welfare). There is no sign of explanation up front; it is grubbed up after the fact. It's as though all human thought was abandoned the minute they turned on the computer. I just can't understand why you would do that. Have more pride!

Why does this happen? These are at least some of the psychosocial forces I mentioned earlier:

  • The risk is carried by early career people: the junior academic or the budding data scientist. The older mentor is not expected to control every detail, and doesn’t take personal responsibility (it was Fox’s fault!)
  • Only a few such analyses are used (maybe one) to evaluate the junior person’s ability
  • Impact is valued; a reasonable idea for a whole organisation or programme of work, but not for projects, because there will always be a certain failure rate; paradoxically, this is also why academics play it safe with uninspiring incremental advances
  • Discovery and novelty are valued – as above
  • This sort of work is badly paid. You have to succeed quickly before you have to drop out and earn some cash.
  • The successful ones get the habit and can carry on misbehaving.

There’s a party uptown but I just don’t feel like I belong at all (do I?)

So what happens when people operating under these psychosocial forces get confronted? We've seen some of it already in the recent past. Call the critics bullies, just ignore them, pretend that what they say is hilariously obscure left-bank tosh, say that the conclusion didn't change anyway, suddenly decide it was only ever intended to be exploratory, find a junior or sub-contractor scapegoat, say you absolutely agree and make only a superficial change while grandstanding about how noble you are to do so; and of course there are more strategic ways for the bad guys to fight back, which I listed previously. Medicine will prove to be much worse than psychology, and resistant to reform (or oblivious of any need for it). There are reasons for this:

  • it’s full of cliques
  • they live and breathe hierarchy from school to retirement
  • whatever they tell you, their research is uni-disciplinary
  • there’s a DIY ethic that comes from that unfettered confidence in one’s own ability to do whatever anyone else does
  • they venerate busyness (no time for learning the niceties) and discovery (just get to the p-value)

I considered politicians against the same list and concluded that we don't have to worry about them: reform will come up from statistics, and post-truth, if such a thing exists, is transient. This might not apply to Trump types, of course, because they are not politicians. For politicians, cliques are open to coming and going; there is an expectation of advancing and taking turns in the hierarchy; their research is done by others, so they can blame the experts if it goes wrong; they do nothing themselves; and they did humanities courses, so they venerate turning an idea over and over.

Let me leave you with a quote from the late, great Hans Rosling: “You can’t understand the world without statistics. You can’t understand the world with statistics alone.”

“False Hope” by Laura Marling is (c) Warner / Chappell Music Inc

 


I’m going freelance

At the end of April 2017, I will leave my university job and start freelancing. I will be offering training and analysis, focusing on three areas:

  • Health research & quality indicators: this has been the main applied field for my work with data over the last nineteen years, including academic research, audit, service evaluation and clinical guidelines
  • Data visualisation: interest in this has exploded in recent years, and although there are many providers coming from a design or front-end development background, there are not many statisticians to back up interactive viz with solid analysis
  • Bayesian modeling: predictive models and machine learning techniques are big business, but in many cases more is needed to achieve their potential and avoid a bursting Data Science bubble, and this is where Bayes helps to capture expert knowledge, acknowledge uncertainty and give intuitive outputs for truly data-driven decisions

Considering the many "Data Science Venn Diagrams", you'll see that I'm aiming squarely at the overlaps from stats to domain knowledge, communication and computing. That's because there's a gap in the market in each of these places. I'm a statistician by training and always will be, but having read the rule book and found it eighty years out of date, I have no qualms in rewriting it for 21st century problems. If that sounds useful to you, get in touch at robert@robertgrantstats.co.uk.

This blog will continue, but maybe less frequently, although I'll still be posting a dataviz of the week. I'll still be developing StataStan and in particular writing some 'statastanarm' commands to fit specific models. I'll still be tinkering with fun analyses and dataviz like the London Café Laptop Map or Birdfeeders Live, and you're actually more likely to see me around at conferences. I'll keep you posted on such movements here.


A statistician’s journey into deep learning

Last week I went on a training course run by the NVIDIA Deep Learning Institute to learn TensorFlow. Here are my reflections on this. (I've gone easy on the hyperlinks, mostly because I'm short of time but also because, you know, there's Google.)

Firstly, to set the scene very briefly, deep learning means neural networks — highly complex non-linear predictive models — with plenty of "hidden layers", making them equivalent to regressions with millions or even billions of parameters. This recent article is a nice starting point.
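To make "hidden layers" a little more concrete, here is a minimal sketch in plain numpy (made-up layer sizes, not anything from the course): each layer is just a regression on the outputs of the layer below, squashed through a non-linearity, and the parameters pile up as you stack layers.

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# Made-up sizes: 784 inputs (a 28x28 image), two hidden layers, 10 classes
W1, b1 = 0.01 * rng.normal(size=(784, 256)), np.zeros(256)
W2, b2 = 0.01 * rng.normal(size=(256, 128)), np.zeros(128)
W3, b3 = 0.01 * rng.normal(size=(128, 10)), np.zeros(10)

def forward(x):
    # Each hidden layer is a non-linear regression on the previous one
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    scores = h2 @ W3 + b3
    e = np.exp(scores - scores.max())
    return e / e.sum()            # softmax: scores -> class probabilities

n_params = sum(w.size for w in (W1, b1, W2, b2, W3, b3))
print(n_params)                   # about 235,000 weights, even for this small network
print(forward(rng.normal(size=784)).round(3))

Training is then a matter of nudging all of those weights to reduce a loss function, which is where the data and the GPUs come in.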

Only recently have we been able to fit such things, thanks to software (of which TensorFlow is the current people's favourite) and hardware (particularly GPUs; the course was run by manufacturer NVIDIA). Deep learning is the stuff that looks at pictures and tells you whether it's a cat or a dog. It also does things like understanding your handwriting or making some up from text, ordering stuff from Amazon at your voice command, telling your self-driving car whether that's a kid or a plastic bag in the road ahead, classifying images of eye diseases, etc etc. You have to train it on plenty of data, which is computationally intensive, and you can do that in batches (so it is readily parallelisable, hence the GPUs), but then you can just get on and run the new predictions quite quickly, on your mobile phone for example. TensorFlow was made by Google and released as open-source software in late 2015, and since then hundreds of people have contributed tweaks to it. It's recently gone to version 1.0.

If you're thinking "but I'm a statistician and I should know about this – why did nobody tell me?", then you're right, they sneaked it past you, those damned computer scientists. But you can pick up The Elements of Statistical Learning (Hastie, Tibshirani & Friedman) or CASI (Efron & Hastie) and get going from there. If you're thinking "this is not a statistical model, it's just heuristic data mining", you're not entirely correct. There is a loss function and you can make that the likelihood. You can include priors and regularization. But you don't typically get more than just the point estimates, and the big concern is that you don't know you've reached a global optimum. "Why not just bootstrap it?" Well, partly because of the local optima problem, partly because there is a sort of flipping of equivalent sets of weights (which you will recognise if you've ever bootstrapped a principal components analysis), but also because if your big model, with the big data, takes 3 hours to fit even on AWS with a whole stack of powerful GPUs, then you don't want to do it 1000 times.
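To put the priors-and-regularization point in symbols (a standard identity, nothing specific to TensorFlow): minimising a penalised loss is the same as maximising a log-posterior, so the point estimate is a MAP estimate under an implicit prior,

\[ \underbrace{-\log p(y \mid X, w)}_{\text{loss}} + \lambda \lVert w \rVert_2^2 = -\log \big[ p(y \mid X, w)\, p(w) \big] + \text{const}, \qquad p(w) = \mathrm{N}\!\left(0, \tfrac{1}{2\lambda} I\right), \]

which is ridge regression's Gaussian prior in different clothes; what you don't get, as noted above, is the rest of the posterior around that point.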

It’s often hard to know whether your model is any good, beyond the headline of training and test dataset accuracy (the real question is not the average performance but where the problems are and whether they can be fixed). This is like revisiting the venerable (and boring) field of model diagnostic graphics. TensorFlow Playground on the other hand is an exemplary methodviz and there is also TensorBoard which shows you how the model is doing on headline stats. But with convolutional neural networks, you can do some natural visualisation. Consider the well-trodden MNIST dataset for optical character recognition:

[Image: sample handwritten digits from the MNIST dataset]

On the course we did some convolutional neural networks for this, and because it is a bunch of images, you can literally look at things like where the filters get activated. Here are 36 filters that the network learned in the first hidden layer
[Image: the 36 first-layer filters]
and how they get activated at different places in one particular number zero:
[Image: activations of those filters at different places on one digit zero]
And here we’re at the third hidden layer, where some overfitting appears – the filters get set off by the edge of the digit and also inside it, so there’s a shadowing effect. It thinks there are multiple zeros in there. It’s evident that a different approach is needed to get better results. Simply piling in more layers will not help.
[Image: third-layer filter activations on the same zero, showing the shadowing effect]

I’m showing you this because it’s a rare example of where visualisation helps you refine the model and also, crucially, understand how it works a little bit better.

Other data forms are not so easy. If you have masses of continuous independent variables, you can plot them against some smoother of the fitted values, or plot residuals against the predictor, etc – old skool but effective. Masses of categorical independent variables are not so easy (they never were), and if you want to feed in autocorrelated but non-visual data, like sound waves, you will have to take a lot on faith. It would be great to see more work on diagnostic visualisation in this field.
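For the continuous case, the old-skool check is only a few lines; here is a hypothetical example (made-up data, plain numpy and matplotlib): fit the model, plot residuals against a predictor, lay a crude smoother over the top and look for structure.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Made-up data: the truth is mildly curved but we fit a straight line
x = rng.uniform(0, 10, 500)
y = 1 + 0.5 * x + 0.05 * x**2 + rng.normal(0, 1, 500)
coefs = np.polyfit(x, y, 1)               # deliberately mis-specified linear fit
resid = y - np.polyval(coefs, x)

# Crude binned-mean smoother: leftover curvature shows up as a bowed red line
bins = np.linspace(0, 10, 21)
centres = (bins[:-1] + bins[1:]) / 2
means = [resid[(x >= lo) & (x < hi)].mean() for lo, hi in zip(bins[:-1], bins[1:])]

plt.scatter(x, resid, s=8, alpha=0.4)
plt.plot(centres, means, color="red")
plt.axhline(0, linestyle="--", color="grey")
plt.xlabel("predictor")
plt.ylabel("residual")
plt.show()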

Another point to bear in mind is that it’s early days. As Aditya Singh wrote in that HBR article above, “If I analogize [sic] it to the personal computer, deep learning is in the green-and-black-DOS-screen stage of its evolution”, which is exactly correct. To run it, you type some stuff in a Jupyter notebook if you’re lucky, or otherwise in a terminal screen. We don’t yet have super-easy off-the-peg models in a gentle GUI, and they will matter not just for dabblers but for future master modellers learning the ropes – consider the case of WinBUGS and how it trained a generation of Bayesian statisticians.

You need cloud GPUs. I was intrigued by GPU computing and CUDA (NVIDIA’s language extending C++ to compile for their own GPU chips) a couple of years ago and bought some kit to play with at home. All that is obsolete now, and you would run your deep learning code in the cloud. One really nice thing about the course was that NVIDIA provided access to their slice of AWS servers and we could play around in that and get some experience of it. It doesn’t have to be expensive; you can bid for unused GPU time. And by the way, if you want to buy a bangin’ desktop computer, let me know. One careful owner.

You need to think about — and try — lots of optimisation algorithms and other tweaks. Don’t believe people who tell you it is more art than science, that’s BS not DS. You could say the same thing about building multivariable regressions (and it would also be wrong). It’s the equivalent of doctors writing everything in Latin to keep the lucrative trade in-house. Never teach the Wu-Tang style!

It’s hard to teach yourself; I’ve found no single great tutorial code out there. Get on a course with some tuition, either face-to-face or blended.

Recurrent neural networks, which you can use for time series data, are really hard to get your head around. The various tricks they employ, called things like GRUs and LSTMs, may cause you to give up. But you must persist.
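For orientation, here is the standard LSTM cell update in one common notation (the maths, not any particular library's implementation). The forget, input and output gates f, i and o decide what gets erased from, written to and read out of the cell state c, which is what lets information survive across long stretches of a sequence:

\[ \begin{aligned} f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), & i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), & o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\ \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), & c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, & h_t &= o_t \odot \tanh(c_t). \end{aligned} \]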

You need a lot of data for deep learning, and it has to be reliably labelled with the dependent variable(s), which is expensive and potentially very time-consuming. If you are fitting millions of weights (parameters), this should come as no surprise. Those convnet filters and their results above are trained on 1000 digits, so only 100 examples of each on average. When you pump it up to all 10,000, you get much clearer distinctions between the level-3 filters that respond to this zero and those that don’t.

The overlap between Bayes and neural networks is not clear (but see Neal & Zhang's famous NIPS-winning model). On the other hand, there are some more theoretical aspects that make the CS guys sweat but that statisticians will find straightforward, like regularisation, dropout as bagging, convergence metrics, or likelihood as loss function.

Statisticians should get involved with this. You are right to be sceptical, but not to walk away from it. Here are some salient words from Diego Kuonen:
[Image: quote from Diego Kuonen]


Complex systems reading

Tomorrow I'll be giving a seminar in our faculty on inference in complex systems (like the health service, or social services, or local government, or society more generally). It's the latest talk on this subject, which is really gelling now into something of a manifesto. Rick Hood and I intend to send off the paper version before Xmas, so I won't say more about the substance of it here (and the slides are just a bunch of aide-memoire images), other than to list the references, which contain some of my favourite sources on data+science:

[Image: list of references]

I deliberately omit the methodologically detailed papers from this list, but in the main you should look into Bayesian modelling, generalised coarsening, generalised instrumental variable models, structural equation models, and their various intersections.


Introducing StataStan


I have been working on a Stata add-on command to fit Bayesian models using Stan, and this is now out for testing. In this post, I want to introduce it, explain why it’s important, and encourage you all to try it out and give your feedback. I have already used it ‘in anger’ in two substantive projects, predicting student achievement and diagnosing tuberculosis of the eye, and it works fine – at least on my computers.

Stata version 14 includes, for the first time, Bayesian commands which provide the Metropolis-Hastings algorithm and the Gibbs sampler. These are the procedures available in the popular software BUGS and JAGS. However, there are many situations where they can grind along very slowly. The most common cause is a lot of correlated parameters, for example in a Bayesian multilevel model where each cluster gets its own parameter. Broadly speaking, you can picture it like this: the Gibbs sampler tries out each parameter one at a time, conditional on the others. This is like trying to move diagonally across a grid of orthogonal streets: it will take a while to get to the other side. Sometimes you can rotate the grid (orthogonal parameterisation), but sometimes you can't. What you really need is a different algorithm, which is where Hamiltonian Monte Carlo (HMC) comes in. Radford Neal's chapter in the Handbook of Markov Chain Monte Carlo provides an excellent overview. Further improvements in efficiency were achieved with the No U-Turn Sampler (NUTS), proposed by Hoffman and Gelman and explained in full in their 2014 paper. This was the starting point for Stan.
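To see the street-grid problem in miniature, here is a toy sketch in Python (an illustration of the general point only, nothing to do with Stata's or Stan's internals): Gibbs sampling two parameters with correlation 0.99 creeps along the diagonal, because each conditional update can only move a short distance.

import numpy as np

rng = np.random.default_rng(2)
rho = 0.99                        # strong correlation between the two parameters
n_iter = 1000
draws = np.zeros((n_iter, 2))     # chain starts at (0, 0)

# Gibbs: update each parameter from its full conditional, one at a time.
# For a standard bivariate normal, x1 | x2 ~ N(rho * x2, 1 - rho^2), and vice versa.
for t in range(1, n_iter):
    x1 = rng.normal(rho * draws[t - 1, 1], np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    draws[t] = x1, x2

# Each step is tiny relative to the long diagonal axis of the target,
# so successive draws are nearly identical and the chain mixes very slowly.
print("lag-1 autocorrelation:", np.corrcoef(draws[:-1, 0], draws[1:, 0])[0, 1].round(3))

HMC, by contrast, uses the gradient of the log-posterior to propose moves along the ridge rather than across it, which is why it copes so much better with correlated parameters.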

Stan is collaboratively-built, open-source software to run HMC, NUTS and more. HQ is at Columbia University, but the developers are based all over the world. Personally, I think it’s amazing, and they don’t pay me to say that. It is stable and superfast and tries to help you get the model code right, which is more than can be said for its Gibbs predecessors. There’s a very active and supportive online community on Google Groups. At heart, it’s a C++ library, but you don’t have to tangle with that stuff because there’s a command-line interface and interfaces for R, Python, Julia and Matlab. Now there is one for Stata too.

There is currently only one place where you can get StataStan: https://github.com/stan-dev/statastan/tree/alpha-test, where you can get the main stan.do file and a stan-example.do file. It is under alpha-testing, meaning that we have not run it on every combination of Stata version, flavor and operating system, and need to be assured that there are no fundamental incompatibilities before we move to beta-testing. This is where you come in: please try it out and let us know if it’s OK or not, also what version and flavor of Stata you have (like “12/SE”) and what operating system you’re using, including version and 32/64-bit if using Windows.

When it goes to beta-testing I'll add it to my website so you can download it from there inside Stata, and we'll put links on the Stan homepage. When it passes that, and all the important wishes have been incorporated, I'll send it to the SSC repository. I will update this blog post as we pass each of those hurdles. The latest stable and under-development versions will always be on GitHub, so if you are registered with them, you can contribute to StataStan. Don't be shy, there's plenty to do.

Now, here’s a toy example of using it, where you have a model file called bernoulli.stan (this is contained in the examples folder when you install Stan) that contains this model:

data {
  int<lower=0> N;
  int<lower=0, upper=1> y[N];
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  for (n in 1:N)
    y[n] ~ bernoulli(theta);
}

That means there are two parts to the data: N, the total number of observations, and y a vector of integers of length N. Our model is that y arose from a Bernoulli process with probability parameter theta, and we are putting a flat prior on theta anywhere from zero to one. That prior is specified as a beta distribution in the Stan example folder but you could make it even more efficient with a uniform distribution (HMC is more forgiving of uniform priors than M-H/Gibbs). Then in Stata, you can make some silly data like this:

clear
set obs 10
generate y=0
replace y=1 in 1/2

That’s basically two 1s and eight 0s. OK, now get ready to set the world alight by estimating the probability of success when you’ve just got 2 out of 10 in an experiment. StataStan will pass the variables you specify over to Stan, as well as global macros, so let’s put the total number of observations into a macro called N:

quietly count
global N=r(N)

Now we can call the stan command:

stan y, modelfile("bernoulli.stan") ///
cmd("/root/cmdstan/cmdstan-2.6.2") globals("N") load mode

The options for the stan command are explained at the top of the stan.do file, and in the GitHub 'README' file. Soon, they will move into a Stata help file and a pdf manual. In this case we say that we want to send the variable called y, we name the file that contains the model, point out the location of Stan (you'll need to change this for your computer), ask for the global macro called N to be sent along too, ask for the chains of HMC steps to be loaded back into Stata when it's done, and ask for the posterior mode as well as the default mean and median.
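One nice feature of this toy example is that it is conjugate, so you can check StataStan's output by hand: with a Beta(1,1) prior and 2 successes out of 10,

\[ \theta \mid y \sim \mathrm{Beta}(1+2,\, 1+8) = \mathrm{Beta}(3, 9), \qquad \mathrm{E}(\theta \mid y) = \frac{3}{12} = 0.25, \qquad \text{mode} = \frac{3-1}{3+9-2} = 0.2, \]

so the reported mean should be close to 0.25, the mode close to 0.2 (the same as the maximum likelihood estimate), and the median somewhere in between.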

It’s also possible to specify the model inline, which is to say inside your do-file, so you don’t have to mess around with Stata and a text editor open side-by-side. You can read about the different ways to achieve this in the stan-example.do file on the GitHub page.

Note: on a Windows computer, you won’t see progress in the Stata output window; all the output will appear when it’s done. It’s a Windows thing; what can I say? Compiling even this toy example can take a couple of minutes, so don’t panic if you see nothing happening, but I’ve got a cunning plan to get around this and will add it to StataStan soon.


Roman dataviz and inference in complex systems

I'm in Rome at the International Workshop on Computational Economics and Econometrics. I gave a seminar on Monday on the ever-popular subject of data visualization. Slides are here. In a few minutes, I'll be speaking on Inference in Complex Systems, a topic of some interest arising from practical research experience that my colleague Rick Hood and I have had in health and social care.

Here’s a link to my handout for that: iwcee-handout

In essence, we draw on realist evaluation and mixed-methods research to emphasise understanding the complex system and how the intervention works inside it. Unsurprisingly for regular readers, I try to promote transparency around subjectivities, awareness of philosophy of science, and Bayesian methods.
