I have been working on a Stata add-on command to fit Bayesian models using Stan, and this is now out for testing. In this post, I want to introduce it, explain why it’s important, and encourage you all to try it out and give your feedback. I have already used it ‘in anger’ in two substantive projects, predicting student achievement and diagnosing tuberculosis of the eye, and it works fine – at least on my computers.

Stata version 14 includes, for the first time, Bayesian commands which provide the Metropolis-Hastings algorithm and the Gibbs sampler. These are the procedures available in the popular software BUGS and JAGS. However, there are many situations where they can grind along very slowly. The most common cause is a lot of correlated parameters, for example in a Bayesian multilevel model where each cluster gets its own parameter. Broadly speaking, you can picture this as the Gibbs sampler tries out each parameter one at a time, conditional on the others. This is like trying to move diagonally across a grid of orthogonal streets: it will take a while to get to the other side. Sometimes you can rotate the grid (orthogonal parameterisation), but sometimes you can’t. What you really need is a different algorithm, which is where Hamiltonian Monte Carlo (HMC) comes in. Radford Neal’s chapter in the *Handbook of Markov Chain Monte Carlo* provides an excellent overview. Further improvements in efficiency were achieved with the No U-Turn Sampler (NUTS), proposed by Hoffman and Gelman and explained in full in their 2014 paper. This was the starting point for Stan.

Stan is collaboratively-built, open-source software to run HMC, NUTS and more. HQ is at Columbia University, but the developers are based all over the world. Personally, I think it’s amazing, and they don’t pay me to say that. It is stable and superfast and tries to help you get the model code right, which is more than can be said for its Gibbs predecessors. There’s a very active and supportive online community on Google Groups. At heart, it’s a C++ library, but you don’t have to tangle with that stuff because there’s a command-line interface and interfaces for R, Python, Julia and Matlab. Now there is one for Stata too.

There is currently only one place where you can get StataStan: https://github.com/stan-dev/statastan/tree/alpha-test, where you can get the main stan.do file and a stan-example.do file. It is under alpha-testing, meaning that we have not run it on every combination of Stata version, flavor and operating system, and need to be assured that there are no fundamental incompatibilities before we move to beta-testing. This is where **you** come in: please try it out and let us know if it’s OK or not, also what version and flavor of Stata you have (like “12/SE”) and what operating system you’re using, including version and 32/64-bit if using Windows.

When it goes to beta-testing I’ll add it to my website so you could download from there inside Stata, and we’ll put links on the Stan homepage. When it passes that, and all the important wishes have been incorporated, I’ll send it to the SSC repository. I will update this blog post as we pass each of those hurdles. The latest stable and under-development versions will always be on GitHub, so if you are registered with them, you can contribute to StataStan. Don’t be shy, there’s plenty to do.

Now, here’s a toy example of using it, where you have a model file called bernoulli.stan (this is contained in the examples folder when you install Stan) that contains this model:

`data {`

int N;

int y[N];

}

parameters {

real theta;

}

model {

theta ~ beta(1,1);

for (n in 1:N)

y[n] ~ bernoulli(theta);

}

That means there are two parts to the data: N, the total number of observations, and y a vector of integers of length N. Our model is that y arose from a Bernoulli process with probability parameter theta, and we are putting a flat prior on theta anywhere from zero to one. That prior is specified as a beta distribution in the Stan example folder but you could make it even more efficient with a uniform distribution (HMC is more forgiving of uniform priors than M-H/Gibbs). Then in Stata, you can make some silly data like this:

`clear`

set obs 10

generate y=0

replace y=1 in 1/2

That’s basically two 1s and eight 0s. OK, now get ready to set the world alight by estimating the probability of success when you’ve just got 2 out of 10 in an experiment. StataStan will pass the variables you specify over to Stan, as well as global macros, so let’s put the total number of observations into a macro called N:

`quietly count`

global N=r(N)

Now we can call the stan command:

`stan y, modelfile("bernoulli.stan") ///`

cmd("/root/cmdstan/cmdstan-2.6.2") globals("N") load mode

The options for the stan command are explained at the top of the stan.do file, and in the GitHub ‘README’ file. Soon, they will move into a Stata help file and a pdf manual. In this case we say that we want to send the variable called y, we name the file that contains the model, point out the location of Stan (you’ll need to change this for your computer), say that we want to send the global macro called N to Stan, along with the variable y, that we want to load the chains of HMC steps back into Stata when it’s done, and that we want to find the posterior mode as well as the default mean and median.

It’s also possible to specify the model inline, which is to say inside your do-file, so you don’t have to mess around with Stata and a text editor open side-by-side. You can read about the different ways to achieve this in the stan-example.do file on the GitHub page.

Note: on a Windows computer, you won’t see progress in the Stata output window; all the output will appear when it’s done. It’s a Windows thing; what can I say? Compiling even this toy example can take a couple of minutes, so don’t panic if you see nothing happening, but I’ve got a cunning plan to get around this and will add it to StataStan soon.

This is great, Robert. Now, I have 220K observations in a cross-classified, two-way mixed model; it can’t be very much more difficult than the “set obs 10” example you are showing, can it?

Easy to set up the model code, and although it will take longer to run, the mixing is so much better than Gibbs sampling that you only need 1000-5000 iterations. It would take days in Jags or BUGS. Now, watch this space because we have a Stata (bayesmh) vs Stan comparison paper nearly finished for ArXiv, and we use a crossed random effect IRT-like model for that, which sounds like your data. First author will be Daniel Furr.