A simple R bootstrap function for beginners

I teach some introductory stats classes with SPSS, and one of the frustrations for me is that you have to pay an extra wad of cash to do any bootstrapping [edit August 2014: actually, the situation in new versions is much happier than this: see footnote below!]. It’s not exactly the complete analysis solution that you might expect from the sales literature. I could go on, but I guess IBM have better lawyers than me.

I think it’s a good idea to introduce the bootstrap early on. It’s not some advanced rocket-science technique any more, and beginners do a lot of medians and quartiles, for which there’s no (proper) standard error. What’s more, it makes random sampling at the heart of inference really explicit, which I think actually makes learning easier.

bootstrapfarm

So, I’ve been considering showing the students a quick diversion into R for the bootstrap. Stata has a great bootstrap syntax, but it’s not available in our computer teaching rooms. We don’t have time to do more in R, and it’s beyond my control to switch the whole software package for the course. The trouble is, the boot package has that weird thing where you have to redefine whatever statistic you’re interested in as a function with two parameters, even if a perfectly good one already exists. I suppose it’s to do with efficiency and vectorising over the replicate index vectors, but it’s the last thing you want to talk beginners through.

So I thought I could usefully wrap up the usual boot and boot.ci functions in a simpleboot wrapper.  It’s not a wise choice for serious analysis, and I haven’t made it pretty, but for teaching it makes things a little more accessible. Any thoughts, let me know.

simpleboot<-function(x,stat,reps=1000) {
cat("Bootstrapping can go wrong!\n")
 cat("This simple function will not show you warning messages.\n")
 cat("Check results closely and be prepared to consult a statistician.\n")
 if(stat=="max" | stat=="min") { warning("Bootstrap is likely to fail for minima and maxima") }
 require(boot)
 eval(parse(text=eval(substitute(paste("p.func<-function(x,i) ",stat,"(x[i])",sep=""),list(stat=stat)))))
 myboots<-boot(x,statistic=p.func,R=reps,stype="i")
 hist(myboots$t,breaks=25,main="EDF from bootstrap",xlab=stat)
 suppressWarnings(return(list(replicates=reps,point.estimate=myboots$t0,normal.ci=c(boot.ci(myboots)$normal[2],boot.ci(myboots)$normal[3]),
percent.ci=c(boot.ci(myboots)$percent[4],boot.ci(myboots)$percent[5]),
bca.ci=c(boot.ci(myboots)$bca[4],boot.ci(myboots)$bca[5]))))
}
# example:
mydata<-rchisq(25,df=3)
simpleboot(mydata,"median")

edited August 2014: So, as of version 21 (and maybe a little earlier – I skipped some) there is a “Bootstrap” button on most dialog boxes. I will start teaching how to use this as of the start of term in September 2014.

Advertisements

7 Comments

Filed under learning, R

7 responses to “A simple R bootstrap function for beginners

  1. You have an error in the hist() call, bmed should be myboots.

    You can avoid all the eval(parse(…)) if you just pass in the function itself, instead of a string. You have to use deparse(substitute()) to get the xlabel u it seems simpler to me. Here is a suggested rewrite:
    simpleboot<-function(x,stat,reps=1000) {
    cat("Bootstrapping can go wrong!\n")
    cat("This simple function will not show you warning messages.\n")
    cat("Check results closely and be prepared to consult a statistician.\n")
    xlab<-deparse(substitute(stat))
    if(xlab=="max" | xlab=="min") { warning("Bootstrap is likely to fail for minima and maxima") }
    require(boot)
    myboots<-boot(x,statistic=function(x, i) stat(x[i]),R=reps,stype="i")
    hist(myboots$t,breaks=25,main="EDF from bootstrap",xlab=xlab)
    suppressWarnings(return(list(replicates=reps,point.estimate=myboots$t0,normal.ci=c(boot.ci(myboots)$normal[2],boot.ci(myboots)$normal[3]),
    percent.ci=c(boot.ci(myboots)$percent[4],boot.ci(myboots)$percent[5]),
    bca.ci=c(boot.ci(myboots)$bca[4],boot.ci(myboots)$bca[5]))))
    }

  2. Tim

    Why not just use some variation of: for (i in 1:reps) x[sample(n, n, replace=TRUE)] ? This kind of approach makes it also very ilustrative on how bootstrap works.

    • Hi Tim
      I thought about this first, then decided to leverage the BCa confidence intervals already in boot.ci.
      But that is for students who are shy of programming. For those comfortable, it would be a much better idea. Also, I wonder about doing it in JavaScript (D3) instead and having the whole process visualised on a webpage. That could be interesting!

      • Tim

        Robert, I think it could be a nice example to encourage them to programming and show bootstrap “guts”
        For me learning R has helped me to gain deeper understanding of data analysis and statistics, I learned with it much more than from statistics classes.

      • Yes, I know what you mean! Most stats classes are stuck in the 1930s! Someone once told me “if you want to understand the Gibbs sampler, program your own one”!

  3. Next job on this is to pass further parameters for the statistic and allow it to return more than one result, for example quantile(x,probs=c(0.25,0.5,0.75))

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s