# Simulation (is where it’s happening)

Jim Silverton wrote to the Allstat mailing list recently:

“Hi,

Anyone up for a challenge?

Suppose we have  random variables that are random points on the surface of a sphere. What is the probability that the tetrahedron made by joining these points, contain the origin?

I tried over and over – no idea.”

This is really a classic probability “hard” question. If you’re comfortable with geometry and probability (and you happen to think of it in the right way), you’ll be fine. I looked it up online and found it came from the 1992 Putnam math competition (and could well have existed before that). There’s a solution at Wolfram so scary it would make Steven Seagal weep, while Howard and Sisson riff on it in a paper from 1996. You have to mess about with calculus and vector geometry and probability to get the extended answers. Don’t you just love that? “The analytic result is difficult to compute [but here it is!]” It’s just like that old cartoon where one wild-haired prof has written on the board a load of algebra followed by “and then a miracle happens”, and the other says “I think there’s a flaw in your proof”.I have to say, I hate this kind of thing. My family have pretty much learned not to show me logic puzzles in the newspaper or anything like that. I don’t see the point. There clearly is an answer, and you just have to struggle to find it. Where’s the satisfaction in that?

But this innocuous post caught my imagination because it seemed to me a good vehicle to promote computing skills and simulation over teeth-grinding math. Wolfram says the answer is 1/8. I wrote this R code to make random tetrahedra 100,000 times and in each case check whether they contained the origin. It takes 61 seconds to run on my office PC. The answer is apparently 0.12391 (95% confidence interval 0.122 to 0.126), which is pretty neat compared to the analytical result of 0.125.

``` # x is the matrix of 4 points on the unit sphere # d is in 1:4, indicating which vector to compare with the plane joining # the other three. # If the 4th point and the plane are in opposite directions, this supports # the tetrahedron containing the origin. If all four are opposite, it # is confirmed (so it seems to me, but I leave proof to others with # time on their hands!) m <- function(xx) { return(sqrt(sum(xx^2))) } proj<-function(x,d) { j<-(1:4)[-d] pq<-x[j,]-x[j,] pr<-x[j,]-x[j,] orth<-c((pq*pr)-(pq*pr), (pq*pr)-(pq*pr), (pq*pr)-(pq*pr)) unitorth<-orth/m(orth) plane<-unitorth*sum(x[j,]*unitorth) point4<-unitorth*sum(x[d,]*unitorth) oppo<-((plane>0 & point4<0)|(point4>0 & plane<0)) return(list(oppo,point4,plane)) } set.seed(434) iter<-100000 results<-rep(NA,iter) vol<-rep(NA,iter) pb<-txtProgressBar(min=0,max=iter,style=3) Sys.time() for (i in 1:iter) { a <- matrix(runif(12,min=-1,max=1), nrow=4) mags <- apply(a, 1, m) mags <- matrix(rep(mags, 3), ncol=3) x <- a/mags temp<-rep(NA,4) for (k in 1:4) { temp[k]<-proj(x,k)[] } results[i]<-all(temp) vol[i]<-(1/6)*abs(det(cbind(x,rep(1,4)))) setTxtProgressBar(pb,i) } table(results)/iter png("volume.png") plot(density(vol)) dev.off() Sys.time() ```

My approach to checking whether a tetrahedron contains the origin is a little heuristic, as you’ll see. It looks sensible enough though; if you draw a line from the origin to the kth vertex, and also from the origin, orthogonal to the plane joining the other three points, and they are in the opposite direction for all k, then the origin is contained.

My point is simply that simulation gives you enormous flexibility with gnarly problems. That code above throws in the extra stuff they do at Wolfram because the original question wasn’t hard enough. What are the central moments of the distribution of the volumes? Who cares. Let’s just get the distribution straight out of the simulation! After doing the quintuple integral and then some, Wolfram say the mean volume is (4/105)pi = 0.11968. My code says 0.11923. You want the distribution of the volumes of these tetrahedra? Boom! There it is.

There are times when you can get an analytical formula and evaluate that in a split second for each scenario, which is obviously a good move. For example, you could solve a multiple linear regression by MCMC, or by matrix algebra, and the latter is going to be lightning fast, always reliable, but not extendable to anything more complex. There are other times when you can’t get the formula but by thinking about the data-generating process, you can write a simulation program really quickly and get on with your life. How about the sampling distribution of the third quartile of dodecahedral volumes in 11-dimensional space whose vertices are sampled from autocorrelated mixtures of multivariate Cauchy distributions? No problem. Last week I did some sample size calculations for students, all by simulation, because there was no simple applicable formula. It just works, but surprisingly few places teach people these skills.

PS Do you teach this kind of stuff in an introductory stats setting? I’d like to hear from you! We might be able to put together a scientific meeting around this theme. There is interest and support and a central London venue but I need speakers…

1. Kevin McConway says:

Hmm. But the Wolfram MathWorld page gets to its answer of 1/8 in its first paragraph, before it gets to that horrible volume calculation you quote, which (as far as I can see) has nothing to do with the probability asked about, but is instead about the distribution of the volume of the random tetrahedron. Likewise, the Howard and Sisson thing is about generalizations of the original problem, which do (it seems) require lots of hard maths. You’re right that one can get to the answer, without hard maths, by simulation, but a similar point is that you can also get to it without hard maths by thinking about it geometrically and not just jumping into either the maths or the simulation.

1. Robert says:

Sure, but only if you happen to think about it in the way that leads to a simple answer. Ah, but that’s mathematics all round. And then as you seek answers to increasingly complex situations, simulation will soon outpace analytical answers.

2. kent37 says:

I’m not sure if I follow your code, but I don’t think you are correctly picking random points on the sphere. It looks like you are picking random points in a 2x2x2 cube and projecting them onto the unit sphere. This will tend to prefer points on the sphere nearer to the corners of the sphere. For a better method, Wolfram to the rescue 🙂
http://mathworld.wolfram.com/SpherePointPicking.html

1. Robert says:

True, and I’m sure in general it’s not a good idea to go projecting quick ‘n’ dirty random points onto the shape you really need, but in this case I don’t believe it makes any difference. The tetrahedra formed from the excess points in the corners of the cube have the same probability of containing the origin, by the same Wolfram logic. But of course you can check this with simulation on the excess points by replacing
a <- matrix(runif(12,min=-1,max=1), nrow=4)
with:
a<-NULL
npoints<-0
while(npoints<4) {
p <- runif(3,min=-1,max=1)
if(m(p)>1) {
a<-rbind(a,p)
npoints<-npoints+1
}
}
and I get 0.12547 containing the origin with a very similar-looking distribution of volumes.