Monthly Archives: January 2014

Sonification at the BBC

It must be time for the occasional appearance of sound<-data on the airwaves. It happens every now and then, like another swing of the pendulum on whether red wine or coffee is good or bad for you. Sorry if I sound<-weary, but it is generally rather primitive stuff. Here’s a link to a brief feature on the Today programme (aren’t they all brief, with the notable exception of business and sport?) this morning. If you can’t hear it, this is my own transcript made while eating my lunch:

#####################################

Sarah Montague: Can you turn information into music, so that instead of looking at a spreadsheet or a graph, you could listen to what it sounds like? Would that help you to spot patterns that are otherwise very hard to discern? Domenico Vicinanza is from DANTE, the Delivery of Advanced Network Technology to Europe, and before we hear from him, here’s a clip that could be described as the sound of space discovery.

<Clip>

SM: Domenico Vicinanza, what are we listening to there?

DV: Good morning Sarah, we are listening to the sonification of 300,000 data taken by Voyager 1 and 2 during their 36-year existence in space.

SM: Hold on a second, that sounded just like music. You’re saying that …it obviously was… but it didn’t start with someone composing the piece.

DV: That’s right, sonification is basically the representation of information or data by means of sounds and melodies instead of using points and lines. When Voyager 1 and 2 were in space, they took measurements. Measurements  are numbers and, what we do, we just match the numbers to musical sounds. Basically, the larger the number the higher the pitch of the sound we use. The smaller the number, the lower the pitch. So the music, if you like, is following exactly the same behaviour as the data. If the data is increasing, the pitch is going up, if the data is decreasing, the pitch is going down, and so we can say that this process is actually creating a sound fingerprint of the spacecraft.

SM: So are you saying that you took information about Voyager 1 and 2 – their positions – and you did not manipulate it in any way when you turned it into music?

DV: So, we used one particular measurement from the Voyager data, it was the cosmic ray count, the number of particles arriving every second at the detectors on Voyager 1 and 2. The one thing I did was mapping this number to a diatonic scale, using the C major scale. In this way, the set of notes we were having were easy to play or easy to harmonise. I didn’t add any single note manually.

SM: OK, we’ve got very little time so can I just ask you, how do you think this process could be used?

DV: There are many applications of sound and sonification for other applications, and the main reason is the fact that sound can bundle more information than any graph or picture that we can imagine. Because vision is limited to three dimension, but sounds can stack together many more layers. You can listen to an orchestra of eight, nine, ten players and actually listen to each of them.

##################################################
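For the curious, the mapping he describes is about as simple as it sounds: scale the numbers onto the notes of a scale, bigger number = higher note. Here is a minimal sketch in R of that sort of mapping (the MIDI note numbers, the two-octave range and the equal-width binning are my own assumptions, not DANTE's actual pipeline):

# Map a numeric series onto a C major scale: larger values get higher pitches
sonify_c_major <- function(x, octaves = 2) {
  c_major <- c(0, 2, 4, 5, 7, 9, 11)                          # semitone offsets from C
  scale_notes <- 60 + as.vector(outer(c_major, 12 * (0:(octaves - 1)), "+"))
  idx <- cut(x, breaks = length(scale_notes), labels = FALSE)  # bin the data values
  midi <- scale_notes[idx]                                     # one note per data point
  440 * 2 ^ ((midi - 69) / 12)                                 # return frequencies in Hz
}

set.seed(1)
sonify_c_major(cumsum(rnorm(20)))   # a fake "cosmic ray" series, for illustration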

So why am I weary? Well, I can’t really justify being weary until I get off my backside and actually investigate what mappings and explanations work in sonic communication, but if I had to make an educated guess now about best practice in sonification it would be:

  1. Pitch is not the be-all and end-all. In fact, it is probably less clearly perceived than rhythmic and timbral changes (unless you are a trained musician, but we don’t make graphs for trained statisticians only…). Every time I see / hear pitch=f(x) I die a little. When pitch is an element of {C,D,E,F,G,A,B} I die a little more. But later, I recover.
  2. Time dominates all other dimensions in our perception. If you map time to time, you are making it the main focus of interest. Is it really of interest in this application?
  3. Like graphs, you need an explanatory prolegomena (which should not include words like prolegomena). Vicinanza blew it by talking about the mapping for the whole of the short time he was given. I may not have done any better under fire at 6:50 am, but knowing I would get about 30 seconds to explain it, I would give one example. Oh, and that means just featureless noodling is probably not the best bit to use for the prolegomenous exemplarium. Let’s have gravity fields as they slingshot past Jupiter and Saturn – features! patterns! and a story! all in one, like kosmic Kinder eggs – please, not cosmic rays: possibly the most truly “random” (i.e. boring) phenomenon you could find anywhere!
  4. C major. Really? You’ve just crushed any chromatic features out of existence, and as harmonic ones will only be perceived in relation to their neighbours in recognised patterns and cadences, which you won’t get because it’s random, you’ll be left with very little of note. Trust me, I’ve read Walter Piston from cover to cover. Hidden depths, you see…
  5. I like his dimensionality point at the end, but it’s not entirely true. He seems to have forgotten shapes, colour, sizes and time, but what the hell, let’s give the guy a break. If I had to cite an example here it would be neurosurgery.
  6. Before anyone attempts any sound<-data, they should go and read Nouritza Matossian’s book on Iannis Xenakis. Listen to a lot of his music, reflect on what was done in the 50s and 60s, then check out what little we know about Pierre Boulez’s secretive matrix manipulations. Think before you convert, and then stand on the shoulders of giants when you do so.

1 Comment

Filed under Uncategorized

Need to do a simulation study on a Bayesian model? Use Stan.

I’ve been looking into a particular Bayesian meta-analysis model of late. Can’t tell you any more right now of course, but I wanted to check that it was throwing up sensible results and then compare it to classical MA methods. Bring out the simulation study! The trouble is that if you run this sort of thing in BUGS or JAGS, it’s going to be slow to fit even one of the models, never mind 10,000 of them, especially because there are correlated parameters between the baseline and the endpoint, and between the overall and study-specific statistics.

I switched to Stan once I got the basic spec up and running, and it was a decision I have not regretted for one moment. The advantage here is that not only does Stan use an algorithm that will easily cope with highly correlated parameters (see figure below), but it is also compiled into a machine-code program which takes your data and runs the NUTS algorithm to get the chains. So, once you’ve invested the time (about 30-60 seconds for me) to compile the program, plugging in new data to the same model is super-fast. We’re talking a tenth of a second or just under for 1000 steps. Running 10,000 simulated datasets through it, with 2 chains each of 1000 steps, would be a bit of a nightmare in BUGS, but takes 30 minutes in Stan. Now things that were prohibitively time-consuming become possible!
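In rstan that compile-once, simulate-many pattern looks something like the sketch below. This is not the actual meta-analysis code (which I can’t show you anyway); meta.stan and make_data() are placeholders for the model file and for whatever function simulates one dataset:

library(rstan)

# Compile once: this is the slow bit (30-60 seconds for me)
model <- stan_model("meta.stan")            # placeholder model file

# Re-use the compiled model for every simulated dataset: this is the fast bit
results <- lapply(1:10000, function(i) {
  dat <- make_data(seed = i)                # hypothetical simulator returning a named list
  fit <- sampling(model, data = dat, chains = 2, iter = 1000, refresh = 0)
  summary(fit)$summary                      # means, sds, quantiles, n_eff, Rhat per parameter
})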

Figure 7 from Hoffman & Gelman’s 2011 paper introducing NUTS: note how NUTS gets much closer to the true distribution (under “independent”) of these 2 (out of 250!) correlated dimensions in parameter space than Metropolis or Gibbs in the same number of steps.

The R interface “rstan” is really good too, so I can do everything in one R code file: generate the data, run the model, get the results, look for non-convergence, work out bias, coverage and MC error, and draw some graphs. All while I go and enjoy a cappuccino. I think it’s fair to say that Stan is my new best friend in the stats playground.
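To give a flavour of the post-processing, and continuing the sketch above, the simulation-study summaries are only a few lines; the parameter name “mu” and the true value 0.5 are illustrative, not from the real model:

# Pull out the pooled-effect row from each fit (parameter name is illustrative)
est   <- sapply(results, function(s) s["mu", "mean"])
lower <- sapply(results, function(s) s["mu", "2.5%"])
upper <- sapply(results, function(s) s["mu", "97.5%"])

mu_true  <- 0.5                                    # the value the data were simulated under
bias     <- mean(est) - mu_true
coverage <- mean(lower <= mu_true & mu_true <= upper)
mc_error <- sd(est) / sqrt(length(est))            # Monte Carlo SE of the estimated bias
round(c(bias = bias, coverage = coverage, mc_error = mc_error), 3)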

2 Comments

Filed under Bayesian, R

Advanced MCMC in Bristol

I went on this course a couple of years ago and it was really good. Remember it will have that distinctly Bristolian flavour, i.e. multilevel models to the fore. If you’re into multilevel models you should also head over and check out their Stat-JR software, which is now in its v0.2 release. The addition of easily specified starting values will be welcomed by many MLwiN users!
Course: Advanced multilevel modelling using Markov chain Monte Carlo (MCMC), 9-11 April 2014, University of Bristol

This workshop will cover background theory and application of MCMC methods for multilevel modelling. We will focus on multilevel model classes that benefit from MCMC estimation including discrete response models (e.g., binary, ordinal and nominal outcomes), cross classified models, multiple membership models and multivariate response models with missing data. We will also showcase methods within MLwiN to speed up the MCMC estimation and demonstrate in Stat-JR its interoperability features with other MCMC packages such as WinBUGS.
This workshop is designed for researchers who already have a good knowledge of both continuous and discrete response multilevel models and have used MLwiN before. It is not designed for beginners, who we advise to attend an introductory workshop instead.

Instructors: Professor William Browne, Professor Harvey Goldstein, Professor Kelvyn Jones, Dr George Leckie, Dr Richard Parker

For further information and to make an application, please go to the course webpage.
Please note the final date for applications is 20 February 2014 OR EARLIER if the number of applicants greatly exceeds the number of available places.

Leave a comment

Filed under Uncategorized

How to convert odds ratios to relative risks

My short paper on this came out on Friday in the British Medical Journal. The aim is to help both authors and readers of research make sense of this rather confusing but unavoidable statistic, the odds ratio (OR). The fundamental problem is that quoting the odds in group A, divided by the odds in group B, confuses most people because we just don’t think in terms of odds.

The home-made video abstract on the BMJ website shows you the difference between odds and risk, and how one odds ratio can mean several different relative risks (RRs), depending on the risk in one of the groups. Unfortunately, in some situations, you just have to get an OR, notably logistic regression and retrospective case-control studies.

The bottom line is that authors should present RRs if they can, and with excellent software like margins and marginsplot in Stata, and effects in R, there’s really no excuse not to do this, even for complex models. In particular, I’m a huge fan of the plots of marginal probabilities from these packages, which help you to show the complex patterns in your data to an audience that will run scared from tables of ORs, interaction terms and confidence intervals. John Fox’s 2003 paper is still worth reading.

For readers, it can be harder, because you only have the information in the paper. Anyone who has done a systematic review will know what I mean – the baseline stats are given in Table 1 and the ORs in Table 2 or 3, and without any idea of the risk in one of the groups, you can proceed no further. However, I would suggest there is still hope. If you can get a range of plausible risks for the control group, you can work out a range of plausible relative risks. The formula is:

RR = OR / (1 – p + (p x OR))

where p is the risk in the control group. I’ve given a ready-reckoner table in the BMJ paper.

OR-RR conversion
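If you would rather compute the conversion than read it off a chart, the formula is a one-liner in R (the function name is mine):

# Convert an odds ratio to a relative risk, given the control-group risk p
or_to_rr <- function(or, p) or / (1 - p + p * or)

# The same OR implies different RRs across a range of plausible control-group risks
or_to_rr(or = 0.5, p = c(0.1, 0.2, 0.3, 0.5))
# 0.526 0.556 0.588 0.667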

And one more subtlety, if I may. As we’ve seen, a statistical model with a single shared OR for everyone (take this pill and your odds of a heart attack go down by 10%) does not imply a shared RR for everyone. If the logistic regression included adjustment for confounders, or if the case-control study was matched, then those other factors will warp the RR for different subgroups of people. This is where the plausible range comes in handy as well.

It sounds a bit Bayesian, but is really just a sensitivity analysis. However, a Bayesian meta-analysis could look at trials reporting ORs with little or no other supporting information, as well as trials reporting RRs, and combine them to get a pooled RR, if it included a shared prior for control group risks. This is one of my next projects…

3 Comments

Filed under R, Stata

A room full of Julians

Despite winter rain, I was delighted to head uptown last week to Skills Matter on the old Goswell Road for the first ever London Julia meetup. The first thing I learnt was that Julia’s friends are called Julians.

If you don’t know it yet, Julia is a pretty new (v 0.3 is current) programming language for fast numerical computing. Everything is designed from the ground up for speed, by some very clever people. They claim speeds consistently close to compiled machine code, which is generally the upper limit, like the speed of light. But a few facts make it potentially Revolutionary Computing: you don’t have to compile it before running; you can mess about in a command-line interface to learn it; it’s free and open source; you can directly call C functions from inside normal Julia code, and vice versa; and the syntax is LISP-ish and light as eiderdown (there are some nice comparative examples of this on the homepage).

Arise, ye Julians

The focus was on getting started, and the room was packed. Personally, I spent some time playing with it last year and then let it lapse, but now with v0.3 out there it seems to be time to get back up to speed.

For stats people, there are a few important packages to install: Distributions, Stats, DataFrames, HypothesisTests, and possibly Optim, MCMC, depending on your own interests. That’s all pretty straightforward, but when you start up Julia or load one of the packages like this:

using(HypothesisTests)

it takes a noticeable while to get ready. This is an artefact of the just-in-time compiler and open source programming. Almost all of the packages and the standard library are written in Julia itself. When you first need it, it gets compiled, and after that it should be superfast. Apparently a package is on the way to supply a pre-compiled standard library, to increase startup speeds.

Here’s a little power simulation I tried out afterwards:

using(HypothesisTests)
starttime=time()
nsig=0;
# 100,000 simulated trials: two groups of 10, means 140 and 135, SD 15
for (i in 1:100000)
 xx=140+(15*randn(10));
 yy=135+(15*randn(10));
 # count the runs where the equal-variance t-test is significant at the 5% level
 sig= pvalue(EqualVarianceTTest(xx,yy))<0.05 ? 1 : 0;
 nsig = nsig+sig;
end
time()-starttime   # elapsed time in seconds

This does 100,000 simulations of independent-samples t-tests with sample size 10 per group, means 140 and 135, and SD 15, and took 5.05 seconds on a teeny weeny Samsung N110 ‘netbook’ with 1.6GHz Atom CPU and 1GB RAM (not what you would normally use!) once the package was loaded.

In R, you could do this in at least two ways. First, a supposedly inefficient looped form:


Sys.time()
nsig<-0
for (i in 1:100000) {
 xx<-rnorm(10,mean=140,sd=15)
 yy<-rnorm(10,mean=135,sd=15)
 if(t.test(xx,yy)$p.value<0.05) {
 nsig<-nsig+1
 }
}
Sys.time()
print(nsig)

Next, a supposedly more efficient vectorized form:


# t-test p-value for one simulated dataset (a 10 x 2 matrix: one column per group)
tp<-function(x) {
 return(t.test(x[,1],x[,2])$p.value)
}
Sys.time()

# all 100,000 datasets at once: a 100000 x 10 x 2 array of simulated observations
xx<-array(c(rnorm(1000000,mean=140,sd=15),
 rnorm(1000000,mean=135,sd=15)),
 dim=c(100000,10,2))
pp<-apply(xx,1,tp)
ppsig<-(pp<0.05)
table(ppsig)
nsig<-sum(ppsig)
Sys.time()
print(nsig)

In fact, the first version was slightly quicker at 2 minutes 3 seconds, compared to 2 minutes 35. While we’re about it, let’s run it in Stata too:

clear all
timer on 1
set obs 10
local p = 0
gen x=.
gen y=.
forvalues i=1/1000 {
qui replace x=rnormal(140,15)
qui replace y=rnormal(135,15)
qui ttest x==y, unpaired
if r(p)<0.05 local p = `p'+1
}
dis `p'
timer off 1
timer list

That took 30 seconds so we’re looking at 50 minutes to do the whole 100,000 simulations, but Stata black belts would complain that the standard language is not the best tool for this sort of heavy duty number-crunching. I asked top clinical trial statistician Dan Bratton for some equivalent code in the highly optimised Mata language:


timer clear 1
timer on 1
mata:
reps = 100000
n = (10 \ 10)
m = (140 , 135)
s = (15 , 15)
pass = 0
for (i=1; i<=reps; i++) {
    // one simulated dataset: two groups of 10, means 140 and 135, SD 15
    X = rnormal(10,1,m,s)
    mhat = mean(X)
    v = variance(X)
    df = n[1]+n[2]-2
    // pooled two-sample t statistic and two-sided p-value
    t = (mhat[1]-mhat[2])/sqrt((1/n[1]+1/n[2])*((n[1]-1)*v[1,1]+(n[2]-1)*v[2,2])/df)
    p = 2*ttail(df,abs(t))
    if (p<0.05) pass = pass+1
}
pass/reps
end
timer off 1
timer list 1

… which clocked in at 7 seconds. I’m not going to try anything more esoteric because I’m interested in the speed for those very pragmatic simulations such as sample size calculations, which the jobbing statistician must do quite often. (Actually, there is an adequate approximation formula for t-tests that means you would never do this simulation.)
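(In R, for instance, the built-in power.t.test gives you the answer directly from the noncentral t distribution, no simulation required:)

# Power for two groups of 10, means 140 vs 135, SD 15, two-sided 5% test
power.t.test(n = 10, delta = 5, sd = 15, sig.level = 0.05, type = "two.sample")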

That time difference surprised me, to say the least. It means that Julia is an option to take very seriously indeed for heavy-duty statistical calculations. It really isn’t hard to learn. However, I don’t know of any scientific papers published yet that used Julia rather than more established software. Perhaps a version number of 0.x would worry editors and reviewers, but surely v1.0 is not far away now.

14 Comments

Filed under Julia, R, Stata

2014/5 Global Clinical Scholars’ Research Training from Harvard Medical School

And in the interests of fairness, here is the other one. Full details and application forms online.

 

We are currently accepting applications for the 2014 – 2015 GCSRT Program.  Applications will be accepted from October 1, 2013 until the June 2, 2014 final registration deadline.

APPLICATION REQUIREMENTS: Applicants must hold an MD, PhD, MBBS, DMD, DDS, DO, PharmD, DNP, or equivalent degree.  

The following documents are required to apply for the program:

  •  Online Application
  •  Current Curriculum Vitae / Résumé
  •  Personal Statement (one page)
  •  Letter of Recommendation (from a department / division head, director, chair or supervisor)

Applicants should have their updated CV or résumé and personal statement ready to attach to the application. The letter of recommendation may be submitted with your application or submitted thereafter by email.

Only completed applications will be considered for acceptance.

Leave a comment

Filed under learning

Fully funded Masters by Research for NHS health care professionals

Here comes a plug for the course I spent most of my teaching time working with:

FULLY FUNDED STUDY OPPORTUNITY – APPLY NOW FOR 2014 / 2015 ENTRY

Master’s of Research in Clinical Practice – Funded Placement Opportunities for Nurses, Midwives, Pharmacists and Allied Health Professionals

Kingston University and St George’s, University of London’s Faculty of Health, Social Care and Education are offering 18 fully funded studentships to NHS professionals.

This programme of study, funded by the National Institute for Health Research (NIHR) and the Chief Nursing Officer for England (CNO), is a central part of the Government’s drive to modernise clinical academic careers for nurses, midwives, pharmacists and allied health professionals.

The inter-professional programme provides practical and academic study to give health professionals the skills to manage and deliver research in a clinical setting and prepare them for careers in clinical research. Throughout the course students gain the appropriate knowledge of contemporary professional research practices and develop skills that enable them to generate research questions, test data collection approaches and interpret results within a scientific framework.

By the end of the course healthcare professionals will be equipped with the skills needed to participate fully as a clinical practice researcher whether through engagement in research, debate and discussion, by adopting an evidence based approach to practice, presenting at clinical meetings and conferences, or by publishing their work in clinical journals.

The Faculty is currently recruiting for September 2014 entry. There are options to access either a fully funded full time (one year) or part time (two year) studentship.

Nurses, midwives, pharmacists and allied health professionals sited in England with at least one year’s clinical experience and a 2 (i) honours degree in a health or social care-related subject are eligible to apply. Funding covers basic salary costs and course fees, allowing employers to seek reimbursement (via invoicing arrangements) of employment costs during the period of secondment.

Further information about the course with full details of entry requirements and how to apply are available online at: http://www.sgul.ac.uk/courses/postgraduate/taught/clinical-practice-mres

The closing date for applications is 16th May 2014 by 5pm. Interviews will be held on 4th June 2014.

Applicants and line managers are invited to attend any of the postgraduate open evenings scheduled on 17th February 2014, 10th March 2014, 10th April 2014 and 7th May 2014. At all events there will be an opportunity to meet the course director and past students, and to learn more about the programme and the selection process. Details are available at: http://www.sgul.ac.uk/courses/postgraduate/open-evenings. To register attendance, please contact pgadmiss@sgul.ac.uk.

Leave a comment

Filed under learning