Monthly Archives: June 2013

Meet Julia

Julia is a very new open-source, high-level language from some terribly clever people at MIT and elsewhere, aimed at scientific computing. It achieves very fast performance through a Just-In-Time (JIT) compiler. The language itself is quite intuitive, being superficially similar to Matlab. I have been playing around a little with it, and I think it is going to make a massive impact when it takes off. A lot of my work involves statistical models that can take a long time to fit, so anything that improves speed is very welcome. Julia has the advantage of speed: only slightly slower than writing low-level code in C++ (for example) and compiling it, but with much more concise syntax. If you can handle R then you can handle Julia. Parallel processing is designed in from the beginning, and you can call C libraries to save reinventing the wheel. I mean, what else could you want?

There are some videos of rapid introductions on their YouTube channel. Try this intro to data frames, for example.

Now there is also a cloud-based trial called “Try Julia”, hosted by Forio. They are working towards a parallel cloud implementation called Mandelbrot (get it?), presumably commercial, though there are places for keen beta-testers. You really, really should go and look at this site. Do it now! There is a step-by-step tutorial that introduces you and Julia to each other (including regression and simulation). Then, if the two of you get along, why not download the latest version? At this early stage it is worth keeping up with the latest, as there are lots of fixes coming through all the time. Julia has clearly started off in Linux, and the Windows version is produced as a Cygwin-style shell. I encountered some problems loading packages in the previous version, but as of 0.2 that seems to be fixed.

If you are like me and can’t resist a bit of MCMC, check out doobwa’s code. It’s amazing that programmers are getting moving with this so fast!

Filed under Uncategorized

Slides for a talk on software for Bayesian analysis

Here are some slides for an overview talk I gave earlier today on software for Bayesian analysis. What I should really do is embed some videos of screen capture demonstrations… one day.

Result: happiness

Featured: BUGS, JAGS, Stan, calling each of these from R, and some musings on C++, Julia and GPGPU.

Filed under advanced

Things to do in Wokingham when you’re prematurely dead

The recent release by Public Health England of a ranking of local authorities in terms of premature death rates has attracted lots of media attention. I have to say that the interactive presentation is really clear and nice, but there is one big problem for me: no acknowledgement of uncertainty. The number one spot for Wokingham, with 200.3 deaths per 100,000 population, is presented as fact, and certainly it is true for 2009-2011, when the data were collected. Some of you may be interested in blaming your local politicians / health service for the last three years, but others will look at this chart and wonder what it tells them about the place where they live, or are thinking of moving to, now and indeed in the future. Surprisingly little, I would suggest.

Ideally, we would have a time series of premature deaths in successive years over a long time, so we could see the direction of travel. Are the rates coming down in one place or up in another? That would be more interesting than these single numbers. Then we could also tell whether the poor rates in Blackpool and Manchester in 2009-11 were just a statistical blip or a long-term trend. But even without long-term data we can do better than a single number.

It’s not an ideal thing to do (see Leckie and Goldstein, “The limitations of school league tables to inform school choice”), but we can use the same tools we have for inference from a random sample. There is an amazingly simple formula for the uncertainty around a rate, so I quickly ran off 95% confidence intervals for each LA and plotted them in the graph below. The 95% confidence interval is based on the idea that, if everything stayed the same (that’s the unrealistic bit), future years would see the LAs’ rates fluctuating, sometimes a bit higher due to bad luck and sometimes a bit lower due to good luck. This particular interval should contain 95% of those future hypothetical rates. Or, to flip it around (and not every statistician would agree with me that this is equivalent), you can also say that there is a 95% chance that the rate in your LA will be within that interval this year.
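For the curious, here is a minimal sketch in R of the sort of calculation involved, assuming the death counts behave roughly like Poisson counts. The published figures are age-standardised and I am not claiming this is the exact formula behind the plot; the function name and the numbers in the example are invented purely for illustration.

# Approximate 95% CI for a death rate per 100,000, treating the count as Poisson.
# Illustrative sketch only, not necessarily the formula used for the figures here.
rate.ci <- function(deaths, population, level = 0.95) {
  rate <- 1e5 * deaths / population
  z <- qnorm(1 - (1 - level) / 2)   # 1.96 for a 95% interval
  se <- rate / sqrt(deaths)         # Poisson approximation to the standard error
  c(rate = rate, lower = rate - z * se, upper = rate + z * se)
}

# Made-up numbers, just to show the shape of the output:
# a rate of 200 per 100,000 with an interval of roughly 180 to 220
rate.ci(deaths = 400, population = 200000)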

This "caterpillar plot" has vertical lines that show the degree of uncertainty in each local authority's premature death rate. When they overlap, that suggests they are more likely to have swapped places in 2012 or 2013.

This “caterpillar plot” has vertical lines that show the degree of uncertainty in each local authority’s premature death rate. When they overlap, that suggests they are more likely to have swapped places in 2012 or 2013.

The overlap among the authorities is not an ideal way to judge the uncertainty in the league table, but it’s not a bad place to start. If your authority overlaps with its neighbour, then they could swap places simply by chance, without any change in the population or the public health activities. A more intuitive way of looking at it is to consider what this uncertainty in the rates could do to the ranks in the league table, and we can easily do this by simulating possible future rates from the 95% confidence intervals for each LA. I ran this for 10,000 possible futures, and here are the 95% confidence intervals for the ranks (a rough sketch of the simulation is given after the figure):

In this version of the graph, the rank in the league table is shown vertically. Uncertainty is much greater because LAs are very similar and could easily swap places.
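If you want to try something similar yourself, here is a simplified sketch of that simulation in R. It assumes you already have vectors rate and se holding each LA’s rate and its standard error, and it draws the hypothetical future rates from a normal approximation; the real code behind the figure may well differ in the details.

# Simplified sketch of the rank simulation; assumes vectors `rate` and `se`
# holding each LA's rate and standard error (illustrative only)
set.seed(1)
n.sims <- 10000
n.la <- length(rate)

# Draw a plausible future rate for every LA in each simulated future,
# then turn each simulated set of rates into ranks (1 = lowest rate)
sim.ranks <- replicate(n.sims, rank(rnorm(n.la, mean = rate, sd = se)))

# 95% interval of the simulated ranks for each LA
rank.ci <- apply(sim.ranks, 1, quantile, probs = c(0.025, 0.975))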

That’s a lot of uncertainty. So here is an alternative table that shows where your LA is in the Longer Lives table and also how low or high it might be this year, just by chance (remember: this assumes nothing has changed!). As an example of how to read it, my home is in Croydon (number 58 in 2009-11), but we could be as good as number 45 (currently occupied by Havering) or as bad as 70 (Hounslow). That span of 25 local authorities is not unusual and covers one-sixth of England.

Row LA Rank Lower 95% CI Upper 95% CI
1 Wokingham 1 1 8
2 Richmond upon Thames 2 1 11
3 Dorset CC 3 1 12
4 Surrey CC 4 1 12
5 South Gloucestershire 5 1 12
6 Rutland 6 1 14
7 Harrow 7 2 14
8 Kensington and Chelsea 8 2 16
9 Bromley 9 2 16
10 Hampshire CC 10 3 16
11 Kingston upon Thames 11 4 18
12 West Berkshire 12 5 24
13 Buckinghamshire CC 13 6 24
14 Windsor and Maidenhead 14 6 25
15 Cambridgeshire CC 15 6 27
16 Barnet 16 6 28
17 Suffolk CC 17 8 29
18 Bath and North East Somerset 18 8 29
19 Devon CC 19 11 31
20 Wiltshire 20 11 33
21 Hertfordshire CC 21 12 33
22 Oxfordshire CC 22 12 33
23 West Sussex CC 23 14 34
24 Poole 24 14 37
25 Solihull 25 14 37
26 Somerset CC 26 15 37
27 Bexley 27 17 38
28 Sutton 28 18 40
29 Merton 29 18 44
30 Leicestershire CC 30 18 45
31 Enfield 32 18 47
32 Gloucestershire CC 31 20 47
33 Central Bedfordshire 33 21 49
34 North Yorkshire CC 34 24 49
35 Essex CC 35 24 50
36 Shropshire 36 25 50
37 Bracknell Forest 37 25 50
38 Cheshire East 38 27 51
39 Norfolk CC 39 27 52
40 Redbridge 40 27 52
41 Warwickshire CC 41 27.98 54
42 Worcestershire CC 42 28 54
43 East Riding of Yorkshire 43 28 54
44 Herefordshire, County of 44 29 55
45 Havering 45 29 57
46 Cornwall 47 31 57
47 Westminster 46 33 58
48 East Sussex CC 48 33 59
49 Isle of Wight 49 36.98 60
50 North Somerset 50 37 60
51 Hillingdon 51 37 61
52 Brent 52 37 61
53 Kent CC 53 40 61
54 York 54 40 65
55 Staffordshire CC 55 40 65
56 Derbyshire CC 56 42 67
57 Swindon 57 44 68
58 Croydon 58 45 70
59 Cheshire West and Chester 59 45 71
60 Wandsworth 60 47 71
61 Trafford 61 49 73
62 Nottinghamshire CC 62 50 73
63 Lincolnshire CC 63 52 74
64 Milton Keynes 64 54 75
65 Camden 65 57 77
66 Northumberland 66 57 78
67 Bournemouth 67 57 78
68 Southend-on-Sea 68 57.98 79
69 Ealing 69 58 79
70 Hounslow 70 60 79
71 Thurrock 71 61 80
72 Northamptonshire CC 72 61 82
73 Waltham Forest 73 64 83
74 Dudley 74 64 84
75 Stockport 75 65 84
76 Cumbria CC 76 65 86
77 Bedford 77 65 88
78 Reading 78 67 90
79 Haringey 79 67 94
80 Medway 80 68 95
81 Sheffield 81 69 97
82 Warrington 82 70 99
83 North Lincolnshire 83 71 102
84 Torbay 84 74 102
85 Greenwich 85 77 104.02
86 Plymouth 86 77 107
87 Peterborough 87 78 107
88 Hammersmith and Fulham 88 78 108
89 Rotherham 89 79 108
90 Bristol, City of 90 80 109
91 Kirklees 91 80 109
92 Sefton 92 82 109
93 Redcar and Cleveland 93 82 110
94 Darlington 94 83 111
95 Southampton 95 83 111
96 Telford and Wrekin 96 84 112
97 North Tyneside 97 84 112
98 Brighton and Hove 98 84 113
99 Bury 99 85 114
100 Leeds 100 85 115
101 Derby 101 85 115
102 Stockton-on-Tees 102 86 115
103 Lancashire CC 103 86 116
104 Portsmouth 104 87 116
105 County Durham 105 87 117
106 Lewisham 106 88 117
107 North East Lincolnshire 107 88 118
108 Luton 108 88 120
109 Slough 109 88 121
110 Wakefield 110 88 123
111 Walsall 111 91 125
112 St. Helens 112 93 125
113 Wirral 113 94 126
114 Doncaster 114 94 127
115 Southwark 115 96 128
116 Newham 116 96 128
117 Calderdale 117 99 128
118 Islington 118 102 129
119 Birmingham 120 104 129
120 Barnsley 119 106 130
121 Lambeth 121 107 131
122 Bradford 122 108 131
123 Gateshead 123 109 131
124 Bolton 124 109 133
125 Wolverhampton 125 111 133
126 Coventry 126 112 133
127 Wigan 127 115 134
128 Hackney 128 115 135
129 South Tyneside 129 116 137
130 Newcastle upon Tyne 130 118 139
131 Hartlepool 131 118 140
132 Sunderland 132 120 142
133 Barking and Dagenham 133 123 142.02
134 Halton 134 127 143
135 Leicester 135 128 143
136 Sandwell 136 129 143
137 Tower Hamlets 137 129 143
138 Stoke-on-Trent 138 130 144
139 Oldham 139 131 144
140 Rochdale 140 131 144
141 Nottingham 141 132 144
142 Tameside 142 132 145
143 Blackburn with Darwen 143 133 145
144 Knowsley 144 134 146
145 Middlesbrough 145 138 147
146 Kingston upon Hull, City of 146 143 148
147 Salford 147 145 148
148 Liverpool 148 145 148
149 Blackpool 149 149 150
150 Manchester 150 149 150

Filed under Uncategorized

R2leaflet (v0.1) – make interactive online maps from R

I have been working on a simple R function to take latitude and longitude of points of interest, and text for pop-up labels, and produce an interactive online map. Interactive graphics are incredibly useful in getting people interested in your work and communicating your data effectively, but very few statisticians / data analysts have the skills needed to make them. My aims are in this order:

  1. encourage more data people to find out about JavaScript and teach themselves some basic skills
  2. give you a really easy way of making an online interactive map

Introducing JavaScript is what it’s all about for me, so I have aimed it at newcomers. The function is called R2leaflet because it uses the popular JavaScript package leaflet to construct the map. It just takes your data and constructs a lot of text around it to form a new HTML file which contains your map. You could just upload the whole thing if you know nothing about HTML (and don’t mind having a page with a map and nothing else!), or if you are a web whizz then you will know which bits to copy and paste into your own page. The generated file also includes a comment against each line, to encourage even casual users to open it in a text editor and see how it works.

It is at a very early stage, and I have a long list of things to do to improve it, but I would value any feedback on it. You can download it from here and then load it into R by typing:
source("R2leaflet.R")

The code is at Github if you want to see how it is made and collaborate. My to-do list is there too. I encourage anyone interested in this to join me and contribute so we can make a whole suite of simple R2JS visualization functions!

Let’s look at a simple example. Below is a screenshot of the map which links to my website where you will find the real interactive version. This is simply because this blog is hosted on WordPress.com and they do not allow any JavaScript from the blogger.

[Screenshot of the example map; the interactive version is linked from my website]

Here’s how it was made:

# Co-ordinates and popup text for four markers
lat <- c(51.426694, 51.523242, 54.008727, 54.977433)
long <- c(-0.174015, -0.119959, -2.785206, -1.607604)
label <- c("St George's Hospital",
           "UCL Institute of Child Health <br> Harvard GCSRT venue",
           "Lancaster University",
           "University of Northumbria <br> RSS 2013 venue")

# Write the map out as an HTML file
R2leaflet(lat = lat, long = long, label = label, filename = "mymap.html")

The defaults have been retained for some other arguments: map.lat and map.long, which center the map; map.zoom, which zooms in and out; and popup, a Boolean vector of the same length as lat, long and label, TRUE where you want a popup and FALSE where you don’t.
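As a rough illustration of those arguments, a call that sets them all explicitly might look like the one below. The centre and zoom values are invented, and the popup is suppressed on the third marker only; check the defaults in the source before relying on them.

# Illustrative call only: the map.lat, map.long and map.zoom values are made up
R2leaflet(lat = lat, long = long, label = label,
          map.lat = 52.5, map.long = -1.5, map.zoom = 6,
          popup = c(TRUE, TRUE, FALSE, TRUE),
          filename = "mymap2.html")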

There are some neat R functions out there to convert various co-ordinate systems into latitude and longitude. If you need to look up latitude and longitude for a specific few points, this website is handy. Notice how the line breaks in the popups are produced by <br>, which is an HTML tag. Want to format your text inside the popup? You can simply add HTML tags to the character vector called ‘label’. If you don’t know about HTML tags, visit this page and you’ll soon be <impressive>a web coding fiend</impressive>.

Remember that any vectors you supply to the arguments ‘lat’, ‘long’, ‘label’ and ‘popup’ will be recycled R-style if they are not the same length, with potentially disastrous results in your map! The function returns nothing at present, but you could easily change it to return some of the text that gets written to the file, if that would be useful to you.

Filed under R

Quantified Self on BBC Radio 4

Just by chance I heard this half-hour programme tonight. Although it didn’t provoke many new thoughts for a professional statistician, I thought it was well-made and engaging, and worth a listen, despite a wearily sceptical tone. Depending on how far in the future you are reading this, and how far from Broadcasting House you are, you might be able to “listen again”.

Invisible airwaves crackle with data

As one of the talking heads said at one point, the name is a bit of a misnomer, because the quantification is not what’s interesting: it’s the patterns you wouldn’t otherwise have noticed, and how they make you reflect on how you live your life. The one thing that I have collected, my GPS location every 5 minutes, has made me reflect on how many days are spent going from home to office and back again. Meanwhile the map makes it unquestionably clear that I actually live in one of the world’s most famous and exciting cities but rarely go into its famous and exciting bits, and that might actually make me live my everyday life a little bit differently. Thanks, data.

Filed under Uncategorized