Links in this post were updated on 28 May 2013, and again on 25 April 2014.
My colleagues regularly collect data with postcodes for participants or organisations such as general practices. Often they want to add some geographical information against each of the postcodes to describe the area, and deprivation is one of the most common. This post talks you through how I go about adding deprivation to their data.
The simplest way is to use the GeoConvert online tool developed by the University of Manchester. You will need a UK academic institution’s username and password to access this. You can upload a comma- or tab-delimited file of your postcodes (or you can type in a single postcode) and it will return a .csv file that looks like this:
Or you could download all the postcodes for a local authority area. Recently, Her Majesty’s Department for Communities & Local Government released this online tool that spits out the data one authority at a time. Warning: it’s slow. It will give you ranks and not the IMD score itself (my advice: don’t worry, use the rank, the score is just a rough scale anyway). But why, in this age of Big Data, can’t they just make the whole dataset available online? It would probably work out lighter on their servers than them doing the crunching for us.
The more flexible approach is to download big data files and do the merging and matching yourself. First you have to know where to find the data you need. An excellent source of specific Census data is the UK Data Service Census website (formerly hosted by the University of Manchester), and here you will also find deprivation datasets for the Carstairs and Townsend indices, as well as some regional ones. The Index of Multiple Deprivation (IMD) can be obtained in a spreadsheet from Her Majesty’s Department for Communities and Local Government (you might have to search for “Index of multiple deprivation”, as I’m afraid we in the UK have a poor record of changing government URLs far too often). These files will typically be broken down by what they call ‘Super Output Area’ or ‘Lower Layer Super Output Area’, which is a standardised way of dividing the country up into small areas (lower layer is smaller), with a code for each. Here’s what the IMD file looks like:
Now you need to map the (L)SOAs into postcodes. First, you need to know the terminology. Take the postcode of St George’s Medical School, where I am typing this: SW17 0RE. There are four nested levels of detail. SW is the postcode area (South West London). SW17 is the postcode district (Tooting). SW17 0 is the postcode sector (a bit of Tooting about 1 mile across) and SW17 0RE is the postcode unit; most of these are a collection of about 20 houses but large organisations or blocks of apartments will have their own. There are also some postcode units that are not nested but have a special allocation to organisations that get massive amounts of mail such as TV Licensing; you shouldn’t come across those codes but be aware they exist. The Wikipedia page on UK postcodes is very useful for understanding it all.
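To make the four nested levels concrete, here is a minimal sketch in Python that splits a standard-format postcode into area, district, sector and unit. The function name `split_postcode` and the regular expression are my own illustration, not an official validator; anything that does not fit the usual letters-digits pattern simply comes back as `None`.

```python
import re

# Area letters, then the rest of the district, then the sector digit,
# then the two letters that complete the unit.
_POSTCODE = re.compile(r"^([A-Z]{1,2})([0-9][0-9A-Z]?)([0-9])([A-Z]{2})$")


def split_postcode(raw):
    """Return the four nested levels of a standard UK postcode, or None."""
    compact = re.sub(r"\s+", "", raw).upper()  # tolerate odd spacing and case
    match = _POSTCODE.match(compact)
    if match is None:
        return None
    area, district_rest, sector_digit, unit_letters = match.groups()
    district = area + district_rest
    sector = f"{district} {sector_digit}"
    unit = f"{sector}{unit_letters}"
    return {"area": area, "district": district, "sector": sector, "unit": unit}


parts = split_postcode("sw170re")
print(parts)
```

For SW17 0RE this gives area `SW`, district `SW17`, sector `SW17 0` and unit `SW17 0RE`, which is handy if you later want to aggregate by sector or district rather than by full postcode.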
Updated 28 May 2013: The most recent complete mapping of LSOAs (and also map co-ordinates) to postcodes can be downloaded old-skool fashion from Open Data Communities; this is the same data GeoConvert uses behind the scenes. Make sure you save it somewhere sensible: it is about 136MB compressed and well over 1GB uncompressed. However, it is no longer available as CSV, only as N-Triples, a plain-text format for linked (RDF) data with one statement per line. The intention is not for you to download the file but to query the data dynamically from the website each time, using a query language and protocol called SPARQL. That is a lovely idea, except that you need some pretty advanced skills to do it. I don’t know of any stats software except R that has SPARQL capability; this tutorial blog post claims to get you up and running in 5 minutes. I have yet to try it myself although it sounds pretty cool…
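If you do download the file, you can read it without any SPARQL at all, because each line is just a subject, a predicate and an object. The sketch below is a rough illustration using only the Python standard library. The `example.org` URIs, the predicate name and the LSOA labels are made up for the demonstration; you would need to open the real file and see which predicate actually links a postcode to its LSOA. The pattern only handles URIs and quoted literals and quietly skips anything else.

```python
import re

# One statement per line: <subject> <predicate> object .
# The object is either a <URI> or a "quoted literal" (optionally typed or language-tagged).
_TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
    r'\s*\.\s*$'
)


def read_triples(lines):
    """Yield (subject, predicate, object) tuples from N-Triples lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        match = _TRIPLE.match(line)
        if match is None:
            continue  # blank nodes and anything unexpected are skipped here
        subject, predicate, uri_object, literal_object = match.groups()
        obj = uri_object if uri_object is not None else literal_object
        yield subject, predicate, obj


def postcode_to_lsoa(lines, lsoa_predicate):
    """Build {postcode: LSOA} from the last path segment of each URI."""
    lookup = {}
    for subject, predicate, obj in read_triples(lines):
        if predicate == lsoa_predicate:
            lookup[subject.rsplit("/", 1)[-1]] = obj.rsplit("/", 1)[-1]
    return lookup


# Hypothetical predicate and sample lines, purely for illustration.
LSOA_PREDICATE = "http://example.org/def/lsoa"
sample = [
    '<http://example.org/postcode/SW170RE> <http://example.org/def/lsoa> <http://example.org/lsoa/LSOA_A> .',
    '<http://example.org/postcode/SW170RE> <http://example.org/def/label> "SW17 0RE" .',
    '<http://example.org/postcode/AB12CD> <http://example.org/def/lsoa> <http://example.org/lsoa/LSOA_B> .',
]
lookup = postcode_to_lsoa(sample, LSOA_PREDICATE)
print(lookup)
```

Because `read_triples` works through its input one line at a time, you can pass it an open file handle instead of the sample list and it will not try to hold the whole 1GB in memory.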
If you have a few addresses with no postcode, you could look them up one by one through the Royal Mail’s web pages. Once you’ve done a few [hundred] of those you might find your scruples about hiring interns become eroded…
Now, you need to merge your IMD + LSOA data file into your LSOA + postcode data file, matching on LSOA. I won’t go into software but you can do this in any stats package or indeed in a relational database or possibly even in a spreadsheet using lookup functions. If you are only dealing with Tooting then just delete everywhere else before you start or you will get bogged down in some giga-computations. If you are working with the whole >1GB data file and you don’t have a huge chunk of RAM at your disposal, you will need to write a program to loop through bits of the country, do the merging, and then put the results together at the end. But here is a chance to quote R core developer Uwe Ligges at you:
RAM is cheap, and thinking hurts
i.e., go get a bigger computer. The aim is to get a file with one row for each postcode, giving you the IMD alongside (or whatever stats you want). Then you merge that file into your data, matching on postcode. Obviously you don’t want to keep any rows that refer to postcodes not in your data.
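If a bigger computer is not on offer, one way round the RAM problem is to keep only the small tables in memory and stream the big postcode file a row at a time. Here is a minimal sketch of the two-step merge described above, using nothing but the Python standard library. The column names (`postcode`, `lsoa`, `imd_rank`), the function names and all of the data values are invented for the example, so rename them to match your own files.

```python
import csv
import io


def normalise(postcode):
    """Remove spaces and upper-case, so 'sw17 0re' matches 'SW17 0RE'."""
    return "".join(postcode.split()).upper()


def attach_imd(study_rows, imd_file, postcode_lsoa_file):
    """Return a copy of study_rows (a list of dicts) with an imd_rank added."""
    # Small table, held in memory: LSOA -> IMD rank.
    imd_by_lsoa = {row["lsoa"]: row["imd_rank"] for row in csv.DictReader(imd_file)}
    # The only postcodes we care about are the ones in our own data.
    wanted = {normalise(row["postcode"]) for row in study_rows}

    # Big table, streamed row by row; everything not in our data is dropped.
    imd_by_postcode = {}
    for row in csv.DictReader(postcode_lsoa_file):
        postcode = normalise(row["postcode"])
        if postcode in wanted:
            imd_by_postcode[postcode] = imd_by_lsoa.get(row["lsoa"])

    # Every study row is kept; postcodes with no match get None.
    return [
        dict(row, imd_rank=imd_by_postcode.get(normalise(row["postcode"])))
        for row in study_rows
    ]


# Made-up miniature files standing in for the real downloads.
imd_csv = io.StringIO("lsoa,imd_rank\nLSOA_A,1200\nLSOA_B,30500\n")
lookup_csv = io.StringIO("postcode,lsoa\nSW17 0RE,LSOA_A\nAB1 2CD,LSOA_B\nZZ9 9ZZ,LSOA_B\n")
study = [
    {"id": "1", "postcode": "sw170re"},
    {"id": "2", "postcode": "AB1 2CD"},
    {"id": "3", "postcode": "XX1 1XX"},
]
result = attach_imd(study, imd_csv, lookup_csv)
for row in result:
    print(row)
```

Normalising the postcodes before matching matters more than you might think, because participants and lookup files rarely agree on spacing or capitals. It is also worth counting how many rows come back with `None`, since those are the postcodes that were mistyped or have gone out of use.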
If you want to match not on postcodes but on larger areas like local authorities, you will find many more data files available online by local authority, or you could take weighted averages across SOAs. Boundaries of healthcare organisations like GP Clusters (or whatever they are called this week) are much, much harder to come by. Try calling people and asking favours…
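For the weighted-average option mentioned above, a rough sketch might look like this. I am assuming you weight each SOA by its population, which is a common choice but not the only one, and the area labels and numbers are invented for illustration.

```python
from collections import defaultdict


def weighted_average_by_area(rows):
    """rows: iterable of (larger_area, value, weight), one per SOA."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for area, value, weight in rows:
        totals[area] += value * weight
        weights[area] += weight
    # Areas with zero total weight are left out rather than dividing by zero.
    return {area: totals[area] / weights[area] for area in totals if weights[area] > 0}


# (local authority, deprivation value for the SOA, SOA population), all made up.
soa_rows = [
    ("LA_X", 10.0, 1000),
    ("LA_X", 30.0, 3000),
    ("LA_Y", 20.0, 1500),
]
averages = weighted_average_by_area(soa_rows)
print(averages)
```

In the made-up data, the bigger SOA in `LA_X` pulls the average up to 25 rather than the unweighted 20, which is the whole point of weighting.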
One final observation: many experts regard the full postcode as patient-identifiable. Only collect it if you need it, justify it to the ethics committee, and manage the data accordingly.