Imputing ethnicity from names

An interesting paper tests different software for assigning a person probabilities of belonging to each ethnic group, based on their name. Bottom line: it doesn’t work… yet. If I heard of another Robert Grant I might have these priors: Scottish 40%, English 30%, Caribbean 20%, Irish 5%, Mixed 5%.

They put the resulting distributions into a multiple imputation, which in theory will capture the uncertainty in the distribution. The problem is that the software never assigned anyone to “Chinese / Other”, so that group is missing from every imputed dataset, and Rubin’s clever rules (see missingdata.org.uk) aren’t clever enough to get you out of that mess. Back to the drawing board.
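To see why pooling can't rescue this, here is a minimal sketch in Python, not the paper's actual method. It uses my Robert Grant priors from above and adds a “Chinese / Other” group with probability zero, which is my assumption to mimic what the software did. The function names `impute_once` and `rubin_pool` are made up for illustration.

```python
import random
from statistics import mean, variance

def impute_once(probs, rng):
    """Draw one ethnic group for a person from name-based probabilities."""
    groups = list(probs)
    weights = [probs[g] for g in groups]
    return rng.choices(groups, weights=weights, k=1)[0]

def rubin_pool(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Priors from the post, plus a zero-probability group (an assumption
# standing in for the software never assigning anyone to it).
priors = {
    "Scottish": 0.40,
    "English": 0.30,
    "Caribbean": 0.20,
    "Irish": 0.05,
    "Mixed": 0.05,
    "Chinese / Other": 0.0,
}

rng = random.Random(1)
draws = [impute_once(priors, rng) for _ in range(1000)]
n_chinese_other = draws.count("Chinese / Other")  # a zero weight is never drawn

# Pooling made-up estimates from three imputed datasets
pooled_estimate, pooled_variance = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

However many imputations you run, a group with probability zero never turns up in any of them. Rubin's rules only combine the variation that is actually present across the imputed datasets, so they have nothing to work with for that group.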
