Imputing ethnicity from names

Interesting paper testing different software that assigns probabilities of belonging to each ethnic group based on a person’s name. Bottom line: it doesn’t work… yet. To give an idea of what such probabilities look like: if I heard of another Robert Grant I might have these priors: Scottish 40%, English 30%, Caribbean 20%, Irish 5%, Mixed 5%.

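They put the resulting distributions into a multiple imputation, which in theory will carry the uncertainty about each person’s group through to the final estimates. The problem is that the software never assigned anyone to “Chinese / Other”. A group with zero probability is never drawn in any imputed dataset, so there is no variation between imputations for the pooling to pick up, and Rubin’s clever rules aren’t clever enough to get you out of that mess. Back to the drawing board.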
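
To see why the pooling can't rescue a group that is never assigned, here is a minimal sketch in Python. It is not the paper's method or anyone's actual software: the group labels "A" and "B", the probabilities, and the function names are all made up for illustration. It draws each person's group from their name-based probabilities, repeats that m times, and pools the estimated proportion in one group with Rubin's rules (total variance = within + (1 + 1/m) × between).

```python
import random
from statistics import mean, variance

# Hypothetical groups and name-based probabilities (invented numbers).
# Every person gets probability 0 for "Chinese / Other", mimicking
# software that never assigns anyone to that group.
GROUPS = ["A", "B", "Chinese / Other"]
NAME_PROBS = [
    [0.8, 0.2, 0.0],
    [0.5, 0.5, 0.0],
    [0.1, 0.9, 0.0],
    [0.6, 0.4, 0.0],
]


def impute_once(probs, rng):
    """Draw one group per person from that person's probabilities."""
    return [rng.choices(GROUPS, weights=p, k=1)[0] for p in probs]


def pooled_proportion(probs, group, m=20, seed=1):
    """Pool the proportion in `group` over m imputations using Rubin's rules."""
    rng = random.Random(seed)
    n = len(probs)
    estimates, within = [], []
    for _ in range(m):
        imputed = impute_once(probs, rng)
        p_hat = sum(g == group for g in imputed) / n
        estimates.append(p_hat)
        within.append(p_hat * (1 - p_hat) / n)  # binomial variance of p_hat
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(within)                 # average within-imputation variance
    b = variance(estimates)          # between-imputation variance
    total = w + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, total


# The never-assigned group gets an estimate of zero with zero variance:
print(pooled_proportion(NAME_PROBS, "Chinese / Other"))  # (0.0, 0.0)
```

The pooled answer for “Chinese / Other” is exactly zero with exactly zero uncertainty, which is confidently wrong rather than honestly unsure. Any fix has to happen upstream, in the probabilities themselves.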


