Salmond explains his Chinese name data analysis

Rob Salmond has posted a further explanation of his data analysis of Auckland property sales on his own Polity blog. His site has been largely unreachable, so I’ll repost the explanation in full here as an alternative source.

How Labour estimated ethnicity from surnames

In response to requests via Twitter, this post walks readers through the general method Labour used to predict the ethnicity of Auckland house buyers from their surnames. This analysis was featured in the New Zealand Herald’s lead story yesterday.

Note that there are two points in this explanation where I will refuse to go into further detail, in order to protect Labour IP. The rest of this explanation has been made publicly in various venues already, so this post does not give away any new secrets.

Part 1: 2014 demographic study

Pre-election, Labour estimated the ethnicity of every person on the electoral roll, via standard Bayesian updating. There are 3.2 million people on the roll. This was one of many demographic estimates we did for everyone in the country. Most serious political parties now engage in this kind of demographic profiling.

To estimate ethnicity, we used public NZ census data on the ethnic distribution of neighbourhoods, and also used data we developed privately about the ethnic distribution of last, middle, and first names in New Zealand. We followed some advice – especially about estimating Asian ethnicities – from prominent US academic studies. I won’t be describing that process further, as that is sensitive IP for Labour.

Using these data, our base method was to estimate people’s ethnicity in a three-step Bayesian analysis:

  • Step 1: Prior: neighbourhood ethnic distribution. New information: last-name distribution. Posterior: neighbourhood / last-name ethnic distribution.
  • Step 2: Prior: neighbourhood / last-name ethnic distribution. New information: first-name distribution. Posterior: neighbourhood / last-name / first-name ethnic distribution.
  • Step 3: Prior: neighbourhood / last-name / first-name ethnic distribution. New information: middle-name distribution. Posterior: neighbourhood / last-name / first-name / middle-name ethnic distribution.

This process provides a distribution of the likely ethnicities of each person in New Zealand, given their address and their full name.
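As a rough illustration of the three steps above, here is a minimal Python sketch of sequential Bayesian updating. It assumes each name table supplies a likelihood, P(name | ethnicity), and that a person’s first, middle, and last names are conditionally independent given ethnicity (a naive-Bayes simplification). The four categories, the helper names `normalise` and `update`, and every number are hypothetical; this is not Labour’s data or its exact, undisclosed method.

```python
# Hypothetical sketch of three-step Bayesian updating of an ethnicity
# distribution. Categories and all numbers are invented for illustration;
# they are not Labour's data or its exact method.


def normalise(weights: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """One Bayesian step: posterior is proportional to prior * likelihood."""
    return normalise({k: prior[k] * likelihood[k] for k in prior})


# Prior: ethnic mix of the person's neighbourhood (made-up figures).
neighbourhood = {"European": 0.60, "Maori": 0.10, "Chinese": 0.20, "other": 0.10}

# Assumed likelihoods P(this name | ethnicity) for one person's names
# (made-up figures; only their ratios affect the result).
lastname = {"European": 0.001, "Maori": 0.0001, "Chinese": 0.02, "other": 0.001}
firstname = {"European": 0.004, "Maori": 0.002, "Chinese": 0.003, "other": 0.003}
middlename = {"European": 0.002, "Maori": 0.001, "Chinese": 0.004, "other": 0.002}

posterior = neighbourhood
for evidence in (lastname, firstname, middlename):  # Steps 1, 2 and 3
    posterior = update(posterior, evidence)  # each posterior becomes the next prior

print({k: round(v, 3) for k, v in posterior.items()})
```

Under these assumptions each step simply multiplies by a likelihood and renormalises, so the final posterior does not depend on the order in which the names are added. If a name table instead gave P(ethnicity | name), it would first need dividing by the national ethnic mix to serve as a likelihood.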

The distribution covered the probability that a person was each of the following ethnicities, drawn from the level 1 and level 2 ethnic classifications from the New Zealand census: European, Maori, Pacific (not further defined), Pacific (Samoan), Pacific (Tongan), Asian (not further defined), Asian (Chinese), Asian (Japanese), Asian (Korean), Asian (South Asian), Asian (Middle East), other.

For the person-level point estimates, we used the ethnicity with the largest single probability. That probability was typically above 0.9.
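Choosing the point estimate amounts to taking the category with the highest probability (an argmax). The short sketch below reuses two real surname-level rows from the table later in this post purely as convenient inputs; `point_estimate` is a hypothetical helper name.

```python
# Point estimate = the single most probable category (an argmax).
# Inputs are two real surname-level rows from the table in this post,
# used here only for illustration.


def point_estimate(distribution: dict[str, float]) -> tuple[str, float]:
    """Return the most probable category and its probability."""
    label = max(distribution, key=distribution.get)
    return label, distribution[label]


jones = {"European": 0.938, "Maori": 0.054, "Chinese": 0.001, "other": 0.007}
lee = {"European": 0.481, "Maori": 0.027, "Chinese": 0.400, "other": 0.092}

print(point_estimate(jones))  # confident: top probability above 0.9
print(point_estimate(lee))    # mixed: top probability well below 0.9
```

Both rows yield "European", but at 0.938 versus 0.481, which shows how a point estimate on its own hides the uncertainty of mixed names.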

We refined these estimates further with three tweaks to account for moderate issues we encountered estimating certain ethnicities. I won’t be describing those tweaks further, because IP.

We then tested our predictions against a more-or-less-random sample of around 3,500 known New Zealanders for whom we had ethnicity data. Our best predictions, which we have used since, were 94.8% accurate.

This is an important point. Having developed our method for estimating ethnicity, we then tested it for accuracy against real world data. Only once we were satisfied it was accurate were we willing to rely on it in our work.
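The accuracy check described above reduces to comparing each person’s point estimate with their known ethnicity and reporting the share that match. A minimal sketch, with an invented five-person sample standing in for the roughly 3,500-person validation set; the `accuracy` helper is hypothetical.

```python
def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Share of people whose point estimate matches their known ethnicity."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one labelled person")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical five-person validation sample (invented labels).
predicted = ["European", "Chinese", "Maori", "European", "other"]
actual = ["European", "Chinese", "Maori", "Maori", "other"]
print(f"{accuracy(predicted, actual):.1%}")
```

A single overall figure can mask weaker accuracy for smaller groups, so a per-ethnicity breakdown is a natural companion check.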

Part 2: Applying the predictions to housing data

To apply our general predictions, derived in part 1 above, to the Auckland housing data, we followed a two-step process.

First, we collapsed the 1.4 million Auckland-based ethnicity estimates we had down to surname level, because surnames were the only name information in the real estate data. This also let us partly carry the information we had gleaned from first names, middle names, and locations on the electoral roll into our surname-based estimates.

Most surnames pointed strongly (pr > 0.9) to one and only one ethnicity, although some gave more mixed predictions. The process produced estimates such as the following (these are the real values):

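| Surname | pr(European) | pr(Maori) | pr(Chinese) | pr(other) |
|---------|--------------|-----------|-------------|-----------|
| JONES   | 0.938        | 0.054     | 0.001       | 0.007     |
| HOTERE  | 0.048        | 0.887     | 0.000       | 0.065     |
| LEE     | 0.481        | 0.027     | 0.400       | 0.092     |
| LI      | 0.028        | 0.001     | 0.957       | 0.014     |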
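The post does not say exactly how the person-level estimates were collapsed. One natural reading, assumed here, is a per-surname average of the distributions of everyone on the roll who shares that surname. Because each person’s estimate already used their first name, middle name, and address, the average inherits some of that information, and a surname shared by people of different ethnicities (as LEE appears to be) ends up with a split distribution. The helper name and all input numbers below are hypothetical.

```python
from collections import defaultdict


def collapse_by_surname(
    people: list[tuple[str, dict[str, float]]],
) -> dict[str, dict[str, float]]:
    """Average person-level distributions over everyone sharing a surname.

    `people` holds (surname, {ethnicity: probability}) pairs, one per person.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for surname, dist in people:
        key = surname.upper()  # treat "Lee" and "LEE" as the same surname
        counts[key] += 1
        for ethnicity, p in dist.items():
            sums[key][ethnicity] += p
    # Divide each summed probability by the number of people with that surname.
    return {
        key: {eth: total / counts[key] for eth, total in cats.items()}
        for key, cats in sums.items()
    }


# Hypothetical person-level estimates from the roll (invented numbers).
people = [
    ("Lee", {"European": 0.95, "Chinese": 0.05}),
    ("Lee", {"European": 0.05, "Chinese": 0.95}),
    ("Li", {"European": 0.02, "Chinese": 0.98}),
]
print(collapse_by_surname(people))
```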

Having done that for each individual purchaser, we then summed the probabilities across all 3,922 sales in the dataset. This provided an aggregate estimate, based on the distributions of likely ethnicities in each individual sale, for the overall ethnic distribution of house buyers in Auckland.
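Summing probabilities gives an expected head count for each ethnicity rather than a count of point estimates. A minimal sketch using the four real surname rows from the table above; the list of buyers and the `expected_counts` helper are hypothetical.

```python
# Real surname-level values from the table above; the buyers are hypothetical.
SURNAME_ESTIMATES = {
    "JONES": {"European": 0.938, "Maori": 0.054, "Chinese": 0.001, "other": 0.007},
    "HOTERE": {"European": 0.048, "Maori": 0.887, "Chinese": 0.000, "other": 0.065},
    "LEE": {"European": 0.481, "Maori": 0.027, "Chinese": 0.400, "other": 0.092},
    "LI": {"European": 0.028, "Maori": 0.001, "Chinese": 0.957, "other": 0.014},
}


def expected_counts(surnames: list[str]) -> dict[str, float]:
    """Sum each buyer's ethnicity probabilities into expected head counts."""
    totals: dict[str, float] = {}
    for surname in surnames:
        for ethnicity, p in SURNAME_ESTIMATES[surname].items():
            totals[ethnicity] = totals.get(ethnicity, 0.0) + p
    return totals


buyers = ["JONES", "HOTERE", "LEE", "LI"]
counts = expected_counts(buyers)
# Each row sums to 1, so dividing by the number of buyers gives shares.
shares = {eth: n / len(buyers) for eth, n in counts.items()}
print({eth: round(s, 3) for eth, s in shares.items()})
```

Counting point estimates instead would put LEE wholly in the European column; summing credits LEE’s 0.400 chance of being Chinese, giving an expected 1.358 Chinese buyers out of four.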

In doing this aggregation, we tested various ways of accounting for the fact that some sales had one surname attached while others had two or even three, reflecting multiple people with different surnames purchasing a property together. No matter how we cut those observations, the overall pattern remained within 2% of the numbers that appeared in the New Zealand Herald.
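The post does not list the weightings that were tried. Two obvious candidates, assumed here for illustration, are counting every surname as one buyer, or counting every sale once and splitting its weight equally among its surnames. The sketch below reuses the real table rows; the sales and the `ethnic_shares` helper are hypothetical.

```python
# Real surname-level values from the table in this post; the sales are hypothetical.
SURNAME_ESTIMATES = {
    "JONES": {"European": 0.938, "Maori": 0.054, "Chinese": 0.001, "other": 0.007},
    "HOTERE": {"European": 0.048, "Maori": 0.887, "Chinese": 0.000, "other": 0.065},
    "LEE": {"European": 0.481, "Maori": 0.027, "Chinese": 0.400, "other": 0.092},
    "LI": {"European": 0.028, "Maori": 0.001, "Chinese": 0.957, "other": 0.014},
}


def ethnic_shares(sales: list[list[str]], per_sale: bool) -> dict[str, float]:
    """Aggregate ethnicity shares across sales.

    per_sale=False: every surname counts as one buyer (weight 1 each).
    per_sale=True: every sale counts once; its surnames split that weight equally.
    """
    totals: dict[str, float] = {}
    for names in sales:
        weight = 1 / len(names) if per_sale else 1.0
        for surname in names:
            for ethnicity, p in SURNAME_ESTIMATES[surname].items():
                totals[ethnicity] = totals.get(ethnicity, 0.0) + weight * p
    mass = sum(totals.values())  # total weight, used to turn totals into shares
    return {eth: t / mass for eth, t in totals.items()}


sales = [["JONES"], ["LEE", "LI"], ["HOTERE"]]
by_buyer = ethnic_shares(sales, per_sale=False)
by_sale = ethnic_shares(sales, per_sale=True)
print(round(by_buyer["Chinese"], 3), round(by_sale["Chinese"], 3))
```

In this tiny example the two weightings differ sharply because one of the three sales is jointly named; with thousands of sales the choice can matter far less, as the post reports for the real data.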

It is that overall distribution, not data cherry-picked from any particular sale, that we then compared with various other aggregate datasets about the ethnic distribution of Auckland residents, or various subsets of Auckland residents. Many of those comparisons are detailed in the Herald article and in my Public Address blog post yesterday.


9 Comments

  1. I’m surprised they went to so much trouble to get something that did nothing to prove the proportion of overseas buyers or speculators. So much effort for so few useful conclusions.

    • kittycatkin

       /  13th July 2015

      Someone hadn’t much to do with their time.

    • Alan Wilkinson

       /  13th July 2015

      Of course it was a political exercise, not a scientific study. So the results are judged on column inches, not veracity and reliability.

  2. The distribution covered the probability that a person was each of the following ethnicities, drawn from the level 1 and level 2 ethnic classifications from the New Zealand census: European, Maori, Pacific (not further defined), Pacific (Samoan), Pacific (Tongan), Asian (not further defined), Asian (Chinese), Asian (Japanese), Asian (Korean), Asian (South Asian), Asian (Middle East), other.

    I must be ‘other’ because none of the other ethnic options are applicable to me.

    • kittycatkin

       /  13th July 2015

      Me, too. My parents were Others from Otherland. I suppose that yours were, too.

      I was surprised to see that Singh was a Chinese name, and anyone called Singh will be too. Maybe these people think that Asian is a nationality.

  3. kittycatkin

     /  13th July 2015

    But I’d have to say that if we sold our house, the race of the buyer would be totally irrelevant; we’d sell to the highest bidder, as anyone else would. It seems bizarre that Chinese people are so eager for our houses that they’ll pay way above the asking price. Who would? If I wanted something that cost $100 and I knew that I could have it if I offered $101 for it, I wouldn’t offer $110. Nobody would.

    • The theory is simple, I think, to explain this… I have 150 million USD stuck in ABC land. I can happily live on 100 million USD. Hell, I will live on 70 million USD if pushed ;). I buy offshore assets using some allowable method of shifting funds out of ABC land… I need to do this very quickly to avoid officialdom in ABC land intervening or changing the rules to stop funds being sent offshore…

      So, as I don’t care if I lose 33% or even 50% of my initial amount, I bid aggressively for the assets I buy, and I overbid to make sure the transaction goes through every time I bid, with the overbid designed to frighten away other buyers.

      If I buy in a heated market like the AKL property market, I may not lose anything when I resell anyway, especially if I buy, hold for a short period, and then dump the asset back into an escalating market.

      The money has now been shifted to NZ or another country and is sitting in a bank account that, if you’re smart, can’t easily be traced by officials in ABC land if they decide they want the money back in ABC land because you’re a naughty boy…

  4. traveller

     /  13th July 2015

    Why a Pommy immigrant like Twyford is fronting race-bashing wedge politics I’ll never know.
    I know I’ll be called out for hyperbole and for invoking Godwin’s law, but this type of attack has all the hallmarks of pre-Nazi Germany. Painting an ethnic group as grasping our assets and depriving natives of their birthright is frightening.

  5. Farmerpete

     /  13th July 2015

    I think Salmond is trying to polish a turd with this commentary. It won’t make the effluent any more acceptable!

