The recent publications in the Guardian about NSA access to Google, Facebook, Yahoo and all the rest have been met with a flurry of leftist abhorrence and mutterings from the twittering, as opposed to the twitter, classes. However, all reports refer to access to data, in such a way as to make people think that their personal emails and photos and so on are being read by the NSA, or their computers – this is not the case. Because the media is governed by soundbites, polemic and an absence of nuance, the headline is that the security services have access to data. The truth is, they don’t need or want access to the data. They need the models. And this allows the internet companies to deny they are granting access to the data.
Think about it from the advertising perspective first of all. Google has a relationship with advertisers where it facilitates their placement of ads in front of appropriate eyes. Google’s consumer business is constructed in such a way as to yield sufficient data that it can build mathematical models that allow it to predict who is likely to be the right person to receive any particular ad. This is about patterns. So people who search for lamps are likely to want to buy lamps, and therefore likely to want to click on an ad for lamps. At no point does Google disclose to the advertiser the identity of the consumer; the consumer self-identifies to the merchant once they click through to the advertiser’s website, either through cookies or an express declaration (such as login or registration).
Now, searching for lamps is only one indicator. There are thousands of other things that Google and others know about their customers. If a customer sends emails to lots of people about lamps, that changes the mathematical model – they could be a lamp retailer, for example, and therefore looking to buy in quantity. Crucially, their buying pattern is likely to be different, and the Google algorithms may determine that this user is more appropriate for an ad for a lamp wholesaler rather than a lamp retailer. Further, they may be searching from a particular location – if they are in a street, maybe they’re looking to buy something locally; if they are at home (determined as the place where they usually are at 2am most nights, and ‘discovered’ through Google Maps or some other location service on a mobile device) then perhaps they are looking to replace a broken light. The more data Google has, the more accurate its models, and the more effective the execution of the advertising process.
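To make the idea of signals feeding a model concrete, here is a minimal sketch of how a few indicators might be combined into a single score. Everything in it is an assumption for illustration: the signal names, the weights, the threshold and the functions `click_probability` and `choose_ad` are hypothetical, and nothing here describes how Google's systems actually work.

```python
import math

# Hypothetical signal weights, invented for illustration only.
WEIGHTS = {
    "searched_lamps": 1.5,            # searched for lamps recently
    "bulk_emails_about_lamps": -2.0,  # looks like a trade buyer, not a shopper
    "at_home": 0.8,                   # searching from the usual 2am location
}
BIAS = -1.0

def click_probability(signals):
    """Logistic score: estimated chance the user clicks a retail lamp ad."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0)
                   for name, present in signals.items() if present)
    return 1.0 / (1.0 + math.exp(-z))

def choose_ad(signals):
    """Pick an ad from the signals; the advertiser never sees the signals."""
    # Bulk emailing about lamps suggests a retailer buying in quantity,
    # so the wholesaler's ad is the better match.
    if signals.get("bulk_emails_about_lamps"):
        return "lamp_wholesaler_ad"
    if click_probability(signals) >= 0.5:
        return "lamp_retailer_ad"
    return "no_lamp_ad"
```

The point of the sketch is that only the chosen ad leaves the function. The signals and the score stay with whoever runs the model.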
For the advertiser, none of this is visible. The only thing they are interested in acquiring is the eyeballs of people who are likely to want to buy whatever it is they are selling. They don’t get ‘data’ from Google; they get results. They get people visiting their website and buying stuff, or otherwise engaging with their product or service.
Now think about this in a National Security context. The NSA aren’t interested in data; they’re interested in the models that pick out the people who are likely to do bad things. Ultimately of course they will be interested in finding out specific IP addresses and the content of emails and so on, but think about this as a two-step process. First, they work with Google – or whoever – to figure out the patterns for people who are likely to want to do bad things. We know from analyzing other bad people that they tend to behave in certain ways. We know, for example, that they tend to mention particular keywords – like jihad – and perhaps search for guns, and are perhaps in particular locations. We know perhaps that they tend to have particular social network characteristics – such as an intense one-to-one relationship with a particular source, or bilateral relationships with several. Whatever the characteristics, these are the kinds of people that they are trying to identify. They do not need to have Google provide the data; they just need a tool – similar to AdSense, one of Google’s advertising products – to do the analysis. Then, once they have identified the key targets, they can get ‘official’ approval to request the details for that person, or list of persons.
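The two-step process described above can be sketched in a few lines. This is a hedged illustration rather than a description of any real system: the `Profile` fields, the `FLAGGED_KEYWORDS` set, the thresholds and both step functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    user_id: str                                   # opaque identifier
    keywords: set = field(default_factory=set)     # terms seen in activity
    contacts: dict = field(default_factory=dict)   # contact -> message count
    email_content: str = ""                        # held by the provider

# Hypothetical pattern; placeholder terms, not real criteria.
FLAGGED_KEYWORDS = {"keyword_a", "keyword_b"}

def matches_pattern(p, min_hits=2, intense_share=0.8):
    """True if the profile fits the pattern: enough flagged keywords and
    one contact receiving most of the user's messages."""
    hits = len(p.keywords & FLAGGED_KEYWORDS)
    total = sum(p.contacts.values())
    top_share = max(p.contacts.values()) / total if total else 0.0
    return hits >= min_hits and top_share >= intense_share

def step_one_flag(profiles):
    """Step one: the model runs on the provider's side and returns only
    opaque identifiers, never content."""
    return [p.user_id for p in profiles if matches_pattern(p)]

def step_two_disclose(profiles, approved_ids):
    """Step two: content is released only for identifiers that have
    received 'official' approval."""
    return {p.user_id: p.email_content
            for p in profiles if p.user_id in approved_ids}
```

In this sketch the agency only ever handles the output of `step_one_flag` until approval is granted, which is how a provider could say that it has not handed over the data.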
No privacy is compromised in this process, but that is not because information is not disclosed. It is because privacy as we have known it simply does not exist in a big data world. The models, the maths, are all that count.