Needless haystacks

Michał "rysiek" Woźniak

13.04.2015

This is an ancient post, published more than 4 years ago.

As such, it may no longer reflect the views of the author or the state of the world. It is provided as a historical record.

I find that in most situations where any mishap is involved, especially with any large institutions in the picture, Hanlon’s razor tends to apply, and is a good working model to base assumptions on.

This has been the case with most Internet censorship debates in Poland, for instance. Assuming malice really wasn’t helping to get our point across.

Of needles and haystacks

This is why I am flabbergasted by the NSA’s (and the rest of the gang’s) insistence on gathering as much data as they can. Sure, for most regular Jacks or Jills, “you need the haystack to find the needle” might sound about right. A slightly more observant person might, however, do a double take: “wait, what?”. When I’m searching for a needle, the last thing I want or need is an ever-larger haystack. Something’s fishy.

Then, they might go the extra mile and dig a bit, finding out that NSA’s data has no real impact on anti-terrorism efforts. Maybe they’ll even dig out a 2007 Stratfor report on the “obstacles to the capture of Osama”, pointing out things like:

[T]he Taliban and al Qaeda so far have used their home-field advantage to establish better intelligence networks in the area than the Americans.

And:

One big problem with this, according to sources, was that most of these case officers were young, inexperienced and ill-suited to the mission.

Or this gem:

This lack of seasoned, savvy and gritty case officers is complicated by the fact that, operationally, al Qaeda practices better security than do the Americans.

And while one of the sections of the report is indeed entitled “Needle in a Haystack”, it doesn’t exactly support the “we need the whole haystack” narrative of the NSA and its ilk. Because this narrative simply makes no sense. Why? Because math.

When we’re talking about searching large datasets for something, we need to account for false positives and false negatives. The larger the dataset, the bigger a problem they become. But don’t take my word for it. Floyd Rudmin wrote a great analysis of this back in 2006:

Suppose that NSA’s system is really, really, really good, really, really good, with an accuracy rate of .90, and a misidentification rate of .00001, which means that only 3,000 innocent people are misidentified as terrorists. With these suppositions, then the probability that people are terrorists given that NSA’s system of surveillance identifies them as terrorists is only p=0.2308, which is far from one and well below flipping a coin. NSA’s domestic monitoring of everyone’s email and phone calls is useless for finding terrorists.

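That’s right. Even if we assume amazingly good accuracy, a person flagged by the system turns out to be an actual terrorist less than one time in four. The agency would have better odds of being right by flipping a coin than by trusting the flags produced from the data they gather.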
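The arithmetic behind Rudmin’s figure is plain Bayes’ theorem, and it is easy to check. The sketch below is my own reconstruction, not Rudmin’s code. The quote only gives the accuracy rate, the misidentification rate, and the 3,000 misidentified people, so the population of 300 million and the 1,000 actual terrorists are assumptions, picked because they are consistent with the quoted numbers. The function and variable names are made up for illustration.

```python
def posterior(base_rate, hit_rate, false_positive_rate):
    """Probability that a flagged person really is a target (Bayes' theorem)."""
    true_hits = hit_rate * base_rate
    false_hits = false_positive_rate * (1.0 - base_rate)
    return true_hits / (true_hits + false_hits)


# Assumptions, not stated in the quote: 300 million people, 1,000 terrorists.
POPULATION = 300_000_000
TERRORISTS = 1_000

# Rates taken from the quote: accuracy .90, misidentification .00001.
HIT_RATE = 0.90
FALSE_POSITIVE_RATE = 0.00001

innocents_flagged = FALSE_POSITIVE_RATE * (POPULATION - TERRORISTS)
p = posterior(TERRORISTS / POPULATION, HIT_RATE, FALSE_POSITIVE_RATE)

print(f"innocent people flagged: {innocents_flagged:,.0f}")  # 3,000
print(f"P(terrorist | flagged) = {p:.4f}")                   # 0.2308
```

Under these assumptions roughly 900 real terrorists get flagged along with about 3,000 innocent people, which is where the "less than one in four" comes from.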

Unknown knowns and competent incompetence

That’s exactly why I am flabbergasted: usually that would be the point where I’d call upon Hanlon’s razor. But we have just assumed that the NSA is really, really competent at what they’re doing, and what they’re doing is, in no small part, math.

So either they are very, very competent and understand that mass surveillance cannot work the way the NSA claims it is supposed to; or they are not competent enough to know this, in which case they certainly lack the most basic skills needed to work with the datasets they have. Can’t have it both ways!

The third way

The scary possibility is that the NSA knows this full well, and yet they still gather the data. Why would they do this? Well, while it might not be all that useful for catching terrorists, it might be a game-changer in areas where the numbers are different. Again, Floyd Rudmin puts it best:

Also, mass surveillance of the entire population is logically plausible if NSA’s domestic spying is not looking for terrorists, but looking for something else, something that is not so rare as terrorists. For example, the May 19 Fox News opinion poll of 900 registered voters found that 30% dislike the Bush administration so much they want him impeached. If NSA were monitoring email and phone calls to identify pro-impeachment people, and if the accuracy rate were .90 and the error rate were .01, then the probability that people are pro-impeachment given that NSA surveillance system identified them as such, would be p=.98, which is coming close to certainty (p=1.00).
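What changes between Rudmin’s two scenarios is mostly the base rate, that is, how common the thing being searched for is. The rough sketch below holds the accuracy rate (.90) and error rate (.01) from the quote fixed and varies only the base rate. Only the 30% figure comes from the quote; the other base rates and all the names are picked purely for illustration.

```python
def flagged_is_target(base_rate, hit_rate=0.90, error_rate=0.01):
    """P(target | flagged) via Bayes' theorem; default rates are from the quote."""
    true_hits = hit_rate * base_rate
    false_hits = error_rate * (1.0 - base_rate)
    return true_hits / (true_hits + false_hits)


# Only 0.30 comes from the quote; the other base rates are illustrative.
base_rates = [0.00001, 0.001, 0.01, 0.10, 0.30]
results = {rate: flagged_is_target(rate) for rate in base_rates}

for rate, p in results.items():
    print(f"base rate {rate:>8.5f} -> P(target | flagged) = {p:.4f}")
```

At a 30% base rate this comes out at roughly 0.97, in the same ballpark as the quoted p=.98, while at a base rate of one in a hundred thousand the very same system is wrong about almost everyone it flags.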

So are the NSA and other security agencies too incompetent to understand mass surveillance is useless for its stated purpose, or are they competent enough to understand it and the real purpose is just a bit different?

Neither possibility makes me feel safer. Or be safer, for that matter.

Songs on the Security of Networks
a blog by Michał "rysiek" Woźniak
