Trump Wants NSA Program Reauthorized But Won’t Tell Congress How Many Americans It Spies On

You're thinking that they actually need all that data for everyone, copies of every video everyone has ever watched.

Let's say someone watches a video on YouTube. How much data does the NSA need? If I were in charge, there would be a system that runs OCR and voice-to-text recognition on any video watched by people tagged as political adversaries. The extracted text would be compared against dangerous keywords and phrases through semantic analysis. The result would be a database of which dangerous ideas and topics appear in which parts of the video. Then, by recording metadata about who watches that video and which parts of it they watched, I get a clear picture of what ideas these people have been exposed to. I then record which parts the user's friends watch, and when. This gives me metadata about the importance of friends, their interests and so on. This is all fairly easy to do. All this data would be searchable from Marina / XKeyscore, along with statistics about it produced with tools from e.g. Palantir.

The same can be done to any data we can turn into text, e.g. phone calls. But is it feasible to record all content for deeper and deeper analysis and cross-referencing? We know for a fact that the NSA has speech-to-text recognition systems: https://theintercept.com/2015/05/11/speech-recognition-nsa-best-kept-secret/

So suppose all 325 million US citizens were monitored while speaking on the phone to foreigners 24/7 for 15 years (NSA's data retention time), at a speaking rate of 163 words per minute and an average of 5.1 letters per word in English, stored as ASCII plaintext at 1 byte per character:

325_000_000*15*365.25*24*60*163*5.1 = 2.131 Exabytes of plaintext data.

Forbes states that NSA's Mission Data Repository in Utah holds 1,000,000 exabytes of data.

https://www.forbes.com/sites/kashmirhill/2013/07/24/blueprints-of-nsa-data-center-in-utah-suggest-its-storage-capacity-is-less-impressive-than-thought/#806940074575

That's roughly 470,000 times more than the theoretical maximum amount of phone content computed above. It's unlikely that any of us is exposed to more information than that anyway, so they can store all relevant data about what information each citizen receives and shares, and more.

But what if every US citizen read as fast as Anne Jones, the fastest reader in the world at 4,251 WPM, and consumed an infinite pile of dangerous books by Noam Chomsky at that rate?

325_000_000*15*365.25*24*60*4251*5.1 = 55.59 Exabytes of plaintext data.
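If you want to check the arithmetic yourself, here's a rough Python sketch of both estimates. The function and constant names are my own, and it assumes decimal exabytes (1 EB = 10^18 bytes) and 1 byte per character with no spaces, punctuation or compression counted:

```python
# Back-of-the-envelope plaintext volume; all names here are illustrative.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def plaintext_bytes(people, years, words_per_minute,
                    chars_per_word=5.1, bytes_per_char=1):
    """Bytes of plaintext if everyone produced or consumed words nonstop."""
    minutes = years * MINUTES_PER_YEAR
    return people * minutes * words_per_minute * chars_per_word * bytes_per_char


def to_exabytes(n_bytes):
    # Assumption: decimal exabyte, 1 EB = 1e18 bytes.
    return n_bytes / 1e18


speech = plaintext_bytes(325_000_000, 15, 163)     # talking 24/7
reading = plaintext_bytes(325_000_000, 15, 4251)   # speed reading 24/7

print(f"speech:  {to_exabytes(speech):.3f} EB")
print(f"reading: {to_exabytes(reading):.2f} EB")
```

Both figures scale linearly with every input, so swapping in a different population or retention period is just a matter of changing the arguments.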

So recording the content of everyone's thoughts, what they read and what they say, is easily within the capacity of these databases. To put it into dollars, the cost of storing those 55.59 exabytes on LTO-7 tapes at 6 TB / $25 is ridiculously low, at about $231 million. Of course you need the servers, supercomputers, tape drive loading systems etc., but it's not in any way unachievable.
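The tape cost works out like this. Again just a sketch with made-up names, assuming 6 TB of native capacity per LTO-7 tape, $25 per tape, and no redundancy, drives or overhead:

```python
import math


def tape_cost_usd(total_bytes, tape_capacity_bytes=6e12, price_per_tape_usd=25):
    """Return (number of tapes, media cost in USD) for storing total_bytes."""
    # Round up: a partially filled tape still has to be bought.
    tapes = math.ceil(total_bytes / tape_capacity_bytes)
    return tapes, tapes * price_per_tape_usd


# 55.59 EB is the speed-reading estimate from above, in decimal exabytes.
tapes, cost = tape_cost_usd(55.59e18)
print(f"{tapes:,} tapes, about ${cost / 1e6:.1f} million")
```

That comes to a bit over nine million tapes, which is where the ~$231 million figure comes from. It only covers the media, not the hardware around it.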
