ELI5: Why are voice assistants privacy risks?

I'm going to present a rather ridiculous worst-case scenario, and let you work out the more mundane possibilities.

We know that the NSA, through programs dubbed PRISM and MUSCULAR, had access to internal Google networks. They didn't need to intercept your Gmail messages or your Google Docs "off the wire" as you read them, or defeat the encryption between your browser and Google's server; instead, the NSA was tapped directly into Google's back-end storage and could access the raw data there. Google countered, or at least claimed to, by encrypting its internal network in 2013 to thwart NSA access to users' data. But an article more than a year later revealed that the NSA still had cozy partnerships with Google and other tech companies. There's a chance that the NSA can still access Google users' data, whether Google is cooperating or not.

When you pick up your phone and say "OK Google, give me tomorrow's forecast," your voice is converted into text, which a Google computer somewhere processes to find the answer. With the exception of easy commands like "set alarm for 7AM," that conversion doesn't happen on your phone. A recording of your voice is sent to a Google server, which runs the speech-to-text conversion on its end and then processes your query. What happens to that recording of your voice? Is it stored? If so, for how long? Who else besides Google might have access to it, and make their own copy? I find it plausible that these voice recordings can be accessed and stored by the government. That's only speculation and I don't have any proof, but it's within the realm of what they've been proven to do in the past.

Suppose over the course of the next month, you interact with your phone a bit:

  • OK Google, how many kids does [politician name] have?

  • OK Google, what wine goes good with salmon?

  • OK Google, find lesbian porn!

  • OK Google, what's the best way to kill ants in the kitchen?

  • OK Google, what year was the Battle of Little Bighorn?

Now there are recordings of you saying a bunch of words, all of which make up completely innocent phrases by themselves. But someone with access to these clips can cut out individual words and splice them into new audio of you saying things like "find porn with little kids" and "what's the best way to kill [politician name]?" Recordings like that could be used to blackmail you for money, to convince you not to run for city council, or to make you stop criticizing the police on the internet. They could be played for your wife to turn her against you, or leaked to the news if you're a public figure. They may stand up to forensic analysis and could be used to put you on trial for something you never said or did. And what's your defense? These clips are truly of your voice, because you said the constituent words.
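If you want to convince yourself that the five innocent queries really do contain every word of those two fabricated sentences, here's a rough text-only sketch. It doesn't touch audio at all; it just checks which recording each word could be lifted from. The function names (`words`, `sources`, `can_assemble`) are made up for this illustration, and the politician placeholder is left exactly as written above.

```python
import re

# The example queries from the list above, placeholder name left as-is.
queries = [
    "OK Google, how many kids does [politician name] have?",
    "OK Google, what wine goes good with salmon?",
    "OK Google, find lesbian porn!",
    "OK Google, what's the best way to kill ants in the kitchen?",
    "OK Google, what year was the Battle of Little Bighorn?",
]

def words(text):
    """Lowercase a phrase and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def sources(target, recordings):
    """Map each word of `target` to the index of the first recording
    that contains it, or None if no recording contains that word."""
    result = {}
    for w in words(target):
        result[w] = next(
            (i for i, q in enumerate(recordings) if w in words(q)),
            None,
        )
    return result

def can_assemble(target, recordings):
    """True if every word of `target` appears in at least one recording."""
    return all(i is not None for i in sources(target, recordings).values())

for phrase in (
    "find porn with little kids",
    "what's the best way to kill [politician name]?",
):
    print(phrase, "->", can_assemble(phrase, queries), sources(phrase, queries))
```

Both phrases come back as fully covered, which is the whole point: nothing in the fake sentence is a word you didn't actually say.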

While the technology to do this exists, it's all very far-fetched. You and I are nobody. The government isn't going to dig through our "OK Google" archives and fabricate embarrassing fake recordings. At least not until some powerful politician gets his feelings hurt over a few tweets, and decides to make an example out of us. Never happen, right?

/r/privacy Thread