OpenAI Is Not Training on Your Dropbox Documents—Today

There’s a rumor flying around the Internet that OpenAI is training foundation models on your Dropbox documents.

CNBC reported on it. So did Boing Boing. Some articles are more nuanced, but there’s still a lot of confusion.

It seems not to be true. Dropbox isn’t sharing all of your documents with OpenAI. But here’s the problem: we don’t trust OpenAI. We don’t trust tech corporations. And—to be fair—corporations in general. We have no reason to.

Simon Willison nails it in a tweet:

“OpenAI are training on every piece of data they see, even when they say they aren’t” is the new “Facebook are showing you ads based on overhearing everything you say through your phone’s microphone.”

Willison expands on this in a blog post, which I strongly recommend reading in its entirety. His point is that these companies have lost our trust:

Trust is really important. Companies lying about what they do with your privacy is a very serious allegation.

A society where big companies tell blatant lies about how they are handling our data—and get away with it without consequences—is a very unhealthy society.

A key role of government is to prevent this from happening. If OpenAI are training on data that they said they wouldn’t train on, or if Facebook are spying on us through our phone’s microphones, they should be hauled in front of regulators and/or sued into the ground.

If we believe that they are doing this without consequence, and have been getting away with it for years, our intolerance for corporate misbehavior becomes a victim as well. We risk letting companies get away with real misconduct because we incorrectly believed in conspiracy theories.

Privacy is important, and very easily misunderstood. People both overestimate and underestimate what companies are doing, and what’s possible. This isn’t helped by the fact that AI technology means the scope of what’s possible is changing at a rate that’s hard to appreciate even if you’re deeply aware of the space.

If we want to protect our privacy, we need to understand what’s going on. More importantly, we need to be able to trust companies to honestly and clearly explain what they are doing with our data.

On a personal level we risk losing out on useful tools. How many people cancelled their Dropbox accounts in the last 48 hours? How many more turned off that AI toggle, ruling out ever evaluating if those features were useful for them or not?

And while Dropbox is not sending your data to OpenAI today, it could do so tomorrow with a simple change of its terms of service. So could your bank, your credit card company, your phone company, or any other company that owns your data. Any of the tens of thousands of data brokers could be sending your data to train AI models right now, without your knowledge or consent. (At least, in the US. Hooray for the EU and GDPR.)

Or, as Thomas Claburn wrote:

“Your info won’t be harvested for training” is the new “Your private chatter won’t be used for ads.”

These foundation models want our data. The corporations that have our data want the money. It’s only a matter of time, unless we get serious government privacy regulation.