The NYT Lawsuit, OpenAI, and the Quiet Shift in Data Retention Risk
Kafico Ltd
There's a lot being written about the New York Times v. OpenAI case, but one consequence seems to be slipping under the radar, and it might matter more to commissioners of AI tools than they realise.

One of OpenAI’s main defences is that it doesn’t know exactly what data went into its training set, or at least, that it can’t identify it after the fact. The models don’t store training data verbatim in neat, retrievable chunks; they’re statistical systems. This “black box” nature is what allows OpenAI to argue that it can’t be expected to purge or exclude NYT content retroactively, because the training data can’t be cleanly reverse-engineered.
But that very defence may be prompting courts and critics to demand more transparency and traceability, and that, in turn, could mean pressure on OpenAI and others to retain more data going forward. To show how a model was trained, or to prove something wasn’t used, you need logs. You need provenance. And that starts to shift the risk profile.
Now, to be clear: OpenAI has said that its “zero data retention” API contracts are unaffected. If your supplier is using OpenAI’s enterprise-tier API with retention switched off, your prompts aren’t stored, and they aren’t used to train the model. That’s still true — and it’s an important reassurance.
But we shouldn’t stop there. Because:
- Not all suppliers are using that tier. Some say “we use OpenAI” and leave it at that. Unless you ask directly, you may be assuming retention protections that aren’t actually in place.
- Legal pressure may shift the defaults, especially if discovery obligations or regulatory inspections start requiring more detailed logging — even temporarily.
- Zero retention isn’t necessarily zero exposure. If your supplier is caching inputs locally or using a logging tool before hitting the OpenAI API, or if they ever shift back to a different tier, your data may be more exposed than the marketing suggests (see the sketch below).
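To make that last point concrete, here is a minimal, hypothetical sketch of a supplier-side wrapper around the OpenAI Python SDK. The `log_prompt` helper and the `prompts.log` file are invented for illustration; the point is simply that a prompt can be written to the supplier’s own storage before the request ever reaches OpenAI, so OpenAI’s zero-retention guarantee says nothing about that copy.

```python
# Hypothetical supplier-side wrapper: names and file paths are illustrative only.
import datetime
import json

from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def log_prompt(prompt: str, path: str = "prompts.log") -> None:
    """Append the prompt to the supplier's own log file.

    This copy sits entirely outside OpenAI's zero-data-retention terms.
    """
    record = {"ts": datetime.datetime.utcnow().isoformat(), "prompt": prompt}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def ask(prompt: str) -> str:
    # The prompt is retained locally before the API is ever called.
    log_prompt(prompt)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Nothing about this pattern is unusual or malicious; ordinary debugging or analytics middleware does the same thing. Which is exactly why the question to ask a supplier is what they store, not only what OpenAI stores.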
The NYT case might not just reshape copyright; it could quietly reshape data retention expectations across the AI ecosystem. And that has downstream implications for privacy, procurement, and due diligence.
As commissioners, we can’t just ask where the data goes. We also need to ask how the legal environment might change where it’s expected to go next, and make sure we’ve got supplier contracts, DPIAs, and incident response plans that don’t assume a stability that may no longer exist.