Dear CIO,

Back in 2010, a 24-year-old programmer and activist named Aaron Swartz wrote a simple script that downloaded academic papers from JSTOR, a digital library that archives scholarly articles. Swartz was not trying to make money. He believed that research funded by the public should be accessible to the public. However, for downloading approximately 70 gigabytes of data, federal prosecutors came after him with a vengeance. Swartz was hit with multiple felony charges and faced up to 35 years in prison and a million-dollar fine. The pressure was overwhelming. In January 2013, he took his own life.

Now, a little over a decade later, the scale of data downloading has exploded, but the legal and ethical consequences seem to look very different. In today’s newsletter, we are going to look at ethical AI implementation.

Best Regards,
John, Your Enterprise AI Advisor

Aaron Swartz and the Ethics AI Forgot

Today, some of the largest tech companies in the world are scraping and consuming data at a scale that makes Swartz’s efforts look like a drop in the bucket. Instead of prosecutions, though, they are rewarded with multi-billion-dollar valuations.

Take Meta, for example. When it set out to train its language model, LLaMA, it pulled in a staggering 81.7 terabytes of data. That’s more than a thousand times what Swartz downloaded. A large chunk of that data came from pirate libraries like LibGen and Z-Library, massive online repositories of copyrighted books. Internal documents also show that Meta engineers were well aware of the legal risks, openly referring to their training data as a “pile of pirated books.” Yet a major lawsuit was ultimately dismissed, not because Meta was cleared of wrongdoing, but on a legal technicality that leaves the core ethical question unanswered.

Meta is not the outlier here, though. It is just playing the same game everyone else is. OpenAI has faced similar allegations. Lawsuits, including one from the Authors Guild, argue that OpenAI built its breakthrough models on copyrighted texts scraped from the internet without consent. The exact makeup of OpenAI’s training data remains a tightly held secret, but the accusations mirror those facing Meta.

Then there is Anthropic, the company that brands its Claude AI as the more ethical alternative. Behind the scenes, it has been accused of feeding pirated content into its models. Court documents in an ongoing class-action lawsuit claim Anthropic used millions of copyrighted books, also pulled from sources like LibGen and Books3. In this case, the judge has already ruled that using pirated content for commercial AI is, in fact, illegal. Anthropic is now heading to trial in December 2025, with potential damages reaching up to $100 billion.

The contrast is staggering. Aaron Swartz was treated like a criminal for downloading academic research. Meanwhile, tech giants download entire libraries and face lawsuits they treat as little more than operational risk.

At its core, this is about how we value data, who gets to control it, and what justice looks like in a world run by code and capital. The early internet was built on the idea that knowledge should be shared. However, today’s AI race has transformed the open web into a data goldmine for corporations.

What is really at stake here is not just legal precedent or business strategy. It is the future of creativity, authorship, and ownership in the digital age. If the courts allow this kind of mass appropriation to stand, we are setting a dangerous precedent: anything published online is fair game for commercial exploitation, so long as an algorithm processes it.

Now, I am a big believer in the potential of AI. I have spent years watching this space evolve, and I genuinely think these technologies can transform how we work, learn, and solve problems. However, with that belief comes a responsibility to ask hard questions, especially about how these models are built. If this innovation rests on ethically murky or outright unlawful use of creative work, we have to confront that.

For CIOs, this means you cannot afford to treat ethical questions as someone else’s problem. You are the steward of your organization's digital future. How these models are deployed begins with your leadership, and if your enterprise is building systems based on appropriated data, then that becomes your responsibility too. As AI becomes embedded into every part of the enterprise, CIOs must ensure the systems reflect not only technical excellence but also ethical alignment.

Deep Learning
  • Aaron Fulkerson counters Microsoft's claim by asserting that Confidential Computing already enables verifiable EU data sovereignty.

  • James Wickett identifies six AI vendor red flags that signal a lack of transparency and credibility.

  • Andrew Boyagi dives into the Atlassian State of Developer Experience report ranking 2025’s top developer productivity killers.

  • Kevin Townsend looks at IBM’s 2025 Cost of Data Breach Report, revealing that 13% of breaches now involve company AI systems.

  • Dinis Cruz exposes critical security gaps in MCP and GenAI agents and urges ethical hackers to exploit unsafe features to drive overdue technological and behavioral change.

  • Ravie Lakshmanan reports on researchers uncovering an AI-generated malicious npm package that deployed a stealth crypto wallet drainer across OS platforms.

  • Robert Lemos criticizes LLM-generated code for ongoing security flaws despite improved compile success.

  • Ivan Nardini launches the experimental Vertex Gen AI Eval SDK for Python.

  • The Artificially Intelligent Enterprise looks at how to use AI as your strategic thought partner.

  • AI Tangle covers Gemini 2.5 Deep Think, Apple’s new AI team, and OpenAI raising another $8.3 billion.

Regards,

John Willis

Your Enterprise IT Whisperer

Dear CIO is part of the AIE Network, a network of over 250,000 business professionals who are learning and thriving with generative AI. Our network extends beyond the AI CIO to The Artificially Intelligent Enterprise for AI and business strategy, AI Tangle for a twice-a-week update on AI news, The AI Marketing Advantage, and The AIOS for busy professionals who are looking to learn how AI works.
