New Regulations to Increase Transparency in AI Training Data
The European Union’s AI Act aims to enforce transparency around the data used for AI training, shedding light on one of the industry’s most closely guarded secrets. Over the next two years, businesses will need to comply with new obligations as the law seeks to balance innovation with regulation, despite resistance from AI firms that regard their training data as a trade secret.
A New Era of AI Regulation
A new set of laws governing the use of artificial intelligence (AI) in the European Union will force companies to be more transparent about the data used to train their systems. This move is aimed at addressing industry secrecy and copyright concerns, ensuring a fair balance between innovation and regulation.
In the 18 months since Microsoft-backed OpenAI unveiled ChatGPT to the public, there has been a surge of public engagement and investment in generative AI. This technology rapidly produces text, images, and audio content, raising questions about how AI companies obtain the data used to train their models. Concerns have been raised over whether feeding these systems with bestselling books and Hollywood movies without permission breaches copyright laws.
Key Provisions of the AI Act
The EU’s AI Act, rolled out in phases over the next two years, gives regulators time to implement the new laws while businesses adjust to new obligations. One of the more contentious sections of the Act requires organizations deploying general-purpose AI models, like ChatGPT, to provide detailed summaries of the content used for training. The newly established AI Office plans to release a template for organizations to follow in early 2025, following a consultation with stakeholders.
AI companies are resistant to revealing their training data, citing it as a trade secret that could give competitors an unfair advantage. Matthieu Riouf, CEO of AI-powered image-editing firm Photoroom, compared it to a secret recipe that chefs wouldn’t share. The level of detail required in these transparency reports will have major implications for both smaller AI startups and major tech companies like Google and Meta.
Balancing Trade Secrets and Copyright
Over the past year, prominent tech companies, including Google, OpenAI, and Stability AI, have faced lawsuits from creators claiming their content was improperly used to train AI models. While U.S. President Joe Biden has issued executive orders focused on AI security risks, copyright issues remain unresolved. Calls for tech companies to compensate rights holders for data have gained bipartisan support in Congress.
Amid growing scrutiny, tech companies have signed content-licensing deals with media outlets and websites. OpenAI signed agreements with the Financial Times and The Atlantic, while Google struck deals with News Corp and Reddit. Despite these moves, OpenAI faced criticism for its lack of transparency regarding the use of YouTube videos in training its tools.
Thomas Wolf, co-founder of AI startup Hugging Face, supports greater transparency but acknowledges that much remains undecided. Senior lawmakers in the EU remain divided on how to balance trade secrets with the need for transparency.
Future Implications and Industry Reactions
Dragos Tudorache, a key lawmaker in drafting the AI Act, believes AI companies should disclose their datasets to ensure creators can verify if their work was used in training. A Commission official highlighted the need to balance protecting trade secrets and allowing copyright holders to exercise their rights under Union law.
Under President Emmanuel Macron, the French government has expressed concerns about rules that could hinder European AI startups’ competitiveness. French finance minister Bruno Le Maire emphasized the need for Europe to innovate before regulating AI technologies.
The AI Act represents a significant step towards greater transparency in the AI industry, aiming to protect copyright holders while fostering innovation and competition.