
What to know about tech companies using AI to teach their own AI


OpenAI, Google and other technology companies train their chatbots with huge amounts of data drawn from books, Wikipedia articles, news stories and other internet sources. But in the future, they hope to use something called synthetic data.

That is because technology companies could exhaust the high-quality text the internet has to offer for the development of artificial intelligence. The companies also face copyright lawsuits from authors, news organizations and computer programmers for using their works without permission. (In one of those lawsuits, The New York Times sued OpenAI and Microsoft.)

They believe that synthetic data will help reduce copyright issues and increase the supply of training materials needed for AI. Here's what you should know about it.

What is synthetic data?

It is data generated by artificial intelligence.

So tech companies want to use AI to build AI?

Yes. Instead of training AI models with text written by people, tech companies like Google, OpenAI and Anthropic hope to train their technology with data generated by other AI models.

Does synthetic data work?

Not quite. AI models make mistakes and invent things. They have also been shown to pick up the biases that appear in the internet data they were trained on. So if companies use AI to train AI, they could end up amplifying their own flaws.

Is synthetic data widely used today?

No. Tech companies are experimenting with it. But because of the potential flaws of synthetic data, it is not a big part of how AI systems are built today.

Companies believe they can refine the way synthetic data is created. OpenAI and others have explored a technique in which two different AI models work together to generate synthetic data that is more useful and reliable.

One AI model generates the data. A second model then judges the data, much as a human would, deciding whether it is good or bad, accurate or not. AI models are actually better at judging text than at writing it.

“If you give the technology two things, it's pretty good at picking which one looks better,” said Nathan Lile, CEO of the artificial intelligence startup SynthLabs.

The idea is that this will provide the high-quality data needed to train an even better chatbot.
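
To make the generate-and-judge loop concrete, here is a minimal Python sketch. The `generate_text` and `judge_pick_better` functions are hypothetical stand-ins for calls to two real AI models, and the length-based judge is a toy heuristic; this illustrates the shape of the pipeline, not any company's actual system.

```python
import random

# Hypothetical stand-in for the first AI model, which drafts candidate text.
def generate_text(prompt: str) -> str:
    templates = [
        f"A short passage about {prompt}.",
        f"Some longer notes on {prompt}, with a few extra details.",
    ]
    return random.choice(templates)

# Hypothetical stand-in for the second AI model, which judges quality.
# A real system would ask a model to compare the candidates; here we
# simply prefer the longer text as a placeholder heuristic.
def judge_pick_better(candidate_a: str, candidate_b: str) -> str:
    return candidate_a if len(candidate_a) >= len(candidate_b) else candidate_b

def make_synthetic_example(prompt: str) -> str:
    # Model 1 drafts two candidates; model 2 keeps the better one.
    first = generate_text(prompt)
    second = generate_text(prompt)
    return judge_pick_better(first, second)

if __name__ == "__main__":
    dataset = [make_synthetic_example("synthetic data") for _ in range(3)]
    for example in dataset:
        print(example)
```

The kept examples would then feed the next round of training, which is why the quality of the judge matters so much.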

Does that solve the problem?

Sort of. It all comes down to that second AI model. How good is it at judging text?

Anthropic has been the most vocal about its efforts to make this work. It fine-tunes the second AI model using a “constitution” selected by the company's researchers. This teaches the model to choose text that supports certain principles, such as freedom, equality and brotherhood, or life, liberty and personal safety. Anthropic's method is known as “constitutional AI.”

Here's how two AI models work together to produce synthetic data using a process like Anthropic's:
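
A minimal Python sketch of such a constitutional loop follows, assuming hypothetical `generate_text` and `violates_principle` functions in place of real model calls; the principles and the keyword check are placeholders, not Anthropic's actual constitution or implementation.

```python
# A toy "constitution": principles the judging model is meant to enforce.
CONSTITUTION = [
    "Prefer text that supports freedom and equality.",
    "Prefer text that avoids insults or threats.",
]

# Hypothetical stand-in for the generating model.
def generate_text(topic: str) -> str:
    return f"A polite, factual passage about {topic}."

# Hypothetical stand-in for the judging model. A real system would ask a
# fine-tuned model whether the text violates the principle; here a crude
# keyword check serves as a placeholder.
def violates_principle(text: str, principle: str) -> bool:
    banned = ["insult", "threat"]
    return any(word in text.lower() for word in banned)

def constitutional_filter(topic: str) -> str | None:
    """Generate a candidate, then keep it only if the judge finds no
    violation of any principle in the constitution."""
    candidate = generate_text(topic)
    for principle in CONSTITUTION:
        if violates_principle(candidate, principle):
            return None  # rejected: left out of the synthetic dataset
    return candidate

if __name__ == "__main__":
    synthetic_data = []
    for topic in ["copyright", "chatbots", "training data"]:
        kept = constitutional_filter(topic)
        if kept is not None:
            synthetic_data.append(kept)
    print(f"Kept {len(synthetic_data)} synthetic examples.")
```

In this framing, the constitution does the work a human reviewer would otherwise do, which is why oversight of the judging model remains important.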

Still, humans are needed to ensure the second AI model stays on track. That limits the amount of synthetic data this process can generate. And researchers disagree about whether a method like Anthropic's will continue to improve AI systems.

The AI models that generate synthetic data were in turn trained on human-created data, much of which was copyrighted. Therefore, copyright holders can still argue that companies like OpenAI and Anthropic used copyrighted text, images and videos without permission.

Jeff Clune, a computer science professor at the University of British Columbia who previously worked as a researcher at OpenAI, said AI models could ultimately become more powerful than the human brain in some ways. But they will get there because they learned from the human brain.

“To borrow from Newton: AI sees further by relying on giant human data sets,” he said.

