Large language models are trained on huge amounts of data, including the web. Google is now calling for "machine-readable means for web publisher choice and control for emerging AI and search use cases," or essentially an updated robots.txt.
Google says web publishers having "choice and control" over their content is an important part of maintaining a vibrant ecosystem. It points to how robots.txt files already allow sites to control whether search engines can crawl and index their content.
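For illustration only (not part of Google's proposal), a minimal robots.txt can allow or block specific crawlers by user agent and path; here "Googlebot" is Google's search crawler and "/private/" is a hypothetical directory:

User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /

The open question is whether an equally simple mechanism can express publisher preferences for AI training, not just search crawling.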
However, Google acknowledges that existing web publisher controls were developed before these newer AI and search use cases.
As such, Google wants to bring together "web publishers, civil society, universities, and many more from around the world" to discuss a modern equivalent of robots.txt for AI training. The company notes that this community-developed web standard, now roughly 30 years old, has remained "simple and transparent."
Today, the company offers the Search Generative Experience and Bard, and is actively training Gemini, its next-generation foundation model.
Google is opening a broad public discussion, with a mailing list sign-up available starting today so interested groups can express interest before it begins: "The mailing list is for members of the web and AI communities who want to receive future posts regarding the process of developing new machine-readable means of providing choice and control to web publishers."
The aim is to "bring together those interested in participating in the coming months."