OpenAI Introduces ‘GPTBot’ Web Crawler With GPT-5 Model On The Horizon
Website owners can block the web crawler by adding a “disallow” command to a standard server file.
Artificial intelligence company OpenAI has released a web crawling tool called “GPTBot.” According to the company, the data it gathers could be used to improve future ChatGPT models.
According to a new blog post by OpenAI, “web pages crawled with the GPTBot user agent may potentially be used to improve future models,” which might increase the accuracy and capabilities of subsequent iterations.
A web crawler, often known as a web spider, is a type of bot that indexes the content of websites across the internet. Search engines like Google and Bing use crawlers so that websites can appear in their search results.
According to OpenAI, the web crawler will gather material that is freely available online, but it will exclude sources that require payment for access, are known to collect personally identifiable information, or contain text that violates its policies.
Website owners can prevent the web crawler from accessing their pages by adding a “disallow” command to a standard server file, robots.txt.
Instructions to “disallow” GPTBot. Source: OpenAI
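As a rough illustration of the “disallow” command described above, the sketch below holds a two-line robots.txt rule that blocks GPTBot from an entire site and checks its effect with the robots.txt parser in Python's standard library. The helper name `is_allowed` and the `example.com` URLs are hypothetical and used only for demonstration; this is not OpenAI's own tooling.

```python
from urllib.robotparser import RobotFileParser

# Contents a site owner might place in robots.txt to block GPTBot site-wide.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # GPTBot is covered by the Disallow rule above.
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/page"))
    # Crawlers not named in the file are unaffected by this rule.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/page"))
```

A rule like this only signals the site owner's preference; it takes effect because the crawler chooses to read and honor robots.txt before fetching pages.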
The new crawler arrives three weeks after the company filed a trademark application for “GPT-5,” the anticipated successor to the current GPT-4 model.
The application, which was submitted to the United States Patent and Trademark Office (USPTO) on July 18, covers use of the term “GPT-5” for software for AI-based human speech and text, converting audio into text, and voice and speech recognition.
Observers should not expect the next model soon, however. Sam Altman, the co-founder and CEO of OpenAI, stated in June that the company is “nowhere close” to beginning to train GPT-5 and pointed to safety audits that must be completed first.
Concerns around copyright and consent have been raised recently regarding OpenAI’s data collection methods.
In June, Japan’s privacy watchdog warned OpenAI against collecting private information without consent, while in April, Italy temporarily banned the use of ChatGPT after alleging that it had violated EU privacy regulations.
In late June, 16 plaintiffs filed a class-action lawsuit against OpenAI, alleging that the AI firm accessed private information from ChatGPT user conversations.
If these allegations are accurate, OpenAI and Microsoft, which was also named as a defendant, would be in violation of the Computer Fraud and Abuse Act, a law with precedent in web-scraping cases.