More than 130,000 Claude, Grok, ChatGPT, and Other LLM Chats Readable on Archive.org


A researcher has found that more than 130,000 conversations with AI chatbots including Claude, Grok, ChatGPT, and others are discoverable on the Internet Archive. The finding highlights how people’s interactions with LLMs can end up publicly archived if users are not careful with the sharing settings they enable.

The news follows earlier findings that Google was indexing ChatGPT conversations that users had set to share, even though those users may not have understood that the chats were now viewable by anyone, not just the people they intended to share them with. OpenAI had also not taken steps to ensure these conversations could not be indexed by Google.

“I obtained URLs for: Grok, Mistral, Qwen, Claude, and Copilot,” the researcher, who goes by the handle dead1nfluence, told 404 Media. They also found material related to ChatGPT, but said “OpenAI has had the ChatGPT[.]com/share links removed it seems.” Searching on the Internet Archive now for ChatGPT share links does not return any results, while Grok results, for example, are still available. 

Dead1nfluence wrote a blog post about some of their findings on Sunday and shared the list of more than 130,000 archived LLM chat links with 404 Media. They also shared some of the contents of those chats that they had scraped. Dead1nfluence wrote that they found API keys and other exposed information that could be useful to a hacker.

“While these providers do tell their users that the shared links are public to anyone, I think that most who have used this feature would not have expected that these links could be findable by anyone, and certainly not indexed and readily available for others to view,” dead1nfluence wrote in their blog post. “This could prove to be a very valuable data source for attackers and red teamers alike. With this, I can now search the dataset at any time for target companies to see if employees may have disclosed sensitive information by accident.”

404 Media verified some of dead1nfluence’s findings by locating specific material they flagged in the dataset, then visiting the still-public LLM link and checking its content.

Do you know anything else about this? I would love to hear from you. Using a non-work device, you can message me securely on Signal at joseph.404 or send me an email at joseph@404media.co.

Most of the companies whose AI tools are included in the dataset did not respond to a request for comment. Microsoft, which owns Copilot, acknowledged a request for comment but didn’t provide a response in time for publication. A spokesperson for Anthropic, which owns Claude, told 404 Media: “We give people control over sharing their Claude conversations publicly, and in keeping with our privacy principles, we do not share chat directories or sitemaps with search engines like Google. These shareable links are not guessable or discoverable unless people choose to publicize them themselves. When someone shares a conversation, they are making that content publicly accessible, and like other public web content, it may be archived by third-party services. In our review of the sample archived conversations shared with us, these were either manually requested to be indexed by a person with access to the link or submitted by independent archivist organizations who discovered the URLs after they were published elsewhere across the internet first.” 404 Media shared only a small sample of the Claude links with Anthropic, not the entire list.

On July 30, Fast Company first reported that Google was indexing some ChatGPT conversations. The indexing stemmed from a ChatGPT sharing feature that let users send a link to a conversation to someone else. OpenAI disabled the sharing feature in response. OpenAI CISO Dane Stuckey said in a previous statement sent to 404 Media: “This was a short-lived experiment to help people discover useful conversations. This feature required users to opt-in, first by picking a chat to share, then by clicking a checkbox for it to be shared with search engines.”

A researcher who requested anonymity gave 404 Media access to a dataset of nearly 100,000 ChatGPT conversations indexed on Google. 404 Media found that these included the alleged texts of non-disclosure agreements, discussions of confidential contracts, and people asking ChatGPT for help with relationship issues.

Others also found that the Internet Archive contained archived LLM chats.

Published by
storshop.dk@gmail.com