Librarians Are Being Asked to Find AI-Hallucinated Books

Reference librarian Eddie Kristan said patrons at the library where he works have been asking him to find books that don’t exist, without realizing the titles were hallucinated by AI, ever since the release of GPT-3.5 in late 2022. But the problem escalated over the summer, when he began fielding patron requests for the same fake book titles attributed to real authors, the consequence of an AI-generated summer reading list circulated in special editions of the Chicago Sun-Times and The Philadelphia Inquirer earlier this year. At the time, the freelancer behind the list told 404 Media he used AI to produce it without fact-checking the outputs before syndication.

“We had people coming into the library and asking for those authors,” Kristan told 404 Media. He’s receiving similar requests for other types of media that don’t exist because they’ve been hallucinated by other AI-powered features. “It’s really, really frustrating, and it’s really setting us back as far as the community’s info literacy.” 

AI tools are changing how patrons treat librarians, both online and IRL. Alison Macrina, executive director of Library Freedom Project, told 404 Media that early results from a recent survey of emerging trends in how AI tools are impacting libraries indicate that patrons are growing more trusting of their preferred generative AI tool or product, and of the veracity of the outputs they receive. She said librarians report being treated like robots over library reference chat, and patrons getting defensive over the veracity of recommendations they’ve received from an AI-powered chatbot. Essentially, more people seem to trust their preferred LLM over their human librarian.

“Librarians are reporting this overall atmosphere of confusion and lack of trust they’re experiencing from their patrons,” Macrina told 404. “They’re seeing patrons having seemingly diminished critical thinking and curiosity. They’re definitely running into some of these psychosis and other mental health issues, and certainly seeing the people who are more widely adopting it also being those who have less digital literacy about it and a general sort of loss of retention.” 

As a reference librarian, Kristan said he spends a lot of time thinking about how fallible the human mind can be, especially as he’s fielding more requests for things that don’t exist than ever before. Fortunately, he’s developed a system: First, he searches for the presumed item by title in the library catalog. If it’s not there, he checks the global library catalog WorldCat. If it isn’t there either, he starts to get suspicious.

“Not being in WorldCat might mean it’s something that isn’t catalogued like a Zine, a broadcast, or something ephemeral, but if it’s parading as a traditional book and doesn’t have an entry in the collective library catalog, it might be AI,” Kristan explained. 

From there, he might connect the title to a platform like Kindle Direct Publishing, one way AI-generated books enter the market. Or the patron will tell him their source is an AI-powered chatbot, and he will have to explain that it likely hallucinated the name of the thing they’re looking for. A thing that doesn’t exist.
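Kristan’s lookup order can be sketched as a small decision procedure. This is only an illustration of the steps he describes, not a tool he or any library uses: the function name, the labels, and the two lookup callables are hypothetical stand-ins for a real local catalog search and a real WorldCat search.

```python
from typing import Callable


def triage_title(
    title: str,
    in_local_catalog: Callable[[str], bool],   # hypothetical local catalog lookup
    in_worldcat: Callable[[str], bool],        # hypothetical WorldCat lookup
    presented_as_traditional_book: bool = True,
) -> str:
    """Return a rough triage label for a patron-requested title."""
    # Step 1: check the library's own catalog.
    if in_local_catalog(title):
        return "held locally"
    # Step 2: fall back to the global catalog.
    if in_worldcat(title):
        return "exists elsewhere"
    # Step 3: absent from both. A zine, broadcast, or other ephemera may
    # simply be uncatalogued; something presented as a traditional book
    # is more suspicious.
    if presented_as_traditional_book:
        return "possibly AI-hallucinated"
    return "possibly uncatalogued"


# Toy data standing in for real catalogs (placeholder titles, not real books).
local_titles = {"placeholder title a"}
worldcat_titles = {"placeholder title a", "placeholder title b"}


def local_lookup(title: str) -> bool:
    return title.lower() in local_titles


def worldcat_lookup(title: str) -> bool:
    return title.lower() in worldcat_titles


for requested in ["Placeholder Title A", "Placeholder Title B", "Placeholder Title C"]:
    print(requested, "->", triage_title(requested, local_lookup, worldcat_lookup))
```

A label of "possibly AI-hallucinated" is a prompt for the follow-up conversation Kristan describes, not a verdict; the final call still rests with the librarian and the patron’s account of where the title came from.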

As much as library workers try to shield their institutions from the AI-generated content onslaught, the situation is and has been, in many ways, inevitable. Companies desperate to rush generative AI products to market are pushing flawed products onto the public that are predictably being used to pollute our information ecosystems. The consequences are that AI slop is entering libraries, everyone who uses AI products bears at least a little responsibility for the swarm, and every library worker, regardless of role, is being asked to try and mitigate the effects. 

Collection development librarians are asking digital book vendors like OverDrive, Hoopla and CloudLibrary to remove AI slop titles as they’re found. Subject specialists are expected to vet patron-requested titles that may have been written in part with AI, without being able to read every single one. Library technology providers are rushing to implement tools that librarians say are making library system catalogs harder to use.

Jaime Taylor, an academic library resource management systems supervisor with the University of Massachusetts, says vendors are shoehorning Large Language Models (LLMs) into library systems in one of two ways. The first is a natural language search (NLS) or semantic search that attempts to draw meaning from the words to find complementary search results. Taylor says these products are misleading in that they claim to eliminate the need for strict keyword searches or Boolean operators when searching library catalogs and databases, when really the LLM is doing the same work on the backend.

“These companies all advertise these tools as knowing your intent,” Taylor told 404 Media. “Understanding what you meant when you put those terms in. They don’t know. They don’t understand. None of those things are true. There is no technical way these tools can do that.”

The other tool Taylor is seeing in library technology is AI-generated summaries of journal articles, monographs, and other academic sources, delivered through a product called AI Insights, which incorporates new information into an existing LLM with a system called retrieval-augmented generation (RAG). Through beta testing AI tools in library tech for companies like Clarivate, Elsevier and EBSCO, Taylor and colleagues have found RAG doesn’t really help improve the accuracy of AI-generated summaries.

“It reads everything on both pages,” she added. “It can’t tell where the article you’re looking for starts and stops, so it gives you takeaways from every word on the page. This was really bad when we tested it with book reviews, because book reviews are often very short and there’ll be half a dozen on one page, which would end up giving us really mixed up information about every book review on the page, even though the record we were looking at was only looking for one of them, because it was a scanned page from an older journal.”

Taylor says neither type of product is ready for market, especially not the AI summaries, which do what an abstract does but a lot worse. She has turned off the ones she can, but expects fewer vendors will allow her and other librarians to do so in the future, so that the vendors can record more favorable use cases. The problem, she says, is that these companies are rushing products to market and, in doing so, making obsolete the skills academic librarians are trying to teach students and researchers.

“We are trying to teach how to construct useful, exact searching,” she said. “But really [these products’ intent] is to make that not happen. The problem with that in a university library is we’re trying to teach those skills but we have tools that negate that necessity. And because those tools don’t work well, you’ve not learned the skill and you’re still getting crap results, so you’re never going to get better results because you didn’t learn the skill.” 

Plenty of library workers remain cautiously optimistic about the potential for generative AI integrations and what that could mean for information retrieval and categorization. But for most librarians, the rollout has been clunky, error-filled and disorienting, for them and their patrons. 

“As someone who feels like a big part of my job is advocacy for the position, for the principles of the profession, I am here to not look at whether a resource is good or bad,” said Kristan. “I don’t look at the output, the relationship that it has with the patrons, and what it’s being used for in the long run of the future. Like, I’m not out here just breaking looms and machine weaving machinery just for the hell of it. I’m saying this is not good for the community and we need to find equitable alternatives to ensure that things are going well for the lives of our patrons.”
