How to stop the internet becoming a junk heap

In the Jorge Luis Borges short story “The Library of Babel”, the narrator describes a library of apparently infinite size, composed of hexagonal galleries filled with bookshelves. The books contain every possible configuration of 22 letters and three punctuation marks. The narrator describes a lifetime searching through them, hoping to find coherence and, in the end, finding only a few lines that make any sense.

“The Library of Babel” is a horror story. On the shelves are every conceivable truth, the solution to every problem, the answer to every question—but also corruptions of those truths, falsehoods impossible to tell from the truths and a near-infinite quantity of sheer nonsense. It is not a new observation that the internet resembles the library of Babel. For decades, individuals have been publishing online whatever they have seen fit to share—whether it’s profound truths, falsehoods, or merely incoherent junk.

The problem of internet junk is about to get much, much worse. In the late 1990s, the rise of the Google search engine revolutionised how people found their way round the sense and nonsense that is the global internet. Searching online requires a different strategy from searching in books. Before Google, most search engines relied on a simple heuristic: look for webpages in which the user’s search terms appear with high frequency. Want to find bike reviews? Search for a document that uses the phrase “bike reviews”! But this doesn’t work in a world where everyone can publish and where there are economic incentives to capture people’s attention. These early search engines were vulnerable to individuals posting pages that just said “bike reviews” thousands of times.
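The weakness of that heuristic is easy to see in a few lines of Python. What follows is a minimal sketch, not a reconstruction of any real search engine: the page names, the sample text and the helper functions `keyword_score` and `rank_by_keyword` are all invented for illustration.

```python
from __future__ import annotations


def keyword_score(page_text: str, query: str) -> int:
    """Count non-overlapping occurrences of the query phrase, ignoring case."""
    return page_text.lower().count(query.lower())


def rank_by_keyword(pages: dict[str, str], query: str) -> list[str]:
    """Return page names ordered from highest to lowest keyword count."""
    return sorted(
        pages,
        key=lambda name: keyword_score(pages[name], query),
        reverse=True,
    )


# Two hypothetical pages: one genuine article, one stuffed with the phrase.
pages = {
    "honest-review": "Our bike reviews cover ten commuter models in detail.",
    "spam-page": "bike reviews " * 1000,
}

ranking = rank_by_keyword(pages, "bike reviews")
print(ranking)
```

Under this scoring the stuffed page wins easily, because nothing in a raw count distinguishes repetition from relevance.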

Google refined the process by adding in the idea of “authority”. For a page to appear near the top of its results, many other pages needed to point to it via hyperlinks. The theory behind the PageRank algorithm created by Larry Page and Sergey Brin was that authoritative pages would be the destination of lots of organic links on the web, while very few people would choose to link to a page that repeated a keyword tens of thousands of times.
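The idea can be sketched as a short iterative calculation. This is a simplified, textbook-style version of PageRank and should not be read as Google’s production system: the damping factor of 0.85, the fixed number of iterations, the handling of pages with no outbound links and the tiny example web are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page authority from a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages with no outbound links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}

        # Every other page splits its current rank equally among its link targets.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)

        rank = new_rank

    return rank


# A hypothetical web: three blogs link to a guide, and nobody links to the spam page.
web = {
    "blog-a": ["trusted-guide"],
    "blog-b": ["trusted-guide"],
    "blog-c": ["trusted-guide"],
    "trusted-guide": [],
    "keyword-spam": [],
}

scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy web the guide comes out on top because other pages vouch for it, while the keyword-stuffed page earns no more than the baseline however many times it repeats itself.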

When Google first appeared, it was a revelation. Search results were vastly better. Before long, though, “link farmers” (who often call themselves “search engine optimisation experts”) figured out how to fool Google, too, by creating farms of pages that point to one another. Bobsbikereviews.com could now have 10,000 pages pointing to it, with 10,000 pages pointing to each of them, and so on. Google has evolved, and now has methods designed to detect and discount this form of search engine optimisation. But the task is becoming vastly more difficult, and recent developments in artificial intelligence have only made things harder.
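A rough sketch shows why link farms worked. It uses a raw count of inbound links as a crude stand-in for authority, which is far simpler than anything Google actually ran, and every `.example` domain, the farm size of 100 and both helper functions are hypothetical.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count how many distinct pages link to each target, ignoring self-links."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def build_link_farm(target, size):
    """Create `size` made-up farm pages that each link to the target and to one neighbour."""
    farm = {}
    for i in range(size):
        neighbour = f"farm-{(i + 1) % size}.example"
        farm[f"farm-{i}.example"] = [target, neighbour]
    return farm


# Two organic links to a genuine guide, plus a 100-page farm built for one site.
organic_web = {
    "cycling-club.example": ["trusted-guide.example"],
    "commuter-blog.example": ["trusted-guide.example"],
}
farmed_web = {**organic_web, **build_link_farm("bobsbikereviews.com", 100)}

counts = inbound_link_counts(farmed_web)
print(counts.most_common(2))
```

Because each farm page also receives a link from its neighbour, none of them looks like an orphan, which hints at why telling manufactured links from organic ones takes more than counting.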

ChatGPT, a system that generates text that’s difficult to distinguish from human-authored text, is creating a perfect storm for search engines. For years, people have tried to rig Google by mass-posting handcrafted spam. Most is repetitive and easily ignored by Google and its competitors. But now it is becoming far easier to create masses of high-quality content and post it online in order to direct people’s attention towards pages laden with ads or deceptive offers. Search engine giants are already working on this problem, looking for signatures that pages were created automatically, then penalising them. What is likely to happen is an escalating war between AI-generated pages and algorithms designed to help search engines sort real human knowledge from artificial junk.

Unfortunately, even if Google can learn to sort between the authentic and the fake, humans may still struggle. Remember the Internet Research Agency (IRA)? A building in St Petersburg was filled with people whose job was to create social media posts promoting Putin’s agenda and aggravating political tensions in the US. The IRA claimed, as one of its successes, the creation of two rival groups in Texas: one a right-wing populist group pushing for state secession and championing gun rights, the other a faith group, United Muslims of America, which campaigned for Hillary Clinton. In a remarkable feat of dezinformatsia, these two Facebook groups, both controlled by the Russians, managed to bring dozens of real Houstonians out onto the streets to protest against one another.

Running the IRA required paying hundreds of tech-savvy English-literate Russians to build online personas and create several posts a day in their voices. That process can now be automated. We should expect social media platforms like Facebook and Twitter to fill up with automatically generated propaganda promoting the points of view of controversial political figures.

Unfortunately, it is difficult for people to navigate a landscape in which an enormous amount of the content they are exposed to appears to favour one point of view. The natural tendency, when bombarded with posts claiming the invasion of Ukraine is legitimate, is to wonder whether your support for Kyiv is misinformed or ill-considered. Do these apparently ordinary Russians and apparently pro-Putin Europeans have a point?

It will be a huge challenge to keep these new junk accounts in check, and platforms have all the wrong incentives when it comes to combatting the problem. Elon Musk, witnessing the fallout from his mismanagement of Twitter, may welcome the advent of these robots posting controversial and high-engagement content, just so long as his advertisers do not complain that they are wasting money showing ads to ChatGPT-empowered robots.

How do we respond when content is being created not for our benefit, but to fool search engines or promote extreme points of view? I recently had a preview of one possible answer with a system called the Otherweb, created by AI programmer Alex Fink. The Otherweb attempts to sort through the news of the day and delete “anti-news”. Anti-news is content created by professional news organisations that has no actual news value—his favourite example is a headline from a credible source that read, “Stop what you’re doing and watch this elephant play with bubbles”. This sort of content is created by humans to grab attention: it doesn’t provide useful information about the world, though it might be diverting for a period of time.

Anti-news is Fink’s bête noire, and he has devoted substantial thought to creating a news stream devoid of clickbait and other forms of anti-news. Each day I now receive a newsletter from the Otherweb that has distilled thousands of news stories down to nine selected for their apparent neutrality and newsworthiness. The system works extremely well: in a few moments I get a quick overview of newsworthy headlines with no attempts to capture and redirect my attention.

There is an irony in turning to AI to help find our way through a landscape filled with junk created by rival AI systems. We might have avoided this problem had OpenAI, the creator of ChatGPT, been more responsible in the way it released its tool to the public. It seems likely that in the very near future users will be able to use ChatGPT or something similar to create an endless stream of junk that can be harnessed either for search engine optimisation or for the generation of propaganda. Here’s hoping we quickly see innovation in tools that help us fight back.

We might also benefit from rethinking the incentives that make the current internet work. Spam is a function of an ad-supported internet with constant competition for users’ attention. If we moved to something closer to a subscription model, material would have to be of higher quality for users to be willing to pay for it. And if systems like Reddit did not reward users simply for creating content that people happen to engage with, those users would have less incentive to inflate their post count by publishing junk.

Perhaps there is a way to build incentives that reward high-quality engagement and strongly penalise people for posting AI-generated junk. But for now it seems likely that this battle for our attention will head further into surreal, Borgesian territory as we navigate an infinite series of hexagonal galleries online, armed with tools to help us find those increasingly rare nuggets of genuine human insight.