The Culture Newsletter

What’s happened to the Internet Archive is shameful

This website enshrines the principle of equal access to information—and is crucial for students, writers and many others. So why are publishers fighting it?

September 12, 2024

A little over two years ago, I was near finishing my MPhil dissertation. Each day, I would go to the library, seeking its starched silence, its muted light, and sit down for ten hours or so. The majority of these were taken up with research, endless reading and Googling and scrolling and cross-referencing. I had assumed that the focus of my dissertation—female American writers of the 1970s—was near enough to the present to be fairly straightforward, research-wise; besides, I had a copyright library, complete with free interlibrary loans, at my disposal. I soon found that I was wrong. All too often, a quote, an article, a vague reference mentioned off-hand in some journal or review would promise to solve a key issue—if only I could find it.

Hours were lost raking through the library’s search facilities, through WorldCat, through the pelagic depths of Google Scholar searches. One such instance involved trying to find a book published in the early 1970s: the library, I was told, might have it, could have it, possibly in physical holdings, could perhaps order an interlibrary loan. In the end, they didn’t. They couldn’t. I despaired, told myself to move on—but I couldn’t either. My coursemate asked if I had checked the Internet Archive, a non-profit digital lending library. I signed up, found the book I wanted within five minutes, searched the text contents, found the exact quotes I needed.

From then on, the Internet Archive was my constant companion, my saviour. An independent feminist magazine from some time in 1973? They had it. An early edition of a novel that contained a different introduction from the version now in print? They had it. An NPR broadcast from 20 years ago? They had it. I could search the text, I could look things up at home; I could rediscover, redisseminate. There is a certain delight that comes with the discovery of access—I suppose you would call it freedom; the feel of the breeze through an opened window on a warm day. An inner world reaching outwards, an outer world reaching inwards.

I finished my MPhil, moved on from academia. Yet the Internet Archive has remained a constant source of both inspiration and information. When I wrote about the author Elaine Kraf, her novels long out-of-print, I used the Internet Archive to find editions of her works to refer to (leading to the reissuing of her novels). Frequently, I have used the resource to read books from forgotten authors of the 20th century (particularly female, experimental authors, who are often consigned to the backlist of history), consult introductions unavailable in later editions and read books only published in the US that would be prohibitively expensive to ship over. It has more prosaic uses, too: often, when writing articles or reviews, I have borrowed books through the Archive that I already own in physical form purely to more easily search their contents.

The Internet Archive is so beloved that, when last week the United States Court of Appeals for the Second Circuit upheld the ruling against the site in its legal battle with the publisher Hachette (resulting in the removal of more than 500,000 books and sounding the death knell for the site as we know it), my friend messaged me right away. I responded, without hyperbole, “genuinely devastated”. I’m not alone. For me, the ruling against the Internet Archive is more than an inconvenience, just as the Archive itself is more than a convenient resource. It is a vast repository of art, culture, truth: one of the last vestiges of the lost dream of the early internet, when we believed in, or desperately hoped for, the web’s democratic capacity. The Archive is a literal manifestation of the concept of equal access to information; the universal right to knowledge made (cyber-)flesh.

The mechanism the Internet Archive uses to lend books is simple. Called a “controlled digital lending model,” or CDL, it means that books are scanned in and made available to be borrowed by one person at a time. In the suit, the Hachette Book Group, along with Penguin Random House, HarperCollins and Wiley, alleged that the Archive had infringed copyright and that its lending was not protected by fair use. Crucially, the publishers took issue with both the Archive and the concept of CDL itself, calling them tantamount to “willful digital piracy on an industrial scale”.

This has worrying implications for other libraries, for whom CDL is a common practice. For years, libraries have said that ebooks are prohibitively and unfairly expensive, at around three times the price of a hardback equivalent. On top of this, they are forced by publishers to repurchase ebooks after 26 checkouts or one to two years; when the demand for digital versions is constantly increasing, this means that libraries have to dedicate larger and larger proportions of their budgets simply to continue to offer them. Often, this money goes to private third-party companies that lease the ebooks from their publishers and lend them, at significant markup, to the libraries. At a time when libraries face an existential threat (incessant funding cuts, mounting expenses), the publishers’ war on CDL seems not only unfair but cruel. It should be staggering, unbelievable; instead, it’s a familiar tale of capitalist rapacity from an industry grown bloated and gouty on its own greed.

There will surely be some who would argue that the Archive is not necessary, that rather than borrowing CDL materials from online libraries, you should source ebooks from local lenders instead. This is a facile argument: often, libraries simply don’t have the books you would like for the economic reasons laid out above; if they aren’t bestsellers, what is the point of buying them? Indeed, I have often asked local libraries to purchase digital copies of books I would like, only to be told, after months of waiting, that it isn’t possible.

This argument also doesn’t account for people who live outside countries where these books have been published, or for those who live in areas where these books have been banned from conventional libraries. In the US, for instance, 2023 saw 4,349 book bans across 23 states and 52 public school districts. According to PEN America, the majority of books removed “talk about LGBTQ+ identities, that includes characters of colour, that talk about race and racism, that include depictions of sexual experiences in the broadest interpretation of that understanding”; for good measure, Florida schools have also banned some of the classics, including For Whom the Bell Tolls and Anna Karenina.

Speaking about the initial ruling, Lawrence Lessig, Professor of Law at Harvard Law School, stated: “culture needs more than commercial publishing. If the business model of commercial publishing controlled our access to our past, then much of who we were, and much of how we learned to be better, would simply disappear.” I can’t help but agree. The case against the Internet Archive is not just a story about the ruination of an online library, but a grander narrative of our times: how money facilitates the transference of knowledge away from the public and towards the few.