No, Most Books Don't Sell Only a Dozen Copies
A little post on why publishing statistics are so confusing
One thing the PRH/SS merger trial revealed is that publishing has a lot of problems. This is very true! At the same time, many of the problems seem to have mutated into unbelievable chimeras as they made their way around the discourse. Today, for example, much of the literary internet was debating a claim that 50% of books published sell fewer than 12 books.
This claim took off with the usual suspects (conservative pundits claiming publishing is too “woke” and self-publishing evangelists saying every author would make a fortune if they ditched traditional publishing), but the publishing professionals I know said this claim is very fishy. (I’m pretty sure publishers would go out of business if 50% of their books sold fewer than 12 copies!) So this statistic isn’t true. Or at least it isn’t true in the way you might think.
But publishing statistics are often not what you think. This extreme 12 copies claim joins a couple others that have gone around the internet recently: “98 percent of books sell fewer than 5,000 copies.” “90 percent sell fewer than 2,000 copies.” “Most books sell fewer than 99 copies.” Etc.
Are all of these true? None of them? Part of the problem with evaluating claims of “most published books sell [X] copies” is that it—[apologies for the Derrida voice]—it all depends on what you mean by “book,” “published,” and “sell.” No, I’m not playing postmodern games here. It really is confusing.
What’s “a book”?
The Platonic ideal of a book might be a collection of text printed on a few hundred paper pages. But the term encompasses much more than that, including books that have almost no text at all (for example the “adult coloring book” craze of the 2010s). Publishers publish novels and memoirs as well as cookbooks, puzzle books, Mad Libs, etc. But it’s even more confusing than this when it comes to those statistics.
Last year, Orbit published my debut novel The Body Scout. I wrote one novel, so published one book. Right? Not exactly. From a sales tracking perspective, books are published in multiple formats, each with different ISBNs. I wrote one novel, but from a title count POV I actually published 4 books: hardcover, paperback, ebook, and audiobook. Other books have even more formats (mass market version, movie tie-in editions, etc.) and because they all have different ISBNs, they all have different sales figures.
When it comes to classics that are in the public domain, like Pride and Prejudice or Shakespeare, there can be literally hundreds of editions in existence (put out by various publishers) each of which could be counted separately.
What’s “published”?
A published book can refer to a newly written and released book (aka frontlist, or a novel published for the first time in 2022) or it can refer to anything a publisher puts out in a given year, including a reissue of an existing book. But sometimes “published” means any book that exists in any format available for purchase. A hundred-year-old novel that can no longer be found on bookstore shelves yet sits in a cardboard box in the back of a warehouse somewhere is “a published book.” Some books are never published in print at all and exist only as ebooks and/or audiobooks. And many books exist in “print on demand” form in which a physical copy doesn’t exist until someone purchases it. In the old days, books would go out of print when people stopped buying them. But in the modern digital age books can stay “in print” forever.
It’s also worth pointing out here that publishers range from tiny micropresses run as a hobby in someone’s garage to multi-billion dollar companies. If you’re counting books by ISBNs, this would also include many self-published books.
What’s “a sale”?
When people reference book sales, they’re typically using NPD Group’s BookScan numbers. Think of BookScan as the book industry version of Nielsen TV ratings. Briefly, BookScan is a “point of sale” tracking system that counts the number of print copies sold at participating retail locations. BookScan allegedly tracks about 75% of retail sales including lots of indie bookstores as well as Barnes & Noble, Amazon, and Walmart.
It’s a fine tool for what it is, but from a data perspective it’s only partial. I already mentioned the issue with multiple formats above. BookScan also only tracks print so doesn’t include audiobooks or ebooks. (There is a Nielsen ebook estimator, but it’s rarely included in these statistics. And ebook sales are tricky since prices fluctuate wildly.) Even restricting ourselves to print books, BookScan misses plenty. Sales to libraries, for example, can be a significant portion of a book’s sales. Many small press and self-published authors might sell directly via an author’s website or in-person events. And so on.
Additionally, there are dramatic differences between 1) lifetime sales, 2) sales in the first 12 months after publication, and 3) sales in any random calendar year. Most books sell most of their copies in the first year or two after publication. Some books are perennial sellers and others might break out later (e.g., when there is a TV or film adaptation), but most sell the bulk of their copies early. Any statistic based on 3) is going to give you a completely inaccurate impression of what a book has sold. Imagine a 1998 novel that sold 6,000 copies in its first year and 10,000 copies to date. It might sell only 12 copies in 2022, decades later, but that hardly means it was a failure.
Okay, so what does all the above mean? Mostly it means these statistics are completely meaningless unless we know what’s being included. Are you counting lifetime sales or one year’s sales? One year’s sales for frontlist titles or backlist titles? Only Big 5 books or anything with an ISBN?
Take the statistic that most published books only sell 99 copies. This seems shocking on its face. But if you dig into it, you’ll notice it was counting one year’s sales of all books that were in BookScan’s system. That’s quite a different statistic than saying most books don’t sell 100 copies in total! A book could easily be a bestseller in, say, 1960 and sell only a trickle of copies today. In the same way, most old movies and albums aren’t frequently watched/listened to in 2022. It’s only a small percentage of past works that remain popular. Most backlist books selling fewer than 99 copies doesn’t tell you anything about how much newly released books sell.
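To make the one-year-versus-lifetime distinction concrete, here is a minimal Python sketch. The catalog is entirely hypothetical: the titles and every figure are invented for illustration, except the first entry, which mirrors the 1998 novel example above (10,000 copies to date, 12 copies in 2022). It is not BookScan data and says nothing about real proportions.

```python
from statistics import median

# Hypothetical catalog: every number here is invented for illustration.
# Only the first entry echoes the 1998 novel example from the text.
catalog = [
    {"title": "1998 novel", "lifetime": 10_000, "in_2022": 12},
    {"title": "1960 bestseller", "lifetime": 500_000, "in_2022": 40},
    {"title": "2015 midlist title", "lifetime": 3_000, "in_2022": 25},
    {"title": "2022 debut", "lifetime": 2_500, "in_2022": 2_500},
    {"title": "2022 hit", "lifetime": 80_000, "in_2022": 80_000},
]

# Same books, two different windows.
median_2022 = median(book["in_2022"] for book in catalog)
median_lifetime = median(book["lifetime"] for book in catalog)

# How many titles would land in a "sold fewer than 99 copies" headline?
under_99_in_2022 = sum(1 for book in catalog if book["in_2022"] < 99)

print(f"median copies sold in 2022: {median_2022}")
print(f"median lifetime copies:     {median_lifetime}")
print(f"titles under 99 copies in 2022: {under_99_in_2022} of {len(catalog)}")
```

In this made-up catalog, most titles sold fewer than 99 copies in 2022 even though none of them sold fewer than 2,500 copies overall. The headline changes with the window, not with the books.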
(If you’re wondering—as people did on Twitter—why publishers keep books in print that don’t sell, remember that a book being in print doesn’t mean a publisher is actively spending lots of money on the title. It doesn’t break the bank to keep one box in the corner of a warehouse. And as I noted above a book can be “in print” these days and exist only in a digital form or awaiting “print on demand.”)
In terms of the dozen copies statistic, I can’t evaluate it because it is unclear what it’s referring to. Fifty-eight thousand books is more books than PRH publishes in a given year, but far less than their entire backlist. Is 58k all new books published with an ISBN, including self-published books? Is it something else? I really don’t know and none of the publishing professionals I follow seem to know either. (Editing to add: Jane Friedman, who posted this number originally on Instagram, noted there was no source given in testimony. Friedman gives her own guess in the comments.)
In my experience, and with the data I’ve seen, most traditionally published novels that you see on bookstore shelves or reviewed in newspapers sell several hundred to a few thousand copies across formats. Many sell much more of course. I’ve seen some flops that sold only a couple hundred. And of course not all traditionally published novels appear in bookstores or are reviewed in newspapers. Is it possible someone has published a Big 5 novel that sold only 12 copies over its lifetime? I suppose. But I don’t think it’s 5%, much less 50%!
Then again, I’m talking about novels published by established presses that appear in bookstores. How many copies do self-published poetry chapbooks sell? Print on demand public domain classics? Backlist sudoku puzzle books? I couldn’t begin to guess.
One thing that’s true is publishing works on a blockbuster model. Most books sell relatively few copies and a handful sell millions. The statistics about the vast majority of traditionally published books selling fewer than five thousand are likely true. It’s the bestsellers that keep the industry afloat, financially. And it’s true that publishers often have no idea what will sell. It’s a throw-against-the-wall-and-see-what-sticks industry. Are there a lot of problems with it? A lot of things that could be fixed? Ways that publishers could better market the backlist or frontlist? Yes! But it’s not quite as dire as some of these statistics suggest.
UPDATE: Kristen McLean from BookScan very nicely provided data in the comments so please look below for some illuminating if depressing data!
As always, if you like this newsletter, please consider subscribing or checking out my recently released science fiction novel The Body Scout, which The New York Times called “Timeless and original…a wild ride, sad and funny, surreal and intelligent” and Boing Boing declared “a modern cyberpunk masterpiece.”
Hey y'all, it's Kristen McLean, lead industry analyst from NPD BookScan. I thought I would chime in with some numbers here, since that statistic from the DOJ is super-misleading, and I'm not sure where it originally came from, since we did not provide it directly.
It is possible it came from our data, and was provided by one of the publisher parties, but based on the 58,000 figure, it's not obvious what exactly it includes in terms of "publisher frontlist". 58,000 titles is way too small a number for "all frontlist books published in a year by every publisher"--that's more like 487,000 frontlist titles--so it's clear it's a slice but I'm not sure HOW it was sliced.
NPD BookScan (BookScan is owned by The NPD Group, not Nielsen, BTW), collects data on print book sales from 16,000 retail locations, including Amazon print book sales. Included in those numbers are any print book sales from self-publishing platforms where the author has opted for extended distribution and a print book was sold by Amazon or another retailer. So that 487K "new book" figure is all frontlist books in our data showing at least 1 unit sale over the last 52 weeks coming from publishers of all sizes, including individuals.
Lots of press outlets have been calling about it today, so I did a little digging to see if I could reverse-engineer the citation, and am happy to share our numbers here for clarity.
Because this is clearly a slice, and most likely provided by one of the parties to the suit, I decided to limit my data to the frontlist sales for the top 10 publishers by unit volume in the U.S. Trade market. My ISBN list is a little smaller than the one quoted in the DOJ, but the principles will be the same.
The data below includes frontlist titles from Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute.
Here is what I found. Collectively, 45,571 unique ISBNs appear for these publishers in our frontlist sales data for the last 52 weeks (thru week ending 8-24-2022).
In this dataset:
>>>0.4% or 163 books sold 100,000 copies or more
>>>0.7% or 320 books sold between 50,000-99,999 copies
>>>2.2% or 1,015 books sold between 20,000-49,999 copies
>>>3.4% or 1,572 books sold between 10,000-19,999 copies
>>>5.5% or 2,518 books sold between 5,000-9,999 copies
>>>21.6% or 9,863 books sold between 1,000-4,999 copies
>>>51.4% or 23,419 sold between 12-999 copies
>>>14.7% or 6,701 books sold under 12 copies
So, only about 15% of all of those publisher-produced frontlist books sold fewer than 12 copies. That's not nothing, but nowhere near as janky as what has been reported.
BUT, I think the real story is that roughly 66% of those books from the top 10 publishers sold fewer than 1,000 copies over 52 weeks. (Those last two points combined.)
And less than 2% sold more than 50,000 copies. (The top two points)
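For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the shares from the bucket counts quoted above. The counts are copied from this comment. The bucket labels, variable names, and the `share` helper are just illustrative choices, not anything from BookScan.

```python
# Bucket counts as quoted in the comment above
# (top 10 publishers, frontlist ISBNs, 52 weeks thru 8-24-2022).
buckets = [
    ("100,000+", 163),
    ("50,000-99,999", 320),
    ("20,000-49,999", 1015),
    ("10,000-19,999", 1572),
    ("5,000-9,999", 2518),
    ("1,000-4,999", 9863),
    ("12-999", 23419),
    ("under 12", 6701),
]

total = sum(count for _, count in buckets)


def share(labels):
    """Percentage of all ISBNs that fall in the named buckets."""
    wanted = set(labels)
    return 100 * sum(count for label, count in buckets if label in wanted) / total


under_12 = share(["under 12"])
under_1000 = share(["12-999", "under 12"])
over_50000 = share(["100,000+", "50,000-99,999"])

for label, count in buckets:
    print(f"{label:>15}: {count:>6,} ({100 * count / total:.1f}%)")
print(f"total ISBNs: {total:,}")
print(f"sold under 12 copies:     {under_12:.1f}%")
print(f"sold under 1,000 copies:  {under_1000:.1f}%")
print(f"sold 50,000+ copies:      {over_50000:.1f}%")
```

The buckets add up to the 45,571 ISBNs cited, and the combined shares match the rounded figures in the comment: about 15% under 12 copies, about 66% under 1,000, and well under 2% at 50,000 or more.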
Now data is a funny thing. It can be sliced and diced to create different types of views. For instance we could run the same analysis on ALL of those 487K new books published in the last 52 weeks, which includes many small press and independently published titles, and we would find that about 98% of them sold fewer than 5,000 copies in the "trade bookstore market" that NPD BookScan covers. (I know this IS a true statistic because that data was produced by us for The New York Times.)
But that data does not include direct sales from publishers. It does not include sales by authors at events, or through their websites. It does not include eBook sales which we track in a separate tool, and it doesn't include any of the amazing reading going on through platforms like Substack, Wattpad, Webtoons, Kindle Direct, or library lending platforms like OverDrive or Hoopla.
BUT, it does represent the general reality of the ECONOMICS of the publishing market. In general, most of the revenue that keeps publishers in business comes from the very narrow band of publishing successes in the top 8-10% of new books, along with the 70% of overall sales that come from BACKLIST books in the current market. (Backlist books have gained about 4% in share from frontlist books since the pandemic began, but that is a whole other story.)
The long and short of it is publishing is very much a gambler's game, and I think that has been clear from the testimony in the DOJ case. It is true that most people in publishing up to and including the CEOs cannot tell you for sure what books are going to make their year. The big advantage that publisher consolidation has brought to the top of the market is deeper pockets and more resources to roll those dice. More money to get a hot project. More money to influence outcomes through marketing, more access to sales and distribution mechanisms, and easier access to the gatekeepers who decide what books make it onto retailers' shelves. And better ability to distribute risk across a bigger list of gambles.
It is largely a numbers game and I'm not just saying that because I'm a numbers gal. It's a tough business.
Hope this is helpful.
If anyone has questions, they are welcome to reach out to me directly at kristen.mclean@npd.com.
My reading of this statistic—a version of which surfaces from time to time, as you note in the opening—is that it's counting (1) new titles requiring an ISBN, which can include re-issues or new formats or POD editions, and (2) what those sales were during the year of release or first year of sales. It's also including many, many different types of publishers which may or may not sell in the bookstore market. That latter part, I think, is the biggest factor in why a book may look like a poor seller when it isn't.
On the other hand, traditionally published authors tend to highly value bookstore sales, as we saw recently with the B&N policy change. So in that regard, I think it's a helpful reminder that print bookstore sales may not drive a book's success and a lot depends on the publisher and category of book.