A little post on why publishing statistics are so confusing
Hey y'all, it's Kristen McLean, lead industry analyst from NPD BookScan. I thought I would chime in with some numbers here, since that statistic from the DOJ is super-misleading, and I'm not sure where it originally came from, since we did not provide it directly.
It is possible it came from our data, and was provided by one of the publisher parties, but based on the 58,000 figure, it's not obvious what exactly it includes in terms of "publisher frontlist". 58,000 titles is way too small a number for "all frontlist books published in a year by every publisher"--that's more like 487,000 frontlist titles--so it's clear it's a slice but I'm not sure HOW it was sliced.
NPD BookScan (BookScan is owned by The NPD Group, not Nielsen, BTW) collects data on print book sales from 16,000 retail locations, including Amazon print book sales. Included in those numbers are any print book sales from self-publishing platforms where the author has opted for extended distribution and a print book was sold by Amazon or another retailer. So that 487K "new book" figure is all frontlist books in our data showing at least 1 unit sale over the last 52 weeks coming from publishers of all sizes, including individuals.
Lots of press outlets have been calling about it today, so I did a little digging to see if I could reverse-engineer the citation, and am happy to share our numbers here for clarity.
Because this is clearly a slice, and most likely provided by one of the parties to the suit, I decided to limit my data to the frontlist sales for the top 10 publishers by unit volume in the U.S. Trade market. My ISBN list is a little smaller than the one quoted in the DOJ case, but the principles will be the same.
The data below includes frontlist titles from Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute.
Here is what I found. Collectively, 45,571 unique ISBNs appear for these publishers in our frontlist sales data for the last 52 weeks (thru week ending 8-24-2022).
In this dataset:
>>>0.4% or 163 books sold 100,000 copies or more
>>>0.7% or 320 books sold between 50,000-99,999 copies
>>>2.2% or 1,015 books sold between 20,000-49,999 copies
>>>3.4% or 1,572 books sold between 10,000-19,999 copies
>>>5.5% or 2,518 books sold between 5,000-9,999 copies
>>>21.6% or 9,863 books sold between 1,000-4,999 copies
>>>51.4% or 23,419 books sold between 12-999 copies
>>>14.7% or 6,701 books sold under 12 copies
So, only about 15% of all of those publisher-produced frontlist books sold fewer than 12 copies. That's not nothing, but nowhere near as janky as what has been reported.
BUT, I think the real story is that roughly 66% of those books from the top 10 publishers sold fewer than 1,000 copies over 52 weeks. (Those last two points combined)
And less than 2% sold more than 50,000 copies. (The top two points)
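For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the shares from the title counts in the list above. The counts are copied from that list; the variable names are just for illustration.

```python
# Title counts per sales band, copied from the list above.
bands = [
    ("100,000+", 163),
    ("50,000-99,999", 320),
    ("20,000-49,999", 1015),
    ("10,000-19,999", 1572),
    ("5,000-9,999", 2518),
    ("1,000-4,999", 9863),
    ("12-999", 23419),
    ("under 12", 6701),
]

counts = dict(bands)
total = sum(counts.values())  # should match the 45,571 unique ISBNs

# Share of titles in each band, as a percentage rounded to one decimal.
shares = {label: round(100 * count / total, 1) for label, count in bands}

# Combine bands from the raw counts rather than adding rounded percentages.
under_1000_pct = 100 * (counts["12-999"] + counts["under 12"]) / total
over_50000_pct = 100 * (counts["100,000+"] + counts["50,000-99,999"]) / total

for label, pct in shares.items():
    print(f"{label:>15}: {pct}%")
print(f"Under 1,000 copies: {under_1000_pct:.1f}%")
print(f"50,000 copies or more: {over_50000_pct:.1f}%")
```

The band counts add up to exactly 45,571, so the eight bands cover the whole list with no overlap.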
Now data is a funny thing. It can be sliced and diced to create different types of views. For instance we could run the same analysis on ALL of those 487K new books published in the last 52 weeks, which includes many small press and independently published titles, and we would find that about 98% of them sold fewer than 5,000 copies in the "trade bookstore market" that NPD BookScan covers. (I know this IS a true statistic because that data was produced by us for The New York Times.)
But that data does not include direct sales from publishers. It does not include sales by authors at events, or through their websites. It does not include eBook sales which we track in a separate tool, and it doesn't include any of the amazing reading going on through platforms like Substack, Wattpad, Webtoons, Kindle Direct, or library lending platforms like OverDrive or Hoopla.
BUT, it does represent the general reality of the ECONOMICS of the publishing market. In general, most of the revenue that keeps publishers in business comes from the very narrow band of publishing successes in the top 8-10% of new books, along with the 70% of overall sales that come from BACKLIST books in the current market. (Backlist books have gained about 4% in share from frontlist books since the pandemic began, but that is a whole other story.)
The long and short of it is publishing is very much a gambler's game, and I think that has been clear from the testimony in the DOJ case. It is true that most people in publishing up to and including the CEOs cannot tell you for sure what books are going to make their year. The big advantage that publisher consolidation has brought to the top of the market is deeper pockets and more resources to roll those dice. More money to get a hot project. More money to influence outcomes through marketing, more access to sales and distribution mechanisms, and easier access to the gatekeepers who decide what books make it onto retailers' shelves. And better ability to distribute risk across a bigger list of gambles.
It is largely a numbers game and I'm not just saying that because I'm a numbers gal. It's a tough business.
Hope this is helpful.
If anyone has questions, they are welcome to reach out to me directly at email@example.com.
My reading of this statistic—a version of which surfaces from time to time, as you note in the opening—is that it's counting (1) new titles requiring an ISBN, which can include re-issues or new formats or POD editions, and (2) what those sales were during the year of release or first year of sales. It's also including many, many different types of publishers which may or may not sell in the bookstore market. That latter part, I think, is the biggest factor in why a book may look like a poor seller when it isn't.
On the other hand, traditionally published authors tend to highly value bookstore sales, as we saw recently with the B&N policy change. So in that regard, I think it's a helpful reminder that print bookstore sales may not drive a book's success and a lot depends on the publisher and category of book.
I had no idea this would go viral - and your explanations make a lot of sense. Thanks!
Hey peeps, I just recorded an episode inspired by this thread for the BBC podcast More or Less, because of all of the great engagement in the community here. Here it is for your listening pleasure: https://www.bbc.co.uk/programmes/p0d8nb1w
"And it’s true that publishers often have no idea what will sell. It’s a throw-against-the-wall-and-see-what-sticks industry. Are there a lot of problems with it? A lot of things that could be fixed? Ways that publishers could better market the backlist or frontlist? Yes! But it’s not quite as dire as some of these statistics suggest."
This is my issue though. There -are- ways to track this and as a tech-nerd, it befuddles me that I'm not picking up on omni-channel dashboards being used in publishing. There are just no APIs for some of that—even foot-traffic data—when there absolutely could be.
If it's not quite as dire, that data needs to exist and transparently. Right?
Great reminder that facts, or assertions, need context!
A good read, both enlightening, and a bit disheartening.
Lincoln, Kristen and Jane - some comments from the music side of the fence that will resonate.
The challenge this blog highlights is counting units, logging street dates and defining the time period. All are similar to the challenges we face in music.
First, when we talk about how many songs are on the digital shelf, the answer has to be 'de-duplicated' - an ISRC for a single release of a song might be different from the ISRC on the album. They need to be rolled up into one. Second, the debate over frontline and catalogue definitions has been going on in music since 2017 (see my work here: https://tinyurl.com/3unbh5ba). We've seen click-bait headlines like 'is old music killing new music' whereas the truth is that music between 18 and 36 months old (that's old but not *that* old) is seeing a surge in demand. Finally, claims like this need to be specific about the time period: is it all time, or the most recent calendar year? If you were to ask how many songs hadn't received a click last year, then lots of them would have received a click in the years before.
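To make the roll-up point concrete, here is a rough sketch of de-duplicating recordings that carry different ISRCs. Everything in it is hypothetical: the identifiers, artist, titles and stream counts are made up, and matching on a normalized artist-plus-title key is only one assumed way to decide that two ISRCs are the same song. Real catalogues need far more careful matching.

```python
from collections import defaultdict

# Hypothetical rows: (isrc, artist, title, streams). All values are invented.
rows = [
    ("ISRC-SINGLE-001", "Example Artist", "Example Song", 120),
    ("ISRC-ALBUM-001", "Example Artist", "example song ", 80),
    ("ISRC-ALBUM-002", "Example Artist", "Another Song", 50),
]

def song_key(artist, title):
    """Assumed matching rule: same artist and title, ignoring case and padding."""
    return (artist.strip().casefold(), title.strip().casefold())

def roll_up(rows):
    """Sum streams across every ISRC that maps to the same song key."""
    totals = defaultdict(int)
    for _isrc, artist, title, streams in rows:
        totals[song_key(artist, title)] += streams
    return dict(totals)

rolled = roll_up(rows)
print(f"{len(rows)} ISRCs roll up to {len(rolled)} songs")
for (artist, title), streams in rolled.items():
    print(artist, "|", title, "|", streams)
```

Counting the raw rows would report three items on the shelf, while the rolled-up view reports two, which is the same kind of gap that appears when every new ISBN is counted as a new book.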
The most recent example of this type of story in music is the excellent work of MBW which showed that 80% (78.4%) of artists on Spotify today – around 6.3 million of them – have a monthly audience on the platform smaller than 50 people. This is correct, but is it fair? Should hobbyists be lumped in with serious artists? Should it be a monthly stat, or all time? Should dead artists (and those who are no longer active) be grouped with those who are alive and kicking?
Anyway, just to sign off that I feel your pain and trust me - the longer the long tail, the more you'll see these headlines.
Most of a publisher’s profits come from the backlist titles that typically account for around 60% of gross sales in any given year. Way back in 1981 I looked at the previous five years of sales for the top 20 titles sold at a large publisher where I worked, and 16 were the same backlist titles for each of the five years, then four were frontlist that rarely stayed on the list past their launch year. In many cases, the returns in years two and three turned those former stars into money losers.
I love how you re-de-mystified a popular topic. There’s a lot of negative press about traditional publishing, and for good reason, but I have a hard time believing things are as dire for writers as they are presented. I love a good contextual, hopeful explanation.
I mean, with outlets like substack and the ever-growing digital space, creatives really have a ton of opportunity. It’s hard to see in all the noise, though. Maybe it’s just easier to believe that things are getting worse instead of both getting better and getting worse?
They don't sell only a dozen copies? So, they sell far more? If your title is any hint at your writing skill, no wonder your books don't sell.
This is a stellar article that clearly you put a ton of time into writing. Thanks for sharing. If I had a gripe with the whole thing, it's that the publishers are the ones trying to portray the woes of the industry to further merge and reduce competition. Worse still, many authors accept the state of publishing and have taken to boasting that a lack of "commercial appeal" indicates quality.
I am always curious about sales data for individual books that I love. I assume this is proprietary, but is there a public place to search for this data?
Thanks so much Lincoln for writing this! Reminds me of cash register data on food brands quite a lot...
An interesting read, though I think you may be missing the mark on this particular topic. This problem is well understood in the realm of mathematics.
Many markets follow what is called Zipf's law. Essentially the distribution of book sales is expected to follow a power law, and therefore it is the case that the vast majority of books sell overwhelmingly few copies. I've added a paper below that references evidence for book sales following this law.
"I’m pretty sure publishers would go out of business if 50% of their books sold less than 12 copies!"
This statement couldn't be further from the truth. Publishers will make most of their money on relatively few books at the very far end of the distribution. Think about it like this: sufficiently successful books will offset the cost of publishing all books. So as a publisher grows, they can publish more and more books, increasing the likelihood of publishing a bestseller, while publishing a lot of books that will sell hardly any copies. (Of course there will be some in between as well. But the point is that very few will make most of the money, some will make a bit of money, and by far most will make little to no money.)
So regardless of the technical details about how we classify "books" "published" "sale", we can think of the problem more generally. Most forms of modern media conform to this law. Film, Music, etc.
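As a rough illustration of the power-law argument, here is a small sketch of an idealized Zipf distribution, where the title at rank r sells top_sales / r**exponent copies. The inputs are assumptions chosen only for illustration (an exponent of 1, a best seller at 1,000,000 copies, and a list the size of the top-10-publisher dataset quoted earlier in this thread). It is not fitted to BookScan or any other real sales data.

```python
def zipf_sales(n_titles, top_sales, exponent=1.0):
    """Copies sold by rank under a pure power law: top_sales / rank**exponent."""
    return [top_sales / rank ** exponent for rank in range(1, n_titles + 1)]

# Hypothetical inputs, chosen for illustration only.
N_TITLES = 45_571      # same size as the top-10-publisher list in this thread
TOP_SALES = 1_000_000  # assumed copies sold by the single best seller

sales = zipf_sales(N_TITLES, TOP_SALES)
total_units = sum(sales)

# Share of all units sold by the best-selling tenth of titles.
top_decile_share = sum(sales[: N_TITLES // 10]) / total_units

# Share of titles that sell fewer than 1,000 copies.
under_1000_share = sum(1 for s in sales if s < 1_000) / N_TITLES

print(f"Top 10% of titles: {top_decile_share:.0%} of units")
print(f"Titles under 1,000 copies: {under_1000_share:.0%}")
```

Under these assumptions a small group of titles accounts for a large majority of units while most titles sell very few copies. A pure Zipf curve is steeper than the real figures quoted earlier in the thread, so treat it as a picture of the shape rather than a forecast.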
Hi Tom, I don’t know for sure but have been told this by people I trust in the industry. I think this is also the driving force behind consolidation. The acquiring company can see decades of reliable and steady sales data for Curious George or Ansel Adams and this makes forecasting and running a business with large fixed overheads much easier. Plus the inevitable reduction in staffing this allows as those sales require minimal expense or effort.