Trending Science: Big Data algorithm promises to find the next best-selling novel
In a new book, ‘The Bestseller Code: Anatomy of a Blockbuster Novel’, two Stanford academics describe how an algorithm they designed is able to predict, with 80 % accuracy, which new novels will become mega-bestsellers.
The publishing industry, like many other cultural industries such as film and television, is based on hits. However, accurately predicting bestsellers has remained an elusive art, with publishers relying on gut instinct, educated guesses and knowledge of what has been popular before. Of course, this is not always accurate – some of the biggest-selling and most critically acclaimed novels were rejected repeatedly before finding a willing publisher. These include J.K. Rowling’s ‘Harry Potter and the Philosopher’s Stone’, Stephen King’s ‘Carrie’ (rejected 30 times in total) and Frank Herbert’s science-fiction masterpiece ‘Dune’, to name just three works by seminal authors who went on to be wildly successful once they eventually secured a publishing deal.
Now the algorithm – called by its creators the ‘bestseller-ometer’ – may be able to lend a helping hand. It builds upon a movement in the publishing industry that began in the 2000s when e-books really took off, namely to complement publishers’ gut instinct with the use of Big Data. The enterprise was conceived at Stanford University in 2008, when PhD student Jodie Archer and Matthew L. Jockers, an associate professor of English (now at the University of Nebraska-Lincoln, and a co-founder of the Stanford Literary Lab), joined forces to uncover how computers could analyse and understand books in a way that people are unable to do.
Crunching the data for the perfect novel
After several years of collaboration, they crunched the data on 20 000 ‘The New York Times’ bestselling novels through the processing power of thousands of computers. They taught these computers how to ‘read’ – in essence, training them to determine where sentences begin and end, to identify parts of speech, and to map out plots. They then used so-called machine classification algorithms to isolate the features most common in bestsellers. Now the bestseller-ometer can predict with 80 % accuracy whether a new novel will be a hit or a flop.
So what are the key factors, according to the bestseller-ometer, that will increase the prospect of a new novel flying off the shelves? Having a young, strong but troubled heroine (think Katniss Everdeen from ‘The Hunger Games’ or Lisbeth Salander from ‘The Girl with the Dragon Tattoo’) as your protagonist is a good start. Don’t discuss sex too graphically but do emphasise ‘human closeness’. Don’t use too many exclamation marks and don’t overdo it with adjectives and adverbs. Use the verb ‘need’ at frequent intervals, and if your protagonist has a pet, make it a dog and not a cat. Don’t be afraid to be colloquial – readers of bestsellers prefer more informal language, and expressions such as ‘ugh’ and ‘OK’ are to be encouraged. Finally, book titles also matter – use a simple noun for the title (such as Donna Tartt’s hit ‘The Goldfinch’ or Victoria Hislop’s ‘The Island’).
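To give a rough idea of what ‘reading’ a text for such signals might involve, the sketch below counts three of the surface features mentioned above: exclamation marks, forms of the verb ‘need’, and colloquial markers. It is only an illustration and not the authors’ method, which also relies on part-of-speech tagging, plot mapping and a trained classifier. The function name `surface_features`, the two word lists and the naive sentence splitter are all assumptions made for this example.

```python
import re

# Hypothetical word lists: the colloquial markers are just the two examples
# the article gives; the verb forms are an assumed expansion of 'need'.
COLLOQUIAL_MARKERS = {"ugh", "ok"}
NEED_FORMS = {"need", "needs", "needed", "needing"}


def surface_features(text: str) -> dict:
    """Compute a few crude style signals for a passage of prose."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens; apostrophes are kept so "don't" stays one token.
    words = re.findall(r"[a-z']+", text.lower())
    # Guard against division by zero on empty input.
    n_sentences = max(len(sentences), 1)
    n_words = max(len(words), 1)
    return {
        "exclamations_per_sentence": text.count("!") / n_sentences,
        "need_per_100_words": 100 * sum(w in NEED_FORMS for w in words) / n_words,
        "colloquial_per_100_words": 100 * sum(w in COLLOQUIAL_MARKERS for w in words) / n_words,
    }


if __name__ == "__main__":
    sample = "Ugh. I need a dog! OK, we need to go."
    for name, value in surface_features(sample).items():
        print(f"{name}: {value:.2f}")
```

In a fuller pipeline, numbers like these would be one small part of the input to a classifier trained on known bestsellers and non-bestsellers; on their own they say nothing about whether a book will sell.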
The algorithm’s number one pick
Archer and Jockers waited anxiously to see which novel of the many thousands would be the bestseller-ometer’s favourite. It turned out to be ‘The Circle’ by Dave Eggers, a 2013 thriller about a young (female) graduate who goes to work for an immensely powerful technology company that has its own dark ambitions for reshaping the world in line with its privacy-destroying philosophy.
More specifically, the bestseller-ometer liked the book’s female protagonist, the fact that ‘need’ and ‘want’ were her most commonly used verbs, and its focus on three specific themes – technology, jobs and the workplace, and human closeness (the latter being the most prevalent topic across all bestsellers according to the algorithm). Most importantly, ‘The Circle’ did indeed become a bestseller, spending multiple weeks on ‘The New York Times’ bestseller list.
However, Archer is very quick to point out the irony of the bestseller-ometer’s choice – ‘The Circle’ is a dystopian novel that highlights the dangers of Big Data and technology’s ever-increasing intrusion into all aspects of human life.
Although Archer and Jockers have no intention of commercialising their creation, Big Data is likely to make more of a mark on the publishing industry in the near future, with the corresponding fear that a bigger reliance on data could reduce the diversity of narratives as publishers chase profit. ‘The fear is that we can homogenise the market and the answer is no,’ commented Archer. ‘What the bestseller-ometer is trying to do is say, “Hey, pick this new author that you might not dare take a risk on with your acquisitions budget”.’
published: 2016-10-05