
Taking an author’s ‘literary fingerprint’

Billy S A king of infinite space

By James Dacey

Imagine this: a much-celebrated author locks himself away to begin work on his masterpiece, a novel called The Meta Book that will comprise an infinite number of words, all strung together in the writer’s unique literary style. While this may sound like the plotline of a short story by one of the great magical realist authors of Latin America, it is actually the idea of a trio of physicists in Sweden.

Sebastian Bernhardsson and his colleagues at Umeå University are interested in the unique “literary fingerprint” left by famous authors. They conceptualize a writer’s use of language as a complex system in the same way that scientists model the climate, the economy or ant colonies.

By feeding an author’s entire oeuvre into their calculations, they find that each writer creates a unique curve on a graph representing the number of different words used as a function of the total number of words. What’s more, this signature curve can be detected in every single work of a particular author regardless of what they are writing about.
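For readers who want to see what such a curve looks like in practice, here is a minimal sketch of counting different words against total words. It is not the researchers' code: the function names `tokenize` and `vocabulary_curve`, the simple regular-expression tokenizer and the toy sentence are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_curve(words):
    """Return a list whose i-th entry is the number of different words among the first i+1 words."""
    seen = set()
    curve = []
    for word in words:
        seen.add(word)
        curve.append(len(seen))
    return curve


# Toy text standing in for an author's collected works.
sample = "the cat sat on the mat and the dog sat on the cat"
words = tokenize(sample)
curve = vocabulary_curve(words)

# Pair each total word count with the number of different words seen so far.
for total, different in zip(range(1, len(words) + 1), curve):
    print(total, different)
```

Plotting the second column against the first for a full novel would give one curve of the kind described above; comparing curves between authors is where the real analysis would begin.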

Publishing their findings in New Journal of Physics, the authors create curves for the works of Thomas Hardy, DH Lawrence and Herman Melville. “It is like everything an author can think of writing is processed by a mental pipeline which imposes a unique fingerprint on an author’s infinite meta-book,” says Bernhardsson. I think what he means by this is that (statistically speaking) there is a common thread running through everything these authors wrote, as if they were plucking extracts from their infinite corpus.

Now, the literary purists out there may be reading this and seething at yet another example of uncouth physicists trying to impose rigid mathematical frameworks onto works of unquantifiable beauty, or of “unweaving the rainbow” as Keats famously accused Newton. If anything, however, the results of this research reveal the opposite. For 75 years, language analysts have assumed that all literature, regardless of author, follows the same statistical pattern when viewed as a whole. This was based on the law proposed by American linguist George Kingsley Zipf stating that the frequency of a word is inversely proportional to its rank in the frequency table.
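Zipf's law in its usual idealised form says that a word's frequency is roughly inversely proportional to its rank when words are ordered from most to least common. The sketch below shows how one might tabulate ranks and counts and compare them with that idealised prediction; the helper names `rank_frequency` and `zipf_prediction` and the toy sentence are assumptions for illustration, and a text this short is far too small to test the law.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent first; ties are broken alphabetically."""
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def zipf_prediction(top_count, rank):
    """Count expected at a given rank under the idealised law: count(rank) = count(1) / rank."""
    return top_count / rank


words = "the cat sat on the mat and the dog sat on the cat".split()
table = rank_frequency(words)
top_count = table[0][2]

# Show the observed count next to the idealised Zipf value for each rank.
for rank, word, count in table:
    print(rank, word, count, round(zipf_prediction(top_count, rank), 2))
```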

In this new view of fiction, however, each author defines their own unique law based on non-trivial mathematics. “It shows that, even statistically speaking, our personality is not drowned by the general rules, and structure of the language itself,” says Bernhardsson.

The researchers intend to develop their work by testing their meta book concept for more authors and languages other than English. So who knows — maybe the magical literary worlds of Borges and Márquez will be next in line to have their curves exposed.
