Friday, July 17, 2009

A Hard-Scrabble Upbringing

Here's an interesting question (and perhaps a unique definition of "interesting"): how many words are in the English language? And how many of them do you think you know?

You may recall that last month the Global Language Monitor announced that the one-millionth word was added to the English language (and, it seems, another 411 have been added since then). By the way, that "millionth word" was Web 2.0. Great. So now even words have revision numbers. Anyway, according to the site, more than 14 words are added to English every day. And every one of them is a gem, of course.

Naturally, many professional linguists had no words to describe how outraged they were by such a seemingly definitive quantification of English words. But, hey, it's all a bit of fun, and it's best to take such estimates with a grain of salt.

After all, the biggest question facing "word spotters" is, "well, what is a word anyway?" That is, take the word...take. That's obviously a word. But what about all the spin-offs of it, such as taking, taker, taken, or even took, for that matter? And take can be either a verb ("to get into one's possession") or a noun ("the act of taking"). There is a definition of take unique to filmmaking, and one to crime. And so on. Are all of these different words? Or the same word? Or what? Perhaps we could assign partial word values to these instances--so that, say, take itself could be considered 1 word, but taking is 0.5 word, taken another 0.5, and maybe took, because of its spelling, could be 0.75 word. Or something. Take as referring to a shot in filmmaking could be worth maybe 0.3759665 word. Or...

Okay, I'm better now.

Then of course there are written numbers...one, two, three... Technically, the list of numbers is infinite, so if someone wanted to write out every number and count each as a different word, that obviously means that English has an infinite number of words right there.

Then there are obscure scientific terms, plants, animals...heck, there are believed to be between five and eight million species of beetles alone, words all.

So there are all sorts of problems quantifying English.

It may be somewhat easier to quantify how many words a single person knows, but, after all, the same basic problems apply.

I ran across a way of coming up with a quick (well, kind of quick) estimate:
Professor David Crystal, known chiefly for his research in English language studies and author of around 100 books on the subject...suggests taking a sample of about 20 or 30 pages from a medium-sized dictionary, one which contains about 100,000 entries or 1,000 to 1,500 pages.

Tick off the ones you know and count them. Then multiply that by the number of pages and you will discover how many words you know.
Okay, I'm game. So I grabbed my large unabridged Random House Dictionary (Second Edition, published in 1987) and decided to, as the name Random House suggests, randomly pick one page per letter and count how many words on that page I knew. I say "random" but not entirely; I tried to avoid pages that contained words all having the same prefix (like all the re-s, un-s or foot-s, etc.). I also avoided proper names, towns, countries, and pretty much anything capitalized. And, no, I didn't read the definition of a word I previously did not know and then claim that I knew it. Cynical, are we?

Anyway, here's how the "raw data" turned out:
A (aggressive-ago): 29
B (bleeder-blind): 25
C (cognize-coinage): 31
D (defliate-degranulation): 43
E (esotharnax-essence d'orient): 32
F (ferule-fetus): 33
G (gaby-gain): 25
H (Hippo Regius-histocompatibility): 21
I (irregular galaxy-Isaac): 52
J (jet-jib): 28
K (knot-Kodály): 27
L (leathery-lederhosen): 23
M (minus-mirror writing): 22
N (Nesselrode-Neufchatel): 23
O (OPM-opsonin): 18
P (percolation-perfective): 30
Q (quadrillion-qualify): 28
R (rappel-raspy): 39
S (spokesman-spoonbill): 22
T (temporary-tenderable): 27
U (ugly-ulterior): 20
V (vespiary-vexillate): 31
W (War Between the States-warlock): 35
X (xenograft-xylene): 14
Y (Yarkand-yearling): 23
Z (Zola-zoon): 21
There were 26 pages and 722 known words in all, which works out to an average words-I-know-per-page of 27.77. The dictionary has a total of 2,214 pages, which means that I know 61,481.08 words. That's probably far more precise than this little experiment warrants, so I'm prepared to say that I know somewhere between 60,000 and 65,000 words, not that I use them all on a regular basis (despite what some people may think). And this dictionary was published in 1987, before the Internet and all the technological terms I use and abuse on a regular basis existed. Looking around my office, I see a variety of things that would not have been in a 1987 dictionary: WiFi, cellphone (although "cellular phone" is in there, just not the shortened one-word term), iPhone, iPod, podcast, blog, thumb drive, USB, FireWire, cable-modem, optical mouse, MP3, Web browser...the list goes on.
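For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation. The per-letter counts and the 2,214-page total come straight from the figures above; the variable names are made up for illustration, and the "plus or minus two standard errors" band is an added assumption that treats the 26 pages as a simple random sample (which, as admitted earlier, they weren't quite).

```python
from math import sqrt
from statistics import stdev

# Words recognized on the one sampled page per letter (figures from the list above).
known_per_page = {
    "A": 29, "B": 25, "C": 31, "D": 43, "E": 32, "F": 33, "G": 25,
    "H": 21, "I": 52, "J": 28, "K": 27, "L": 23, "M": 22, "N": 23,
    "O": 18, "P": 30, "Q": 28, "R": 39, "S": 22, "T": 27, "U": 20,
    "V": 31, "W": 35, "X": 14, "Y": 23, "Z": 21,
}
TOTAL_PAGES = 2214  # page count of the dictionary, as stated in the post

counts = list(known_per_page.values())
total_known = sum(counts)                 # 722
avg_per_page = total_known / len(counts)  # about 27.77
estimate = avg_per_page * TOTAL_PAGES     # about 61,481.08

# Assumption: a rough sampling-error band of two standard errors on the
# per-page average, scaled up to the whole dictionary.
std_error = stdev(counts) / sqrt(len(counts))
low = (avg_per_page - 2 * std_error) * TOTAL_PAGES
high = (avg_per_page + 2 * std_error) * TOTAL_PAGES

print(f"Known words on sampled pages: {total_known}")
print(f"Average per page: {avg_per_page:.2f}")
print(f"Estimated vocabulary: {estimate:,.2f}")
print(f"Rough range: {low:,.0f} to {high:,.0f}")
```

The point of the last few lines is only that a 26-page sample leaves a fair amount of wiggle room, so quoting the estimate to two decimal places is indeed more precision than the experiment deserves.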

It's not surprising that words are constantly being added to English. Unlike a lot of languages (French, say), English is a highly dynamic, democratic language in the sense that there is no central authority overseeing it and placing strict controls over what words can be added to the lexicon. As long as a word can be understood by some number of other people, it's fair game for the vocabulary. Sure, I complain about this as much as anyone, but that's the English language's great strength. One man alone, William Shakespeare, is responsible for adding more than 1,000 words to the English language (well, sort of; his works are often the first written instances of many words; surely his contemporaries knew at least some of them or audiences would have had no idea what his characters were saying!). Some of the words attributed to Shakespeare include "arch-villain," "bedazzle," "cheap" (as in vulgar or flimsy), "dauntless," "embrace" (as a noun), "fashionable," "go-between," "honey-tongued," "inauspicious," "lustrous," "nimble-footed," "outbreak," "pander," "sanctimonious," "time-honored," "unearthly," "vulnerable," and "well-bred."

English words can come from anywhere. It's all about communication. As long as we understand each other, anything goes. However, I draw the line at pwn and text as a verb.

Still, I find that any discussion of lexicography can't ignore the episode of Blackadder the Third, featuring Edmund (Rowan Atkinson) and Dr. Samuel Johnson (who compiled the landmark 1755 Dictionary of the English Language and whose house I visited a couple of years ago) and this classic exchange:
Dr. Johnson: Here it is, sir: the very cornerstone of English scholarship. This book, sir, contains every word in our beloved language.
...
Edmund: Every single one, sir?

Dr. Johnson: (confidently) Every single word, sir!

Edmund: (to Prince) Oh, well, in that case, sir, I hope you will not object if I also offer the Doctor my most enthusiastic contrafribularities.

Dr. Johnson: What?

Edmund: 'Contrafribularities', sir? It is a common word down our way...

Dr. Johnson: Damn! (writes in the book)

Edmund: Oh, I'm sorry, sir. I'm anispeptic, frasmotic, even compunctuous to have caused you such pericombobulation.

Dr. Johnson: What? What? WHAT?

Prince George: What are you on about, Blackadder? This is all beginning to sound a bit like dago talk to me.

Edmund: I'm sorry, sir. I merely wished to congratulate the Doctor on not having left out a single word. (J sneers) Shall I fetch the tea, Your Highness?

Prince George: Yes, yes! And get that damned fire up here, will you?

Edmund: Certainly, sir. I shall return interfrastically. (exits) (J writes some more)
But no matter how we quantify our language, we should always make sure that in our word counts we never forget to include sausage.
