Hey, there's a really cool visualization at the bottom of this page. You don't have to read the text, but it will help you understand the visualization.

Last week I came across a copy of Carl Darling Buck's Dictionary of Selected Synonyms in the Principal Indo-European Languages, which quickly (albeit narrowly) displaced The Remarkable Millard Fillmore as my current favorite book. This 1,500-plus-page philological monument is an interlingual thesaurus of over 1,000 concepts such as "world," "sun," "body," &c., and provides their glosses in 30-some Indo-European languages. Buck then details the historical development of the words in each of these languages, tracing their form and semantics back to an Indo-European root, insofar as this is possible.

I realize many of my readers have some questions right now, namely "What's an Indo-European language?" "Why would a book like this be so awesome?" and "What's with the Doughboy?" So I'll provide six pieces of information that will explain all of that. I also realize that a number of my readers know far more about these things than I do; if that's you, feel free to add or correct.

First. Humans have been speaking languages for thousands and thousands and thousands of years.

Second. Over the millennia, human language has changed a lot. Languages change from one generation to the next; the English you speak isn't the same as the English your parents speak, though the difference may be very subtle. Over many generations the difference becomes noticeable, then drastic. Imagine your great-great-great-great-...-great-grandfather was Chaucer. The difference between your English and his is dramatic.

Third. Language change on the scale of centuries results not only in different variants of one language (e.g., your English versus grampa Chaucer's) but in entirely new languages. For example, after a millennium or so, spoken Latin had ceased to be Latin and had instead become older forms of the modern Romance languages (French, Italian, Spanish, &c.). The Romance languages are all "daughter" languages of Latin; they are "genetically related."

Fourth. You can now picture that, even further back in time, Latin wasn't Latin, but something more primitive.1 It so happens that Latin is part of the Indo-European language family, which includes the vast majority of languages spoken in Europe and many in western and southern Asia. This language family includes a ton of languages you've heard of like Russian, Welsh, English, Greek, Albanian, Irish, Bulgarian, Swedish, Sanskrit, and Farsi, as well as a ton of languages you might not have heard of like Manx, Umbrian, Tocharian, Avestan, Mazanderani, and Shumashti. It does NOT include some languages spoken in Europe, notably Hungarian, Finnish, and Estonian (which are all part of the Finno-Ugric language family) and Basque (which is essentially a linguistic orphan). From these daughter languages, linguists are able to reconstruct a Proto-Indo-European (PIE) language that was spoken some 6,000 to 8,000 years ago.

Fifth. Over thousands of years of language change, not only do the forms of words change (e.g., Latin mundus 'world' corresponds to French monde), but so do their meanings. This actually happens all the time over just decades in a single language, which makes it all the more fun to read old books (for example, the Sherlock Holmes stories, where poor Mrs. Hudson is incessantly "knocked up"). Naturally, then, it also happens as languages develop into new languages. Take the word dough. It is ultimately derived from the PIE root *dʰeigʰ- 'to knead' (the asterisk marks a hypothesized form, not an attested one). But this root developed in very different directions in other Indo-European languages: in Sanskrit, the word देह deha actually means 'body.' Hence we can understand the essence of the Pillsbury Doughboy not through pie, but through PIE.2

Sixth. Now you should be equipped to understand, and hopefully excited about, C.D. Buck's Dictionary. Buck elucidates thousands of obscured relationships between words in Indo-European languages, providing beautiful sparkling etymological nuggets on nearly every page.

Why am I telling you about it?

My one qualm with Buck's book is its lack of illustrations. To make up for this, I set out to create a way to visualize this concept dictionary.

What I came up with is a way to graph each concept by clustering languages based on shared etymologies for that concept. For example, the vast majority of IE languages use the inherited PIE word for sun: all of these languages constitute one cluster in the graph for the concept 'sun.' Irish and Old Irish, however, derive their word for sun from the PIE root for 'shine,' so these constitute a second cluster.
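For the programmers in the audience, here's a minimal Python sketch of the grouping idea, assuming each language's word for a concept is tagged with the root it derives from. It isn't the actual code behind the graphs, and the data is illustrative: the root labels are placeholders, not Buck's reconstructions.

```python
from collections import defaultdict

# Illustrative data for the concept 'sun' (not the real database): each
# language maps to a placeholder label for the PIE root of its word.
SUN = {
    "Latin": "PIE 'sun'",
    "Greek": "PIE 'sun'",
    "Sanskrit": "PIE 'sun'",
    "Lithuanian": "PIE 'sun'",
    "English": "PIE 'sun'",
    "Irish": "PIE 'shine'",
    "Old Irish": "PIE 'shine'",
}


def cluster_by_root(concept):
    """Group languages into clusters that derive their word from the same root."""
    clusters = defaultdict(list)
    for language, root in concept.items():
        clusters[root].append(language)
    return dict(clusters)


for root, languages in cluster_by_root(SUN).items():
    print(root, "->", ", ".join(languages))
```

This prints two clusters: one for the five languages using the inherited 'sun' word and one for Irish and Old Irish.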

Note that the clusters in this graph do NOT necessarily correspond to direct genetic relationships between languages. For a detailed genetic tree of Indo-European languages, see this. In the graph, however, even two distant IE languages will be clustered together if they derive their word for a particular concept from the same PIE root. Notice how Romanian, a Romance language, doesn't cluster with the other Romance languages in the 'world' graph. This happens for a variety of reasons. In this case it is due to contact with Slavic and a phenomenon known as calquing, or loan translation: because the Slavic word for 'world' also meant 'light,' Romanian extended its own inherited word for 'light' (lume, from Latin lumen) to mean 'world.' That Sanskrit and Romanian share a common PIE root for this concept, then, is coincidence, but interesting nonetheless.

C.D. Buck consistently provides glosses in 31 languages throughout the book, which I've added as labeled nodes in the graphs. I use Buck's abbreviations, which are in some cases antiquated. Importantly, Boh(emian) is the old name for Czech (Buck amusingly notes that he would have used the name 'Czech,' but it didn't lend itself to an aesthetically pleasing abbreviation), Lett(ish) is Latvian, and Rum(anian) is how 'Romanian' was spelled before the Romanians cared to advertise their proud Roman heritage.

It is not uncommon for languages to have multiple words for one concept. In such cases I've added unlabeled nodes linked to the appropriate language.
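One way to picture that node layout is the hypothetical Python sketch below, assuming each entry is a (language, word, root) triple: a language's first word sits on its labeled node, and every extra word gets an unlabeled node linked back to it. The function name, node-id scheme, and entries are all invented for illustration; this is not how the real graphs are built.

```python
from collections import defaultdict


def build_graph(entries):
    """Build nodes and edges for one concept from (language, word, root) triples.

    The first word listed for a language goes on that language's labeled node;
    each additional word gets an unlabeled node linked back to the language.
    """
    nodes = {}  # node id -> {"label": str or None, "word": str, "root": str}
    edges = []  # (unlabeled node id, language node id)
    extras = defaultdict(int)  # how many extra words each language has so far
    for language, word, root in entries:
        if language not in nodes:
            nodes[language] = {"label": language, "word": word, "root": root}
        else:
            extras[language] += 1
            node_id = f"{language}#{extras[language]}"
            nodes[node_id] = {"label": None, "word": word, "root": root}
            edges.append((node_id, language))
    return nodes, edges


# Hypothetical entries: one language with two words from different roots.
entries = [
    ("English", "world", "root A"),
    ("English", "earth", "root B"),
    ("German", "Welt", "root A"),
]
nodes, edges = build_graph(entries)
print(edges)  # [('English#1', 'English')]
```

Because every node keeps its own root, an unlabeled node can land in a different cluster from its language's labeled node while staying attached to it by an edge.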

To view a given language's word for a concept, hover your mouse over a node and the word will appear in the center of the chart. After a second, a tooltip will also appear with the language's full name, in case you've forgotten it. I've copied Buck's glosses exactly, including non-English symbols. This means there are some thorns in there (literally). If you want to know the Greek word for something and you don't read Greek, you're going to have to figure out how to read Greek. Also, in Slavic there are things called yers, which look like little b's. Conveniently, you can pronounce them as the "uh" you'd utter upon encountering them and not knowing how to pronounce them.

You can consider the following a demonstration. There's still a lot I want to do with this visualization, such as making it possible to browse the Indo-European roots themselves and see how they developed in the daughter languages. More importantly, I'm lacking data. I set up a database for this project, so it is easily expandable. I don't have the time or a secretary, though, so I've only managed to enter three concepts. If anyone wants to help out, please contact me!

1 I don't mean "primitive" as "unsophisticated." Languages now are just as complex as languages thousands of years ago. They haven't grown more sophisticated or simpler.

2 I discovered another brand-name etymology when I was in high school. Ford's Taurus, built as a budget family vehicle, is so named because taurus is Latin for 'bull', thus presenting the car as "a-Ford-a-bull," i.e. affordable! So far this etymology is just conjecture.