After a busy month I'm finally ready to present the last piece of the PIE project that began with visualizing C. D. Buck's Dictionary of Selected Synonyms. The visualization here melds the previous two: the first was the visualization of Buck's Dictionary, and the second was the Indo-European family tree I demonstrated with the Sankey diagram. The idea of this little widget is to let you see both the synchronic and the diachronic nature of concepts in the Indo-European language family. (Read below for a more detailed discussion of what the widget shows.)

Hey, the visualization that I'm talking about used to be right here.

Now it's RIGHT HERE!

What the widget shows

Part I

When the widget loads you will see three groups of circles organized by color, below which are several concepts written in English (viz., sun, world, body, eye, and mother). Each circle represents a language in the Indo-European language family. The abbreviation in the circle tells you which language the circle represents, using the same abbreviations used by Carl Buck in the Dictionary (hover your mouse over the circle to see the full name of the language, if you need to). Hovering your mouse over a circle will reveal the word for the currently selected concept in the given language. Each cluster of circles is coded by color, and these colors correspond to the legend in the upper right. Each item in the legend is the etymological root of all the words in the languages represented by circles in the corresponding cluster.

The upper-left-most cluster in the 'sun' graph is colored as the root PIE *gher- 'to shine' in the legend. This means that the languages in this cluster (which are two: Old Irish and Modern Irish) derive their word for 'sun' from the Proto-Indo-European root *gher-, which means 'to shine.' By hovering your mouse over the circles for Old Irish and Modern Irish, you can see that the words for 'sun' in these languages are grían and grian, respectively. The large central cluster of circles corresponds to the PIE root *sóh₂wl̥, which was the original Proto-Indo-European word for the sun. By hovering over the circles in this cluster you can see how this PIE root developed in the different branches of the Indo-European language family. You'll see that for the most part the modern words for 'sun' approximate the PIE word phonetically (e.g., Spanish sol). The lonely orange circle in a cluster of its own towards the bottom is Albanian, in which the word for 'sun' is derived from the PIE root *ǵʰelh₃-, which originally meant 'yellow.'
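For the programmatically inclined, the clustering described above amounts to grouping languages by the root of their word for a concept. Here is a minimal Python sketch of that idea, not the widget's actual code. The names SUN_WORDS and cluster_by_root are made up for illustration, full language names stand in for Buck's abbreviations, and the rows are only the 'sun' words quoted in this post.

```python
from collections import defaultdict

# Illustrative rows: (language, word for 'sun', PIE root).
# Only words quoted in this post are listed; the real data set is larger.
SUN_WORDS = [
    ("Old Irish", "grían", "*gher-"),
    ("Modern Irish", "grian", "*gher-"),
    ("Spanish", "sol", "*sóh₂wl̥"),
    ("Greek", "helios", "*sóh₂wl̥"),
]


def cluster_by_root(rows):
    """Group (language, word) pairs under the root their word derives from."""
    clusters = defaultdict(list)
    for language, word, root in rows:
        clusters[root].append((language, word))
    return dict(clusters)


clusters = cluster_by_root(SUN_WORDS)
for root, members in clusters.items():
    print(root, members)
```

Each key of the resulting dictionary plays the role of a legend item, and each list plays the role of one colored cluster of circles.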

In a few cases, such as the Greek helios, you may be incredulous, since the word does not seem to have much in common with the PIE root whence it's derived. Linguists are nonetheless able to prove the etymology of words such as Greek helios by noting consistent phonological correspondences between Greek words and words in other Indo-European languages. One such correspondence is that Greek h at the beginning of a word (if you're a classical philologist, I'm talking about rough breathing) corresponds to s in most other Indo-European languages. Compare Greek hypnos 'sleep,' whence we get English hypnotized, with Latin somnus 'sleep,' as in English somnambulant: they too are cognate.

I got a few comments about this part of the widget pertaining to the relationship between clusters and the genetic relationships between languages. To be clear: CLUSTERS DO NOT REPRESENT GENETIC RELATIONSHIPS BETWEEN LANGUAGES. It's true that in many cases related languages cluster together. This is not at all surprising: in general, the closer the genetic relationship between two languages, the more linguistic material they have in common, and so the more likely they are to share a word, and therefore an etymology, for a given concept. But it is very important to recognize that sharing a common etymology for a particular word does not entail that the languages are closely genetically related!

In fact, the most interesting stories are to be found where a language does not cluster with the rest of its language family. Take, for example, the Romanian word for 'world,' lume. You can see from the graph that all other Romance languages derive their 'world' from the PIE root *mH₂nd- 'to adorn' (aside: yes, this does seem to be a peculiar derivation of 'world': it was probably originally a calque on Greek kosmos; check Buck for more details). But Romanian's lume comes from PIE *leuk- 'light' (compare lumen, the unit of brightness), despite its being a Romance language. You can see from the graph that Sanskrit has a word for 'world,' loka-, from the same root as the Romanian word. Well, Sanskrit and Romanian are not at all close cousins in the Indo-European family, so it is clear that this etymological connection is mostly a coincidence. What probably occurred here (it's admittedly difficult to prove) is this: Romanian was for many years in close contact with the Slavic languages, which also have a word for 'world' that is homophonous with their word for 'light', *světъ. Due to this language contact, Romanian speakers began using a word derived from their native word for 'light' in the sense of 'world.' This is a process known as calquing, and it's way more interesting than simple genetic inheritance of linguistic material.

Using these clusters it's possible to examine the nature of a concept in the Indo-European language family from a sort of synchronic perspective. Synchrony is technically a cross-section of a language at a particular time; these clusters actually do not show true synchrony since languages like Avestan and Gothic are long extinct. The idea, however, is similar: the clusters show a cross-section of a concept in the different languages of the Indo-European family at their latest attestation.

The notion complementary to synchrony is diachrony, which is the subject of the second half of the widget.

Part II

When you hover your mouse over items in the legend, the text will embolden and your cursor will change to the 'clicky' cursor. This is all to give you the impression that these items are clickable. Indeed, they click. For the purposes of demonstration, go to the 'sun' graph and click on the root PIE *sóh₂wl̥ 'sun.'

Whoooosh! Look at that awesome animation! (Pronounce the "whoosh!" while the animation is going for full effect.)

Now you see a jumble of links and nodes. This was the subject of the second post in this series: it is the family tree of the Indo-European language family. More accurately, it is a select cut of the Indo-European language tree: the Indo-European language family consists of over 400 languages, most of which are languages in the area of India and Iran that you have probably never heard of, like Pangwali and Gilaki. I have a full genetic tree of Indo-European stored in my database (there went a few good December days), but to display all of them at once would make the graph exceptionally difficult to read.

The graph is complicated enough as it is. To mitigate the confusion, I have designed the graph such that when you hover your mouse over a particular language node, the parts of the graph that are irrelevant to the development of this language are greyed out. Move your mouse around the graph and you'll see what I mean.
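If you're curious how such a highlight can be computed, here is a minimal Python sketch, not the widget's actual code. It assumes the tree is stored as (parent, child) pairs and that "relevant" means the hovered language plus all of its ancestors; the names EDGES, lineage, and greyed_out are hypothetical, and the tiny tree below skips intermediate stages for brevity.

```python
# Illustrative (parent, child) pairs; intermediate stages are omitted.
EDGES = [
    ("Proto-Indo-European", "Latin"),
    ("Latin", "Vulgar Latin"),
    ("Vulgar Latin", "French"),
    ("Vulgar Latin", "Spanish"),
    ("Proto-Indo-European", "Old Irish"),
    ("Old Irish", "Modern Irish"),
]


def lineage(node, edges):
    """Return the node together with every ancestor reachable from it."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    relevant = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in relevant:
                relevant.add(parent)
                stack.append(parent)
    return relevant


def greyed_out(node, edges):
    """Return every node that is not part of the hovered node's lineage."""
    all_nodes = {name for edge in edges for name in edge}
    return all_nodes - lineage(node, edges)


print(sorted(greyed_out("Spanish", EDGES)))
```

Hovering over Spanish in this toy tree would keep Proto-Indo-European, Latin, Vulgar Latin, and Spanish lit, and grey out the rest.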

From left to right, the graph shows the historical development of the Indo-European language family, from Proto-Indo-European to the modern (or at least the latest attested) languages in the family; nodes in the middle show intermediate steps.

All nodes in this tree are one of three colors: red, green, or brown. Red means that the language is extinct (not spoken as the common language of a community, even if it is the language of liturgy, as Latin and Sanskrit are; red also indicates languages that were only ever liturgical languages and never spoken as a community's language, as Old Church Slavonic was). Green means that a language is a living language, still spoken as of 2013. Lastly, brown means that the language is a hypothetical one: we have not found anything written in this language, but using evidence from attested daughter languages, linguists can reconstruct what a given word in this parent language probably sounded like.

Nodes in the tree are connected by green links and pink links. Green links indicate a genetic relationship between languages: over generations, languages evolve organically and gradually into daughter languages (for example, groups of Latin speakers spread out over Europe and, over time, isolated from one another, these communities' Vulgar Latin evolved into the modern Romance languages: French, Spanish, Italian, Romanian, etc.). In contrast, pink links indicate that one genetically distant language (could be a sister, could be a cousin, could be totally unrelated) has had significant influence on the development of a given language. I have to admit here that these contact links are somewhat discretionary: we are just beginning to realize how ubiquitous language contact has been throughout history, and in actuality there should be many more pink links in the graph. Further, it's difficult to quantify the effect one language has had on another. In the interest of reducing complexity, I've also mostly omitted the influence of English, French, and German on other European languages, since in these cases it would be much easier to count which languages they have NOT influenced. An example of language contact influence is French, which adopted a number of words from Frankish early in its development, when the Romance speakers along the Rhine and Rhône came into contact with the Franks (Germanic tribes). Up to a sixth of the modern French vocabulary, including the name "France" itself, is of Frankish origin. (Names of places are often kept even when the native population is driven out: my dear city of Chicago is an example of this, along with many other geonyms in the Midwest. Istanbul is a counterexample to this trend.)
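The node and link colors described above map naturally onto a small data model. The following Python sketch is only an illustration under my own assumptions: the class and function names (Status, LinkKind, Link, node_color, contact_sources) are hypothetical and are not taken from the widget's code or database.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Node colors: the value is the color used in the tree."""
    EXTINCT = "red"
    LIVING = "green"
    RECONSTRUCTED = "brown"


class LinkKind(Enum):
    """Link colors: genetic descent versus contact influence."""
    GENETIC = "green"
    CONTACT = "pink"


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: LinkKind


# Illustrative entries, limited to languages whose status this post states.
STATUS = {
    "Proto-Indo-European": Status.RECONSTRUCTED,
    "Latin": Status.EXTINCT,
    "Old Church Slavonic": Status.EXTINCT,
    "French": Status.LIVING,
}

LINKS = [
    Link("Vulgar Latin", "French", LinkKind.GENETIC),
    Link("Frankish", "French", LinkKind.CONTACT),
]


def node_color(language):
    """Look up the display color for a language node."""
    return STATUS[language].value


def contact_sources(language, links):
    """List the languages that influenced `language` through contact."""
    return [
        link.source
        for link in links
        if link.target == language and link.kind is LinkKind.CONTACT
    ]


print(node_color("Latin"), contact_sources("French", LINKS))
```

Keeping the link kind on the link rather than on the node lets a single language, such as French here, have both a genetic parent and one or more contact influences.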

Finally, you will see that under select nodes in the tree you will find a word in the relevant language followed by a gloss (in some cases there are multiple words and glosses). In the graph for PIE *sóh₂wl̥ 'sun', you'll see for example Old High German sunna, meaning 'sun.' More interestingly, you'll see Irish súil, which means 'eye' or 'hope'. Both sunna and súil, as well as every other word in this graph, are cognates: i.e., they derive from a common root in Proto-Indo-European, in this case *sóh₂wl̥, which meant 'sun.'

This graph offers a diachronic view---a view over the course of time---of the word 'sun' in the Indo-European language family. Most languages preserved the original meaning of this word. This is unsurprising, given the importance, discreteness, and obviousness of the sun in daily life. In the Gaelic languages, however, which we learned in Part I derive their word for 'sun' from the PIE root *gher- 'to shine', the root *sóh₂wl̥ developed into the word for 'eye', as in Irish súil, which took on the additional meaning of 'hope.' (Such etymological gems---in this case, an emerald---are the reason I love this field of study.)
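Spotting a semantic shift like this one can be sketched as a simple filter over a cognate set: keep the words whose glosses no longer include the meaning of the root. The Python below is an illustration under my own assumptions, not the widget's code; COGNATES and shifted_meanings are hypothetical names, and the rows are only the cognates quoted in this post.

```python
# Illustrative cognates of PIE *sóh₂wl̥ 'sun': (language, word, glosses).
COGNATES = [
    ("Old High German", "sunna", ["sun"]),
    ("Spanish", "sol", ["sun"]),
    ("Irish", "súil", ["eye", "hope"]),
]


def shifted_meanings(cognates, root_gloss):
    """Return the cognates whose glosses no longer include the root's meaning."""
    return [
        (language, word, glosses)
        for language, word, glosses in cognates
        if root_gloss not in glosses
    ]


print(shifted_meanings(COGNATES, "sun"))
```

Comparing glosses as exact strings is crude, of course; it only works here because the glosses are short and hand-entered.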

Other diachronic graphs are more mundane. For example, the graph showing the development of PIE *méh₂tēr 'mother' confirms the intuitive notion that the term 'mother' (even more important to life than 'sun') changes very little across the Indo-European family. Even here, though, there is one point of interest: for whatever reason, in Albanian the inherited word 'mother,' motër, evolved to mean 'sister.' (You can make your own joke about Albanians here; I'm above it.)


The visualization suffers from a paucity of data: as I'm writing this there are a mere five concepts to be viewed in Part I, and only as many cognate sets have been entered for Part II. I wrote the program to draw its content from databases, which are trivial to expand, and to adjust its layout accordingly. Sadly, the research and data entry for this project still take a significant amount of time. How much more time I'll be able to spend on this in the near future is uncertain. I do hope that the first sentence of this paragraph is rendered inaccurate by many future additions to the databases.

[—JN / Jan. 6, 2013]