It's the holiday season and everyone's busy doing holiday season stuff. Personally, I've been so busy not blogging I haven't had time to update my blog. I have had time in the past few days, though, to work on an extension of the IE synonyms project I showed off a couple of weeks ago. Once again, for the non-linguist it will require a bit of exposition. In particular, I'm going to use the pleasantly time-appropriate PIE root *ḱóym- 'home' to explain the complications in comprehending cognates that can arise from language contact.

The project

The grand goal of this work is something I have neither the time nor the resources to compass fully, at least for now. I want to create a visualization of cognates by root. Imagine being able to click on a root in my last visualization to show a tree of cognates, in a number of languages, descending from that root. You could then click on the definition of one of these cognates to return to the original visualization, the concept thesaurus, which would show the synonyms for that concept across languages. This would give a new set of roots in the right panel, which could then be clicked to see their cognate trees ... you get the picture. The reason this is uncompassable at present is that it requires a dataset so prodigious that I simply don't have the time or resources to compile it (apart from aggregating all the research that has already been done, it would require a vast amount of original research). Nevertheless, I'm working on a scope-limited prototype that should be finished in a week or two.

For today, I've sketched out a language tree viewer. What sets this tree apart from traditional family trees is that, in addition to showing genetic relationships between languages, it also shows influence resulting from contact between languages.

Here's a glimpse of version 0.1 of this tree. It treats genetic relationships as a fixed 5x more influential than contact relationships, and it shows only some of the contact relationships. This diagram covers the Germanic language family, with Old through Modern French stuck in for the purposes of demonstration. In the final product, I hope to be able to highlight the etymological pathway taken by a given word in a given language.
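To make the 5x weighting concrete, here is a minimal Python sketch of how a tree like this could be encoded as weighted links for a flow chart. It is not the code behind the chart: the language list is a simplified, illustrative subset, and the `GENETIC_WEIGHT`/`CONTACT_WEIGHT` constants, the `Link` class, and `to_sankey` are hypothetical names. The output assumes the nodes/links shape with `source`, `target`, and `value` fields that Sankey libraries such as d3-sankey commonly accept.

```python
from dataclasses import dataclass

# Hypothetical weights mirroring the "genetic = 5x contact" rule of version 0.1.
GENETIC_WEIGHT = 5
CONTACT_WEIGHT = 1


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: str  # "genetic" (mother-daughter) or "contact" (influence via contact)

    @property
    def value(self) -> int:
        # Sankey renderers typically draw link thickness proportional to `value`.
        return GENETIC_WEIGHT if self.kind == "genetic" else CONTACT_WEIGHT


# Illustrative, simplified subset of the Germanic tree plus French.
LINKS = [
    Link("Proto-Germanic", "West Germanic", "genetic"),
    Link("Proto-Germanic", "Old Norse", "genetic"),
    Link("West Germanic", "Anglo-Frisian", "genetic"),
    Link("Anglo-Frisian", "Old English", "genetic"),
    Link("Old English", "Middle English", "genetic"),
    Link("Middle English", "Modern English", "genetic"),
    Link("Old French", "Middle French", "genetic"),
    Link("Middle French", "Modern French", "genetic"),
    # Contact links: influence from outside the mother-daughter line.
    Link("Old French", "Middle English", "contact"),
    Link("Old Norse", "Middle English", "contact"),
]


def to_sankey(links: list[Link]) -> dict:
    """Convert links to a nodes/links dict, referring to nodes by index."""
    names = sorted({link.source for link in links} | {link.target for link in links})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"name": name} for name in names],
        "links": [
            {"source": index[link.source], "target": index[link.target], "value": link.value}
            for link in links
        ],
    }


if __name__ == "__main__":
    data = to_sankey(LINKS)
    me = [n["name"] for n in data["nodes"]].index("Middle English")
    # Inflow into Middle English: 5 (genetic) + 1 + 1 (contact) = 7
    print(sum(link["value"] for link in data["links"] if link["target"] == me))
```

Because the weight is derived from the link's kind, changing the 5:1 ratio later, or letting it vary per link, would only mean touching the `value` property.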

The type of chart I'm using is called a Sankey diagram, which visualizes flow between nodes.

It's basically a tree, though, and it's Christmas colored.

It also looks kind of like a mutant menorah, which serendipitously has enough ends to cover us from the day I'm writing this until the end of Hanukkah. Funny how that worked out.

P.S. You can click and drag the rectangles up and down to reposition them if you're bored.

Interlude to the linguisticky stuff

In my last post I listed the ultimate Proto-Indo-European root for a number of words. You could see how cognates in different languages were grouped together if they shared a common PIE root. This simplicity is convenient for the purposes of visualization, but actual etymologies are often not that simple. It is the same misleading (though, for many purposes, convenient) simplicity shown by traditional language family trees, which show only genetic relationships between languages. The messy truth of human culture and society is that everything is constantly mixing and mashing, and has been throughout history. Whenever one linguistic population encounters another, their languages intermix to a greater or lesser degree. In Europe, as pretty much everywhere on earth, groups of people frequently ran into other groups of people, whom they raped, pillaged, subjugated, assimilated, etc. Consequently, there was a huge amount of contact-induced language change. The simplest change that language contact can effect on a language, and the one relevant for my project, is the adoption of lexical items. It should be obvious how these so-called loanwords can sometimes make the etymologist's job difficult. (Personally, and I think this is true for many others, this is what I find so appealing about the work.)

In the remaining words I'll demonstrate the intricacies of etymology that led me to build my tree the way I did. I'll do so in plain English, using three common English words that happen to be cognates.

The remaining words.

The PIE root *ḱóym- 'home, village' is the source of several familiar English words, namely home, city, and hamlet. Interestingly, these words span a huge range of the original PIE root's meaning: from the physical home, the dwelling of one's family, to the home writ large, one's entire community. More interestingly, these words don't really sound alike. OK, home and hamlet are similar, but what about city? As I've already hinted, the reason is that the words come from different sources.

English is full of synonym pairs in which one word is usually fancy and the other more common. For example, tearful and lachrymose mean the same thing, but whereas the former sounds sympathetic, you risk sounding like a pretentious ass if you use the latter. Not by chance, the former is a Germanic word and the latter is Latinate.

Our affinity for Romance vocabulary is historical and has a lot to do with the Normans, French-speaking descendants of Norse settlers in Normandy, who subjugated the Old English- and Norse-speaking populations of England in 1066 AD. Because French was the language of the ruling class, it was the language of prestige in the land, and consequently the French words that worked their way into the Anglic speakers' vocabularies registered as "fancy." Through its contact with French in this period, as well as with Old Norse and other Germanic languages, Old English changed dramatically and turned into Middle English.

For a taste of the difference between Old English and Middle English, compare Beowulf to Chaucer. You can't read Beowulf without a primer on Old English and some practice, whereas you can read Chaucer with limited assistance (and even relate to him, unlike Beowulf: "To you my purse and to none other wight ... I am sorry now that ye be so light!").

Now going back to home, city, and hamlet. How did these words get into English, and how are they all related to PIE *ḱóym-?

The first word, home, is very straightforward. This word comes to Modern English by way of Old English ham, which itself comes from Proto-Germanic *haimaz, which comes from PIE *ḱóym-. This is how simple the family tree suggests all etymologies are.

The second word, city, is clearly not of the same progeny as home. They don't have a single sound in common! If you know a Romance language, you might have already noticed a cognate. In Spanish, a city is a ciudad, in Italian you can say città, Romanian has cetate 'fortress', and French has the word cité. These words come from the ubiquitous Latin root cīv-, whence are derived not only English city but also citizen and civil. This Latin root cīv- was derived from PIE *ḱóym-. This should be less surprising when you remember that Latin c was pronounced as we pronounce "k" in English, not as "s." Also, notice that to pronounce the "v" (which was actually more of a "w" in Latin), you use your lips much as you do to pronounce an "m." Now meditate on how fathomable it is that the latter could have become the former over time.

The third word, hamlet, you might assume has the same history as home. It actually does not, and you can tell this from the very simple observation that its first syllable is "ham," not "home." There is a rule of thumb in historical linguistics called the Neogrammarian hypothesis. In short, it says that sound change tends to be regular, affecting all instances of a particular sound in a language. In this case, if the a in Old English ham changed to an o to produce Modern English home, why aren't little villages called *homelets instead of hamlets? We can brush away any of the more complicated explanations for what happened with one simple one: the word hamlet is not from Old English. Indeed, hamlet comes from Old French hamelet.
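The Neogrammarian reasoning can be mimicked with a toy model. This sketch works on the simplified spellings used in the paragraph above, whereas real sound changes act on sounds in particular environments, not on letters; `apply_regular_change` is a hypothetical name. The only point it illustrates is that a regular change rewrites every word present while it operates, so a word that enters the language afterwards escapes it.

```python
def apply_regular_change(lexicon: dict[str, str], old: str, new: str) -> dict[str, str]:
    """Toy, spelling-based sound change applied to every word in the lexicon.

    Neogrammarian assumption: the change is exceptionless, so it rewrites
    every occurrence of `old` in every word present at this stage.
    """
    return {gloss: form.replace(old, new) for gloss, form in lexicon.items()}


# Stage 1: an Old English word, in the simplified spelling used in the text.
old_english = {"home": "ham"}

# Stage 2: the a > o change sweeps through the whole lexicon.
# (The toy yields "hom"; it ignores later spelling conventions.)
after_change = apply_regular_change(old_english, "a", "o")

# Stage 3: a word borrowed from French after the change has run its course.
later = {**after_change, "hamlet": "hamelet"}

print(later)  # {'home': 'hom', 'hamlet': 'hamelet'}

# Had hamelet been in the lexicon when the change operated, regularity predicts:
print(apply_regular_change({"hamlet": "hamelet"}, "a", "o"))  # {'hamlet': 'homelet'}
```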

The astute reader may now suspect that I'm just making shit up. I assure you I'm not ... but why isn't hamlet more like city if it comes from French? Why isn't it civlet? The explanation is that this French word is itself not Romance, but Germanic. Ultimately, as you might have at first suspected, hamlet and home descend from the same Proto-Germanic root, *haimaz. The Romance-speaking people of Roman Gaul came into close contact with a Germanic-speaking people called the Franks. In fact, the situation was similar to what happened in England in 1066: the Franks conquered Roman Gaul, and Frankish words diffused into what was developing into the French language. (And no, it's not a coincidence that "French" sounds so much like "Frank" ... the name "France" is in fact derived from the Germanic Franks who conquered it.) One of the words that made it into Old French was hamelet, which some time later entered Middle English.

Now back to the original problem. In visualizing the Indo-European family, the traditional tree would show only mother-daughter links. English would be shown descending only from Middle English, which descends from Old English, from Anglo-Frisian, from the West Germanic dialects, from Proto-Germanic, and ultimately from Proto-Indo-European. This has its uses, but it does no justice to the complexity of human history. In my grand vision, I see a visualization that highlights the particular etymological pathway a word in a given language takes: you'd be able to see how city and hamlet both come from French, but that hamlet comes to French from *haimaz, whence also comes English home, while city comes from Latin, and that these words are related only very distantly, through the Proto-Indo-European root *ḱóym-.
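Here is a minimal Python sketch of the kind of data such a pathway view might need, assuming each word form records only its immediate source and whether it was inherited or borrowed. The chains are simplified from the discussion above (Middle English stages are skipped, for instance), the Frankish form *haim is the usual reconstruction but is not discussed in this post, and `Form`, `SOURCES`, `trace`, and `common_ancestor` are hypothetical names. Tracing each word back and finding the first shared form reproduces the relationships described: hamlet and home meet at *haimaz, while city meets them only at *ḱóym-.

```python
from typing import NamedTuple, Optional


class Form(NamedTuple):
    language: str
    form: str


# Each form -> (its immediate source, how it got there).
# "inherited" = mother-to-daughter; "borrowed" = via language contact.
SOURCES: dict[Form, tuple[Form, str]] = {
    Form("English", "home"): (Form("Old English", "ham"), "inherited"),
    Form("Old English", "ham"): (Form("Proto-Germanic", "*haimaz"), "inherited"),
    Form("Proto-Germanic", "*haimaz"): (Form("PIE", "*ḱóym-"), "inherited"),
    Form("English", "hamlet"): (Form("Old French", "hamelet"), "borrowed"),
    Form("Old French", "hamelet"): (Form("Frankish", "*haim"), "borrowed"),
    Form("Frankish", "*haim"): (Form("Proto-Germanic", "*haimaz"), "inherited"),
    Form("English", "city"): (Form("Old French", "cité"), "borrowed"),
    Form("Old French", "cité"): (Form("Latin", "cīv-"), "inherited"),
    Form("Latin", "cīv-"): (Form("PIE", "*ḱóym-"), "inherited"),
}


def trace(word: Form, follow_borrowings: bool = True) -> list[Form]:
    """Walk back from `word` to its oldest recorded ancestor.

    With follow_borrowings=False the walk stops at the first borrowing,
    roughly what a mother-daughter-only family tree can show.
    """
    path = [word]
    while path[-1] in SOURCES:
        parent, how = SOURCES[path[-1]]
        if how == "borrowed" and not follow_borrowings:
            break
        path.append(parent)
    return path


def common_ancestor(a: Form, b: Form) -> Optional[Form]:
    """Return the most recent form shared by both etymological paths."""
    ancestors_of_b = set(trace(b))
    for form in trace(a):  # youngest to oldest, so the first hit is the most recent
        if form in ancestors_of_b:
            return form
    return None


if __name__ == "__main__":
    home, hamlet, city = Form("English", "home"), Form("English", "hamlet"), Form("English", "city")
    print(common_ancestor(hamlet, home))  # Form(language='Proto-Germanic', form='*haimaz')
    print(common_ancestor(city, hamlet))  # Form(language='PIE', form='*ḱóym-')
    print(trace(hamlet, follow_borrowings=False))  # [Form(language='English', form='hamlet')]
```

Storing a single source per form keeps the sketch simple; a fuller model would likely allow several sources per form, since a word can be shaped by more than one language at once.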