If I had to choose the two most famous and most studied families in the world, I would undoubtedly choose the British Royal Family and the Indo-European family. Apart from the huge number of academic studies about them, they have many things in common: they both spread around the globe, taking over countries in Asia and Europe; nobody speaks like them anymore; they’re very, very old; and nobody has yet managed to understand them completely.

However, creative similarities aside, Indo-European is a family of languages and, as science has found, it could be more than 8,000 years old.

The Indo-European language tree

What is Indo-European?

The Indo-European family is big, with a lot of children and cousins around the globe. Most of Europe, the Iranian plateau, and the northern Indian subcontinent are home to this peculiar family. Then, due to colonisation, several family members (English, French, Portuguese, Russian, Dutch, and Spanish) crossed the seven seas and spread around the world. Like any proper family tree, the Indo-European one has numerous branches, eight of which are still in use: Albanian, Armenian, Balto-Slavic, Celtic, Germanic, Hellenic, Indo-Iranian, and Italic/Romance. Nine other, lesser-known branches have gone extinct over the centuries.

Some members of the family are more successful than others. With over 100 million native speakers each, Spanish, English, Punjabi, Hindi–Urdu, Bengali, French, Portuguese, Russian, and German are the Indo-European languages with the largest native-speaker populations today (the cool cousins you only meet at Christmas dinner and whom your mother always compares you to), while many other languages with fewer native speakers, such as Flemish, Occitan and Catalan, are at risk of disappearing.

Indo-European languages are spoken as a first language by 46% of the world's population, roughly 3.2 billion people. According to Ethnologue, there are about 445 living Indo-European languages, and the Indo-Iranian branch accounts for more than two-thirds (313) of them.

All Indo-European languages descend from a single prehistoric language, linguistically reconstructed as Proto-Indo-European, which was spoken sometime between the Neolithic and the Early Bronze Age. The Proto-Indo-European homeland, that is, the geographic area where it was spoken, has been the subject of numerous competing theories over the last two centuries. The majority of academics favour the so-called Kurgan hypothesis, which places the homeland in the Pontic–Caspian steppe (a very cool name for what is today southern Russia and Ukraine) and links it to the Yamnaya culture and other related archaeological cultures of the 4th and early 3rd millennia BC. Very old stuff.

Pictographs or it didn't happen
P. Byrnes for The New Yorker, 2016

By the time the first written records appeared, Indo-European had already given rise to a number of languages spoken across most of Europe and South and West Asia. During the Bronze Age, written records of Indo-European languages emerged in the form of Mycenaean Greek and the Anatolian languages Hittite and Luwian. The earliest known records are isolated Hittite words and names that appear in 20th-century BC texts from the Assyrian colony of Kültepe in eastern Anatolia, texts otherwise written in Akkadian, an unrelated Semitic language. Although no older written records of the Proto-Indo-Europeans themselves survive, some aspects of their culture and religion can be reconstructed from later evidence in their daughter cultures.

After the Afroasiatic family, which includes the Semitic languages and Ancient Egyptian, Indo-European has the second-longest documented history of any language family, which makes it a natural subject for historical linguistics. Interestingly enough, historical linguistics as an academic field was born in the 19th century with the study of Indo-European as its primary goal, and not the other way around. Imagine how much these languages had to tell each other to justify founding an entire university department for that reason alone. Yes, Indo-European is huge, and by studying it we keep finding out something new about ourselves, every day, as people and as members of one whole, unique family.

Tiny Hittite golden pendant - Met museum

The latest discovery

Despite a number of contentious theories suggesting otherwise, most linguists today do not believe that the Indo-European family shares any genetic ties with other language families. This may look like an obstacle to reaching a conclusion, but it actually pushed the scientific community to dig deeper, and dig deeper they did. Historical linguists, historians, scientists and phylogeneticists (biologists who study evolutionary history) have teamed up to make sense of one of the greatest questions of human history: do most of our languages come from the same ancient mother? The answer, finally, is yes. The problem now is: where did she come from?

As mentioned above, the origin of the Indo-European languages has been debated for more than 200 years, and many hypotheses have been proposed. Two of them have dominated the argument in recent times: the "Steppe" hypothesis we just saw, which places the beginning approximately 6,000 years ago, and the "Anatolian" hypothesis, which suggests an even older origin, connected to the spread of early agriculture, approximately 9,000 years ago. Unfortunately, despite the generous amount of debate and theories on the matter, plus the ever-growing and ever-improving digital tools (AI, of course) entering the academic world, linguists are stuck: a true, plausible date for the origin of Indo-European is still to be found. However, scientists are not giving up, and they have recently found new clues that could help connect the dots.

How do they do that?

On the front line, linguists from the Max Planck Institute for Evolutionary Anthropology's Department of Linguistic and Cultural Evolution brought together a global team of more than 80 language experts to create a new dataset of vocabulary from 161 Indo-European languages, 52 of which are ancient or historical. Combined with strict lexical data-coding techniques, this new set of words opened a clearer path towards a conclusion, one that still lies buried under thousands of years of history and biological mess. But now scientists have the batter (good data); they only need to bake it in the oven (AI) at the right temperature (that is, interpret the data properly). Has this metaphor gone too far? Probably. But if you've made it this far, you might as well send us your favourite metaphor and we will hold a nice metaphor competition! Anyway, back to the science.

By making sure the data were sound, the team reached a conclusion that fits neither the Anatolian nor the Steppe theory on its own. As Paul Heggarty, the study's first author, puts it: "Recently added ancient DNA data suggest that the Anatolian branch of Indo-European did not emerge from the Steppe, but from further south, in or near the northern arc of the Fertile Crescent," which he describes as "the earliest source of the Indo-European family."

The Steppe - From National Geographic Society.

To account for the way certain branches of Indo-European reached Europe through the later Yamnaya and Corded Ware-associated expansions, the study's authors put forward a new hybrid hypothesis for the origin of the Indo-European languages: an ultimate homeland south of the Caucasus, followed by a branch that moved northwards onto the Steppe. According to Russell Gray, the study's senior author, "ancient DNA and language phylogenetics thus combine to suggest that the resolution to the 200-year-old Indo-European enigma lies in a hybrid of the farming and Steppe hypotheses."

Beyond a refined time estimate for the overall language tree, the study suggests that the tree's topology and branching order matter most, because they line up with key archaeological events and with the shifting ancestry patterns seen in ancient human genome data.

Compared to the earlier, mutually exclusive scenarios, this is a significant step forward towards a more tenable model that incorporates genetic, anthropological, and archaeological data.

Why do we care so much?

Why do we care so much about Indo-European language lore? For me, it is always the same old story. The quest for identity is engraved in our consciousness, not only as individuals looking for their place on this planet, but also as groups of people who want, need, and seek to proudly belong to something, somewhere. Interest in Indo-European has grown over the centuries despite countless dead ends and false starts. It is a symptom of the need of peoples around the world to find a connection with each other, a connection that probably began some nine thousand years ago. Plus, if we manage to shed light on the Indo-European enigma, I think our languages will get a whole new, epic attire.