Ramblings on linguistics: Automated reconstruction of ancient languages using probabilistic models of sound change

Today's paper is "Automated reconstruction of ancient languages using probabilistic models of sound change" by Alexandre Bouchard-Côté, David Hall, Thomas L. Griffiths, and Dan Klein, published in Proceedings of the National Academy of Sciences in 2013.

I've previously talked about the comparative method, which is quite a slow and painstaking method of reconstructing proto-languages. This paper, as the title suggests, demonstrates that Proto-Austronesian can be reconstructed quite accurately and automatically from a large number of daughter languages and a probabilistic model of sound change.

A note about Austronesian

Austronesian is a language family consisting of languages from Taiwan, Indonesia, Papua New Guinea, the Philippines and other island and south-east Asian nations. Despite the name, it includes no Australian languages; those are in the Australian language family. Ethnologue lists 147 language families, which is quite surprising if you're used to Indo-European and expect there to be 1 or 2 language families per continent.

The model

The model takes large cognate sets as input. From these it reconstructs the ancestors of the words, along with the sound changes that took place along the way, and therefore how and when the ancestors branched from each other.

Each new cognate set updates the model, because the inferred sound changes must apply equally to all of the sets.
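
To make the idea of shared sound changes concrete, here is a toy sketch in Python. It is not the paper's model, which is far richer; it assumes words of equal length, a hand-written table of substitution probabilities for each language, and invented word forms and language names ("A" and "B"). The function names and all the numbers are my own assumptions for illustration.

```python
import math

def log_likelihood(proto, daughter, probs, default=0.01):
    """Log-probability of a daughter word given a proto-form, sound by sound.

    Toy assumption: both words have the same length (no insertions or deletions).
    Sound pairs missing from the table get a small default probability.
    """
    if len(proto) != len(daughter):
        return float("-inf")
    return sum(math.log(probs.get((p, d), default))
               for p, d in zip(proto, daughter))

def best_proto(candidates, daughters, probs_by_lang):
    """Return the candidate proto-form that best explains all daughter words."""
    def total(proto):
        return sum(log_likelihood(proto, word, probs_by_lang[lang])
                   for lang, word in daughters.items())
    return max(candidates, key=total)

# Invented data: language A keeps "p", language B usually turns "p" into "f".
# Keys are (proto sound, daughter sound) pairs.
probs_by_lang = {
    "A": {("p", "p"): 0.9, ("f", "f"): 0.9, ("a", "a"): 0.9, ("t", "t"): 0.9},
    "B": {("p", "f"): 0.8, ("p", "p"): 0.1, ("f", "f"): 0.9,
          ("a", "a"): 0.9, ("t", "t"): 0.9},
}
daughters = {"A": "pat", "B": "fat"}

print(best_proto(["pat", "fat"], daughters, probs_by_lang))  # prints "pat"
```

The proto-form "pat" wins because a single likely change in B (p to f) explains both words, whereas "fat" would need an unlikely f to p change in A. With more cognate sets, the same table has to explain all of them at once, which is what constrains the reconstruction.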

It can also take as input word lists that have not been marked as cognates, and use their glosses (i.e. their stated meanings in a common language) to infer which words are cognates. Without this, the 'automatic' method would still require a large amount of manual attention. Also, since cognates are by definition descendants of a single ancestor, using them as inputs for a tree introduces a certain amount of circularity.
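
The paper's cognate inference is part of its probabilistic model, and I won't try to reproduce it here. As a rough illustration of the idea only, the Python sketch below groups words by gloss and treats same-gloss words within a small edit distance as candidate cognates. The threshold, the function names, and the word list are all invented for this example.

```python
from collections import defaultdict
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def candidate_cognates(entries, max_distance=1):
    """Pair up words that share a gloss and are within max_distance edits."""
    by_gloss = defaultdict(list)
    for lang, gloss, form in entries:
        by_gloss[gloss].append((lang, form))
    pairs = []
    for gloss, words in by_gloss.items():
        for (_, form1), (_, form2) in combinations(words, 2):
            if edit_distance(form1, form2) <= max_distance:
                pairs.append((gloss, form1, form2))
    return pairs

# Invented (language, gloss, form) entries; none of these are real data.
entries = [
    ("A", "eye", "tala"),
    ("B", "eye", "tara"),
    ("C", "eye", "wiku"),
    ("A", "stone", "pono"),
    ("B", "stone", "bono"),
]

print(candidate_cognates(entries))
# [('eye', 'tala', 'tara'), ('stone', 'pono', 'bono')]
```

A plain edit-distance threshold like this would miss cognates that have drifted far apart and would accept chance lookalikes, which is one reason to infer cognacy jointly with the sound changes instead.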

Results

"We have developed an automated system capable of large-scale reconstruction of protolanguage word forms, cognate sets, and sound change histories."

They also use this model to show that the Functional Load Hypothesis is valid. It "claims that the probability that a sound will change over time is related to the amount of information provided by a sound". This seems like common sense, but had apparently been difficult to prove without comparison across hundreds of languages.
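
One simple proxy for how much information a sound contrast carries is the number of minimal pairs it distinguishes. The Python sketch below counts them for a tiny English-like lexicon (spelled, not transcribed). I'm not claiming this is the measure the paper uses; the word list and the function name are assumptions for illustration.

```python
def minimal_pairs(lexicon, x, y):
    """Count word pairs that differ only by sound x versus sound y in one position."""
    words = set(lexicon)
    count = 0
    for word in words:
        for i, ch in enumerate(word):
            # Swap one occurrence of x for y and see if the result is also a word.
            if ch == x and word[:i] + y + word[i + 1:] in words:
                count += 1
    return count

# Toy lexicon, chosen by hand for this example.
lexicon = ["pat", "bat", "pit", "bit", "tap", "tab", "sip", "zip"]

print(minimal_pairs(lexicon, "p", "b"))  # prints 3 (pat/bat, pit/bit, tap/tab)
print(minimal_pairs(lexicon, "s", "z"))  # prints 1 (sip/zip)
```

In this toy lexicon the p/b contrast does more work than s/z. Testing the hypothesis would then mean comparing figures like these against how often each contrast is actually lost over time, which is where reconstructions across hundreds of languages come in.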

References

Bouchard-Côté, Alexandre, David Hall, Thomas L. Griffiths, and Dan Klein. "Automated reconstruction of ancient languages using probabilistic models of sound change." Proceedings of the National Academy of Sciences 110, no. 11 (2013): 4224-4229.
http://www.pnas.org/content/early/2013/02/05/1204678110.full.pdf