Measuring diachronic language distance using perplexity. Application to English, Portuguese and Spanish.

The objective of this work is to establish a corpus-driven methodology for automatically quantifying diachronic language distance
between chronological periods of several languages. We apply a perplexity-based measure to written texts representing
different historical periods of three languages: European English, European Portuguese and European Spanish. For this
purpose, we have built historical corpora for each period, compiled from different open corpus sources.
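
The abstract does not spell out the measure, so the following is only a minimal sketch under stated assumptions. It assumes that each period's corpus is modeled with a character n-gram language model (add-k smoothing, order 3) and that the distance between two periods is the mean of the two cross-perplexities: the model trained on period A evaluated on period B, and vice versa. The names CharNgramModel and perplexity_distance, the padding symbol, the n-gram order and the smoothing scheme are illustrative assumptions, not the authors' configuration.

```python
import math
from collections import Counter

PAD = "\x02"  # hypothetical start-of-text padding symbol


class CharNgramModel:
    """Character n-gram language model with add-k smoothing (illustrative choice)."""

    def __init__(self, text: str, n: int = 3, k: float = 1.0):
        self.n, self.k = n, k
        padded = PAD * (n - 1) + text
        # Counts of every n-gram and of every (n-1)-character context that precedes a character.
        self.ngrams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
        self.contexts = Counter(padded[i:i + n - 1] for i in range(len(padded) - n + 1))
        self.alphabet = set(padded)

    def perplexity(self, text: str) -> float:
        """PP = exp(-(1/N) * sum_i log P(c_i | previous n-1 characters))."""
        if not text:
            raise ValueError("cannot compute perplexity of an empty text")
        padded = PAD * (self.n - 1) + text
        # Alphabet size used for smoothing covers characters seen in training or test text.
        v = len(self.alphabet | set(padded))
        log_prob = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            ngram = padded[i - self.n + 1:i + 1]
            p = (self.ngrams[ngram] + self.k) / (self.contexts[context] + self.k * v)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(text))


def perplexity_distance(corpus_a: str, corpus_b: str, n: int = 3) -> float:
    """Assumed symmetric distance: mean of the two cross-perplexities."""
    pp_ab = CharNgramModel(corpus_a, n).perplexity(corpus_b)
    pp_ba = CharNgramModel(corpus_b, n).perplexity(corpus_a)
    return (pp_ab + pp_ba) / 2
```

Averaging the two directions makes the measure symmetric even though a single cross-perplexity is not. Character-level models are a plausible choice for historical material because spelling variation makes word-level vocabularies sparse. Since perplexity is sensitive to corpus size and preprocessing, the period corpora being compared would need to be of comparable size and normalized in the same way.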

Weighted finite-state transducers for normalization of historical texts

This paper presents a study of methods for the normalization of historical texts. The aim of these methods
is to learn relations between historical and contemporary word forms. We have compiled training and test
corpora for different languages and scenarios, and we have tried to interpret the results in relation to the features
of the corpora and languages. Our proposed method, based on weighted finite-state transducers, is
compared to previously published ones. Our method learns to map phonological changes using a noisy channel
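
The abstract is cut off before describing the transducers, so the following only illustrates the general noisy-channel idea and is not the authors' method. Each candidate modern word w for a historical form h is scored by -log P(w) - log P(h | w). The channel term is approximated by a weighted edit distance, which equals the cheapest path through a one-state weighted edit transducer composed with h and w in the tropical semiring. The sketch computes that path with dynamic programming instead of building transducers with a WFST library. The substitution costs, the lexicon probabilities and the function names channel_cost and normalize are hypothetical. In a real system such weights would presumably be learned from the training corpora.

```python
import math
from typing import Dict, Tuple

# Hypothetical channel costs: -log P(historical char | modern char) for a few
# spelling correspondences (e.g. Old Spanish "vna" ~ "una", "dixo" ~ "dijo").
SUB_COSTS: Dict[Tuple[str, str], float] = {
    ("v", "u"): 0.5,
    ("u", "v"): 0.5,
    ("x", "j"): 0.7,
    ("e", "i"): 0.8,
}

# Hypothetical language model: unigram probabilities P(w) of modern word forms.
LEXICON: Dict[str, float] = {"una": 0.3, "dijo": 0.2, "mismo": 0.2, "dicho": 0.15, "uno": 0.15}


def channel_cost(hist: str, modern: str, sub_costs: Dict[Tuple[str, str], float],
                 default_sub: float = 4.0, indel: float = 3.0) -> float:
    """Cheapest path through a one-state edit transducer (tropical semiring),
    i.e. a weighted edit distance approximating -log P(hist | modern)."""
    n, m = len(hist), len(modern)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = hist[i - 1], modern[j - 1]
            sub = 0.0 if a == b else sub_costs.get((a, b), default_sub)
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match or substitution
                          d[i - 1][j] + indel,     # extra historical character
                          d[i][j - 1] + indel)     # modern character missing in historical form
    return d[n][m]


def normalize(hist: str, lexicon: Dict[str, float],
              sub_costs: Dict[Tuple[str, str], float]) -> str:
    """Noisy-channel decoding: argmin over w of -log P(hist | w) - log P(w)."""
    return min(lexicon, key=lambda w: channel_cost(hist, w, sub_costs) - math.log(lexicon[w]))


for word in ["vna", "dixo", "mesmo"]:
    print(word, "->", normalize(word, LEXICON, SUB_COSTS))
# prints, one per line: vna -> una, dixo -> dijo, mesmo -> mismo
```

Scoring every lexicon entry is brute force; a transducer-based implementation would instead compose the historical form with the edit transducer and a lexicon acceptor and take the shortest path, which avoids enumerating candidates explicitly.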