Adapting NMT to caption translation in Wikimedia Commons for low-resource languages

This paper presents a successful domain adaptation of a general neural machine translation (NMT) system using a bilingual corpus created with captions for images in Wiki- media Commons for the Spanish-Basque and English-Irish pairs. Keywords: Machine Translation, Low-resource languages, Bilingual corpora, Language resources from Wikipedia

Interpretable Deep Learning to Map Diagnostic Texts to ICD10 Codes

Background Automatic extraction of morbid disease or conditions contained in Death Certificates is a critical process, useful for billing, epidemiological studies and comparison across countries. The fact that these clinical documents are written in regular natural language makes the automatic coding process difficult because, often, spontaneous terms diverge strongly from standard reference terminology such as the International Classification of Diseases (ICD). Objective

Literal occurrences of Multiword Expressions: rare birds that cause a stir

Multiword expressions can have both idiomatic and literal occurrences. For instance pulling strings can be understood either as making use of one’s influence, or literally. Distinguishing these two cases has been addressed in linguistics and psycholinguistics studies, and is also considered one of the major challenges in MWE processing. We suggest that literal occurrences should be considered in both semantic and syntactic terms, which motivates their study in a treebank.

Weighted finite-state transducers for normalization of historical texts

This paper presents a study about methods for normalization of historical texts. The aim of these methods is learning relations between historical and contemporary word forms. We have compiled training and test corpora for different languages and scenarios, and we have tried to read the results related to the features of the corpora and languages. Our proposed method, based on weighted finite-state transducers, is com- pared to previously published ones. Our method learns to map phonological changes using a noisy channel


RSS - Aldizkaria-rako harpidetza egin