
Large Scale Linguistic Processing of Tweets to Understand Social Interactions among Speakers of Less Resourced Languages: The Basque Case

Social networks like Twitter are increasingly important in the creation of new forms of communication. They have also become useful tools for social and linguistic research thanks to the massive amounts of public textual data they make available. This is particularly valuable for less resourced languages, as it makes it possible to apply current natural language processing techniques to large amounts of unstructured data. In this work, we study the linguistic and social aspects of young and adult people's behaviour based on the contents of their tweets and the social relations that arise from them.

The DISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective Detection

In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection. In this paper, we review the data included in the task, which cover 2.6 million manually annotated tokens from 15 datasets in 10 languages, survey and compare submitted systems, and report on system performance on each task for both annotated and plain-tokenized versions of the data.
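The segmentation task can be framed as token-level boundary labelling: each token is tagged as either beginning a new elementary discourse unit or not. The following is a toy rule-based sketch of that framing only, not any system submitted to the task; the punctuation set and connective list are illustrative assumptions.

```python
# Toy illustration of EDU segmentation as token labelling: a token starts a
# new unit at sentence start, after sentence-final punctuation, or when it is
# one of a small (made-up) set of discourse connectives.
CONNECTIVES = {"because", "but", "although", "when", "while"}

def segment(tokens):
    """Label each token 'BeginSeg' (starts an EDU) or '_' (does not)."""
    labels = []
    for i, tok in enumerate(tokens):
        new_unit = (i == 0
                    or tokens[i - 1] in {".", "!", "?", ";"}
                    or tok.lower() in CONNECTIVES)
        labels.append("BeginSeg" if new_unit else "_")
    return labels

labels = segment("She left because it rained .".split())
# -> ['BeginSeg', '_', 'BeginSeg', '_', '_', '_']
```

Real submissions replaced such rules with learned sequence labellers, but the input/output contract is the same, which is what made cross-formalism comparison possible.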

Neural-Network-Based Coreference Resolution for Basque

This work aims to continue previous work on coreference resolution for Basque by building a neural-network-based coreference resolution system. To this end, a system built for Polish was taken as a starting point and adapted to Basque. Starting from the EPEC-KORREF corpus, mention pairs and their features were extracted, and a neural network was trained to decide whether the mention pairs are coreferent. Then, coreference clusters were created from the network's predictions and evaluated.
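The pipeline described above has two stages: a binary classifier that scores each candidate mention pair, and a clustering step that links positively scored pairs into coreference chains. This is a minimal, illustrative sketch of that two-stage shape, not the paper's implementation; the function names and the logistic scorer are assumptions.

```python
# Stage 1: score a mention pair with a (pre-trained) logistic classifier.
# Stage 2: merge pairs whose score passes a threshold via union-find.
import math

def pair_score(features, weights, bias=0.0):
    """Logistic coreference score for one mention pair's feature vector."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def cluster_mentions(n_mentions, scored_pairs, threshold=0.5):
    """Build coreference clusters from scored (i, j) mention-index pairs."""
    parent = list(range(n_mentions))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for (i, j), score in scored_pairs:
        if score >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())

# Example: 4 mentions; pairs (0,2) and (2,3) predicted coreferent.
clusters = cluster_mentions(4, [((0, 2), 0.9), ((1, 2), 0.2), ((2, 3), 0.8)])
# -> [[0, 2, 3], [1]]
```

In the paper, the scorer is a neural network over richer mention features, but the cluster-from-pairs step works the same way on its predictions.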

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages

This paper presents a successful domain adaptation of a general neural machine translation (NMT) system, using a bilingual corpus created from captions for images in Wikimedia Commons, for the Spanish-Basque and English-Irish pairs.
Keywords: Machine Translation, Low-resource languages, Bilingual corpora, Language resources from Wikipedia

Interpretable Deep Learning to Map Diagnostic Texts to ICD10 Codes

Background
Automatic extraction of the morbid diseases or conditions reported in death certificates is a critical process, useful for billing, epidemiological studies, and comparison across countries. Because these clinical documents are written in free natural language, automatic coding is difficult: spontaneous terms often diverge strongly from standard reference terminologies such as the International Classification of Diseases (ICD).
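The core difficulty named above is matching spontaneous free-text terms to a fixed code inventory. As a point of contrast with the paper's deep-learning approach, here is a deliberately simple baseline sketch: match a diagnosis string to the closest ICD-10 description by token overlap. The three-entry code table is a made-up example, not real reference data.

```python
# Illustrative token-overlap baseline for text-to-ICD-10 mapping (not the
# paper's model). A hypothetical, tiny slice of an ICD-10 description table:
ICD10 = {
    "I21": "acute myocardial infarction",
    "J18": "pneumonia unspecified organism",
    "E11": "type 2 diabetes mellitus",
}

def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def code_diagnosis(text):
    """Return the (code, score) of the closest ICD-10 description."""
    return max(((code, jaccard(text, desc)) for code, desc in ICD10.items()),
               key=lambda cs: cs[1])

best = code_diagnosis("myocardial infarction")  # best[0] == 'I21'
```

A baseline like this fails exactly where the abstract says coding is hard: when the spontaneous wording shares no tokens with the standard description, which is what motivates learned, interpretable models instead.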
