Text analysis (Testuen analisia)

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification.

Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora, due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages.

Automatic diachronic distance between diatopic variants of Portuguese and Spanish (Distância diacrónica automática entre variantes diatópicas do português e do espanhol)

The aim of this work is to apply a perplexity-based methodology to automatically calculate the interlinguistic distance between different historical periods of diatopic variants of languages.
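
As a rough illustration of what a perplexity-based distance can look like, the sketch below trains a character bigram model on one text and measures its perplexity on another. Everything here is an assumption made for illustration, not the authors' setup: the bigram order, the add-one smoothing, the function names `train_char_bigrams`, `perplexity` and `perplexity_distance`, and the choice to average both directions to obtain a symmetric score.

```python
import math
from collections import Counter


def train_char_bigrams(text):
    """Count character bigrams and their left contexts ('^' marks the start)."""
    padded = "^" + text
    bigrams = Counter(zip(padded, padded[1:]))
    contexts = Counter(padded[:-1])
    vocab = set(padded)
    return bigrams, contexts, vocab


def perplexity(model, text):
    """Perplexity of `text` under an add-one smoothed character bigram model."""
    if not text:
        raise ValueError("text must not be empty")
    bigrams, contexts, vocab = model
    # Vocabulary size plus one slot reserved for characters unseen in training
    # (a simplification: all unseen characters share that one slot).
    v = len(vocab) + 1
    padded = "^" + text
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        log_prob += math.log(p)
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / len(text))


def perplexity_distance(text_a, text_b):
    """Symmetric score: mean of the perplexities measured in both directions."""
    a_on_b = perplexity(train_char_bigrams(text_a), text_b)
    b_on_a = perplexity(train_char_bigrams(text_b), text_a)
    return (a_on_b + b_on_a) / 2


if __name__ == "__main__":
    sample_a = "o rato roeu a roupa do rei de roma"
    sample_b = "the quick brown fox jumps over the lazy dog"
    print(perplexity_distance(sample_a, sample_a))
    print(perplexity_distance(sample_a, sample_b))
```

The lower the score, the better each text is predicted by a model of the other. With real data, each text would be a corpus for one variety and period, and far more material than these toy strings would be needed for stable estimates.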

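Automatic tagging of morphosyntactic information in text corpora based on linguistic knowledge: some problems, several challenges (Testu-corpusen informazio morfosintaktikoaren etiketatze automatikoa hizkuntz ezagutzan oinarrituz: zenbait arazo, hainbat erronka)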

After years of work on disambiguating Basque corpora tagged at the morphosyntactic level, in this article we report on several difficulties encountered along the way and, alongside them, explain the need to reconsider several criteria. The context is computational linguistics, and the methodology we have used is that of rule-based grammars, that is, one that proceeds by making use of linguistic information.

Moreus+: Word Parsing in Basque beyond Morphological Segmentation

This work describes the formalization of a word structure grammar that represents the complex morphological and morphosyntactic information embedded within the word forms of an agglutinative language, Basque. The grammar gives a comprehensive linguistic description of the main morphological phenomena, such as affixation, derivation, and composition, and also models both standard and non-standard words. We have identified the relevant issues to be addressed in the representation of such a grammar.
