MEDIA team at the CLEF-2020 Multilingual Information Extraction Task

This paper presents our approach (MEDIA) to the CLEF-2020 eHealth Task 1. The task consists of automatically assigning ICD10 codes (CIE-10, in Spanish) to clinical case documents; predictions are evaluated against manually generated ICD10 codings. Our system took part in two subtasks: Diagnosis Coding (CodiEsp-D) and Procedure Coding (CodiEsp-P). We approached the coding task as a two-step system: a first step that performs named entity recognition (diagnoses and procedures) and a second step that assigns the right ICD10 code to each recognized entity (diagnosis or procedure). For the first step, medical entity recognition, we employed a transfer learning strategy over pre-trained language models, fine-tuning them for the Named Entity Recognition task. The second step was handled with edit distance techniques. We achieved our best results by combining static and contextual word embeddings from Wikipedia and Electronic Health Records (∼100M words), with a Mean Average Precision (MAP) of 0.488 and 0.442 for diagnoses and procedures, respectively.
Keywords: Neural Networks, Levenshtein Distance, ICD Coding.
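The second step named in the abstract, assigning a code to a recognized entity by edit distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `levenshtein` and `rank_codes`, the lowercasing step, and the three-entry `TOY_TERMS` dictionary (with simplified, unaccented term strings) are all assumptions made for illustration. It only shows how Levenshtein distance can rank candidate codes for a mention, which is the ranked output a MAP evaluation expects.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical toy dictionary: simplified term -> code (illustrative only).
TOY_TERMS = {
    "neumonia": "j18.9",
    "hipertension esencial": "i10",
    "diabetes mellitus tipo 2": "e11.9",
}


def rank_codes(mention: str, terms: dict = TOY_TERMS) -> list:
    """Return candidate codes ordered from closest to farthest term."""
    m = mention.lower().strip()  # assumed normalization, not from the paper
    scored = sorted((levenshtein(m, term), code) for term, code in terms.items())
    return [code for _, code in scored]


if __name__ == "__main__":
    # A misspelled mention is still ranked against the dictionary terms.
    print(rank_codes("hipertension escencial"))
```

In a real setting the dictionary would be the full ICD10 terminology, and a distance threshold or normalization by term length would likely be needed; neither is specified in the abstract.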
Authors (IXA members): 
Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola
Public documents: 
Publication place: 
Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020.
ISBN or ISSN (journal, conference, book or book chapters): 
1613-0073