Ad Lingua: Text Classification Improves Symbolism Prediction in Image Advertisements
Understanding image advertisements is a challenging task that often requires non-literal interpretation. We argue that image-based predictions alone are insufficient for symbolism prediction. Following the intuition that text and images are complementary in advertising, we introduce a multimodal ensemble of a state-of-the-art image-based classifier, a classifier built on an object detection architecture, and a fine-tuned language model applied to texts extracted from ads by OCR. The resulting system establishes a new state of the art in symbolism prediction.
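The abstract does not specify how the three models' predictions are combined; a common choice for such ensembles is a weighted average of per-model class probabilities. The sketch below illustrates that scheme only; the function names, weights, and toy logits are illustrative assumptions, not the authors' system.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(image_logits, object_logits, text_logits, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three models' class probabilities.

    Returns the argmax class index and the averaged distribution.
    """
    probs = [softmax(l) for l in (image_logits, object_logits, text_logits)]
    n = len(image_logits)
    total = sum(weights)
    avg = [sum(w * p[i] for w, p in zip(weights, probs)) / total for i in range(n)]
    return max(range(n), key=avg.__getitem__), avg

# Toy 3-class example: the OCR-text model is confident in class 2,
# overriding the image model's weak preference for class 0.
cls, avg = ensemble_predict([1.0, 0.5, 0.9], [0.2, 0.1, 0.3], [0.0, 0.5, 3.0])
```

With equal weights, a confident text model can flip the ensemble decision even when the image-based models are near-uniform, which matches the intuition that text and image signals are complementary.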
Fair Evaluation in Concept Normalization: a Large-scale Comparative Analysis for BERT-based Models
In biomedical research, the entity linking problem is known as Medical Concept Normalization (MCN). Medical concepts may have different types (e.g., drugs, diseases, or genes/proteins) and may be retrieved from different single-typed ontologies. A recurring problem with supervised models is how to reuse a trained model for a different purpose, since its outputs are coded to a specific terminology. In this work, we seek to answer the following research questions: Do the test sets of current benchmarks lead to an overestimation of performance? How do surface characteristics of entity mentions affect the performance of a BERT-based baseline? Does a model trained on one corpus transfer to linking entity mentions of another type or domain in the zero-shot setting?
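A widely used BERT-based baseline for concept normalization ranks ontology concepts by the similarity of their name embeddings to the mention embedding. The sketch below shows only that retrieval structure; the character-bigram `embed` function is a toy stand-in for a BERT encoder, and the two-concept ontology with UMLS-style identifiers is an illustrative assumption.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a BERT encoder: character-bigram count vector."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(mention, ontology):
    """Link a mention to the concept whose name embedding is nearest."""
    m = embed(mention)
    return max(ontology, key=lambda cid: cosine(m, embed(ontology[cid])))

# Illustrative two-concept "ontology" (identifiers are hypothetical examples).
ontology = {"C0018681": "headache", "C0027497": "nausea"}
```

Swapping terminologies then amounts to re-embedding a different set of concept names, which is exactly where the zero-shot transfer question in the abstract arises: the encoder is reused, but the retrieval index changes.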
KFU NLP Team at SMM4H 2020 Tasks: Cross-lingual Transfer Learning with Pretrained Language Models for Drug Reactions
This paper describes neural models developed for the Social Media Mining for Health (SMM4H) 2020 shared tasks. Specifically, we participated in two tasks. We investigate the use of BERT, a language representation model, pre-trained on a large-scale corpus of 5 million health-related user reviews in English and Russian. The ensemble of neural networks for extraction and normalization of adverse drug reactions ranked first among 7 teams at SMM4H 2020 Task 3 and obtained a relaxed F1 of 46%. The BERT-based multilingual model for classification of English and Russian tweets that report adverse reactions ranked second among 16 and 7 teams at the first two subtasks of SMM4H 2020 Task 2 and obtained a relaxed F1 of 58% on English tweets and 51% on Russian tweets.
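The abstract reports "relaxed F1" scores; for span extraction, a relaxed match typically counts a predicted span as correct if it overlaps a gold span, rather than requiring exact boundaries. The sketch below shows that overlap-based scoring under this assumption; it is an illustration of the metric family, not the official shared-task scorer.

```python
def overlaps(a, b):
    """True if two half-open character spans (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def relaxed_f1(gold, pred):
    """Relaxed F1: a span counts as matched if it overlaps any span on the other side."""
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# One prediction partially overlaps a gold span, one is spurious,
# and one gold span is missed: precision = recall = 0.5.
score = relaxed_f1(gold=[(0, 5), (10, 20)], pred=[(1, 4), (30, 40)])
```

Because boundary disagreements are common in noisy social media text, relaxed matching rewards systems that find the right region even when tokenization differs from the gold annotation.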