
CS Colloquium - Ira Assent: Briefly confused? It's not you, it's the context


Info about event

Time

Friday 5 October 2018, 15:15 - 16:00

Location

Building 5335, room 016 (Peter Bøgh Auditorium)

Abstract:

Abbreviations are convenient when we refer to a concept again and again, particularly if it is a long or complicated term. Because abbreviations are so short (often just two or three letters), they are typically ambiguous. As humans, we are very good at implicitly translating an abbreviation into the correct long form if we are familiar with its context or have been introduced to the abbreviation earlier in the text. For computers, however, the situation is not that straightforward. In natural language processing (NLP) applications such as automatic grouping of documents according to topic (e.g. retrieving research articles relevant to a study), it is therefore important to solve the problem of abbreviation disambiguation.

In this talk, we present an approach that is based on capturing the context of abbreviations and their long forms by learning word embeddings from shallow neural networks. We conclude the talk with an overview of our most recent research contributions in the field of text mining.
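
To make the idea of context-based disambiguation concrete, here is a minimal sketch. It is not the method presented in the talk: instead of word embeddings learned by a shallow neural network, it uses plain word-count vectors and cosine similarity as a stand-in, and the example contexts, long forms, and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: example contexts in which each long form appears.
LONG_FORM_CONTEXTS = {
    "natural language processing": [
        "text corpus words parsing language model",
        "documents words translation language",
    ],
    "neuro-linguistic programming": [
        "therapy coaching behaviour psychology",
        "psychology communication therapy techniques",
    ],
}


def bag_of_words(text):
    """Turn a text into a word-count vector (a stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(contexts):
    """Sum the word counts of all example contexts for each long form."""
    profiles = {}
    for long_form, texts in contexts.items():
        profile = Counter()
        for text in texts:
            profile.update(bag_of_words(text))
        profiles[long_form] = profile
    return profiles


def disambiguate(context, profiles):
    """Pick the long form whose context profile is most similar to the context."""
    query = bag_of_words(context)
    return max(profiles, key=lambda long_form: cosine(query, profiles[long_form]))


profiles = build_profiles(LONG_FORM_CONTEXTS)
print(disambiguate("NLP models parse words in a text corpus", profiles))
print(disambiguate("NLP techniques used in therapy and coaching", profiles))
```

With learned embeddings, words that never co-occur literally (such as "parse" and "parsing") could still contribute to the similarity, which this count-based stand-in cannot capture.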

Inaugural lecture:

On 1 August 2018, Ira Assent was appointed professor in data-intensive systems at the Department of Computer Science, Aarhus University, where she heads the Data-Intensive Systems research group. Her research interests are in data management and data mining, with a special focus on efficiency and scalability to large data volumes.

The lecture will be followed by an informal reception.
