Challenges For Information Extraction In The Industry

FGV Headquarters - Room 317

About the Event

  • Who: Alexandre Rademaker
  • Where: Praia de Botafogo, 190 - room 537
  • When: August 30, 2018, at 4:00 p.m.

Increasingly, governments, corporations, and scientific organizations need to extract complex information from highly technical documents written in natural language with a specialized lexicon, non-standard syntax, and domain-specific semantic interpretations. While linguistic resources exist for some specialized domains, they are mostly unavailable in technical fields such as legal or Oil & Gas. Furthermore, developing sufficient corpora for these domains can be expensive. In this presentation, I will describe our experiments in information extraction in technical domains. Our pipeline combines statistical and rule-based methods and includes deep parsing and word sense disambiguation using expanded wordnet and ontologies. I will also say a few words about our earlier use of dependency parsers based on Universal Dependencies (UD) and about human annotation of entities and relations. I will conclude with future work, possible ideas for improving the results, and some comments on the many ways mathematics relates to language and meaning.

Attendance is free and no registration is required. FGV does not allow entry to people wearing shorts and/or flip-flops.


Alexandre Rademaker

Research Staff Member in the Natural Resources Solutions group at IBM Research – Brazil

Alexandre Rademaker is a Research Staff Member in the Natural Resources Solutions group at IBM Research – Brazil. Alexandre is also an adjunct professor at the Applied Mathematics School of Getulio Vargas Foundation (EMAp/FGV). Alexandre holds a Ph.D. (2010) in Computer Science from Pontifical Catholic University of Rio de Janeiro (PUC-Rio), an M.Sc. (2005) in Computer Science from Fluminense Federal University, and a B.Sc. (2001) in Computer Science from Federal University of Rio de Janeiro. During his Ph.D., Alexandre was an international fellow at Microsoft Research and SRI International. At MSR, in 2008, he worked with the Z3 SMT Solver team (Leonardo de Moura and Nikolaj Bjørner), developing a distributed environment for testing and optimizing Z3. At SRI International, in 2009, he worked under the supervision of Natarajan Shankar on several research projects, including the formalization of ALC deduction systems in PVS. Alexandre has participated in several research projects, such as MIST (using natural language processing and description logics for knowledge modeling), ANUBIS (database consistency checking), and Ontology and Context (investigating the problem of ontology alignment). In his thesis, written under the supervision of Edward Hermann Haeusler, he proposed new deduction systems for description logics; it was published by Springer in 2012 as the book “A proof theory for Description Logics” in the Springer Briefs series. Alexandre is the author or co-author of more than 70 papers published in peer-reviewed journals and international conferences. His areas of expertise and interest are logic, proof theory, knowledge representation and reasoning, type theory, computational semantics, lexical resources, and computational grammars.


