The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words based on the contexts in which they are observed. The mainstream approach is to represent meanings as vectors (such as Word2Vec embeddings, or contextualized BERT embeddings). However, vectors do not provide a natural way to talk about basic concepts in logic and formal semantics, such as truth and reference. While there have been many attempts to extend vector space models to support such concepts, there does not seem to be a clear solution. In this talk, I will instead go back to fundamentals, questioning whether we should represent meaning as a vector.
I will present the framework of Functional Distributional Semantics, which makes a clear distinction between words and the entities they refer to. The meaning of a word is represented as a binary classifier over entities, identifying whether the word could refer to the entity (in formal semantic terms, whether the word is true of the entity). When learning such a classifier from a corpus of text, the challenge is that the entities themselves are not observed, so I define a probabilistic graphical model in which entities are latent variables. This graphical model provides a natural way to model logical inference, semantic composition, and context-dependent meanings, with Bayesian inference playing a crucial role. To make inference tractable, I use amortized variational inference with a graph-convolutional network. The model is trained on WikiWoods, a parsed version of the English Wikipedia. I will discuss results on semantic evaluation datasets, indicating that the model can learn information not captured by vector space models like Word2Vec and BERT. I will conclude with an outlook for future work, including joint learning from text, ontologies, and grounded data such as images.
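To make the core idea concrete, here is a minimal sketch (not the talk's actual model) of representing a word's meaning as a probabilistic binary classifier over entity representations. The entity dimensionality, the sigmoid classifier form, and the toy lexicon are all hypothetical choices for illustration; in the real framework the entities are latent and learned from text.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # dimensionality of the latent entity space (an assumption for this sketch)

def make_word_classifier(weights, bias):
    """Return a truth function: P(word is true of the entity)."""
    def truth(entity):
        # Sigmoid of a linear score: a simple stand-in classifier.
        return 1.0 / (1.0 + np.exp(-(weights @ entity + bias)))
    return truth

# A toy lexicon: each word gets its own classifier over entities.
lexicon = {
    word: make_word_classifier(rng.normal(size=DIM), 0.0)
    for word in ["dog", "animal", "run"]
}

# A single (here: randomly sampled) entity; in the actual model
# entities are unobserved latent variables inferred from text.
entity = rng.normal(size=DIM)
for word, truth in lexicon.items():
    print(f"P({word!r} true of entity) = {truth(entity):.2f}")
```

The key contrast with vector-space models: a word is not a point in space but a function from entities to probabilities of truth, which is what makes notions like truth and reference directly expressible.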
*Text provided by the author.
After registering, you will receive a confirmation e-mail with information on how to join the seminar.
I began my undergraduate studies as a mathematician at Trinity College, Cambridge, before switching to a master's in computer science, focusing on computational linguistics. I then spent one year studying at Saarland University and working at DFKI (the German Research Centre for Artificial Intelligence), before returning to Cambridge to pursue a Ph.D. under the supervision of Ann Copestake, which I completed in 2018. My thesis was awarded an Honourable Mention (top 3) for the 2019 E.W. Beth Dissertation Prize and was Highly Commended (top 3) for the 2019 CPHC/BCS Distinguished Dissertation Award. I am currently a Research Fellow at Gonville & Caius College, a Departmental Early-Career Academic Fellow at the Department of Computer Science and Technology, and an Executive Director of Cambridge Language Sciences. I also enjoy ballroom and Latin dancing (which includes samba, but the Europeanised version, not the Brazilian version!).