Tutorial on computational grammars in the HPSG formalism using the Grammar Matrix

Computational grammars built by hand on the basis of linguistic principles have proven effective in several industrial-scale applications that require text understanding, in tasks such as machine translation, question answering, and information extraction.

This type of grammar complements statistical approaches based on syntactically annotated corpora, known as treebanks. Annotating a corpus by means of a computational grammar ensures the depth and consistency of the analyses, allowing specialists' knowledge to be applied automatically to a large volume of sentences.

One of the formal grammatical theories most commonly used to build grammars of this type is HPSG (Head-Driven Phrase Structure Grammar). The main broad-coverage grammars implemented in this formalism are the English Resource Grammar (ERG), the Japanese grammar JACY, and the German grammar developed at DFKI (German Research Center for Artificial Intelligence), each the result of more than a decade of work by individuals or small groups.

Modeling the grammatical phenomena of a language computationally in this formalism requires mastery of the description language TDL (Type Description Language) and is a complex programming task, the subject matter of grammar engineering.

The Grammar Matrix, under development since the 2000s at the University of Washington by Emily M. Bender and colleagues, makes it possible to reuse solutions from the implementation of the grammars mentioned above when building new grammars, without requiring knowledge of TDL.

The system has an interface in the form of a questionnaire based on extensive typological research, which covers some of the main grammatical phenomena of the world's languages. To build a computational grammar for a given language, the user only needs to specify how the language behaves with respect to a series of grammatical parameters, such as word order and types of morphosyntactic categories, and to describe the properties of its lexical items. This initial grammar can later be extended by hand.

In this tutorial, we present the fundamental linguistic concepts needed to understand and use the questionnaire, as well as basic notions of HPSG theory. These notions will be illustrated through the construction of mini-grammars of English and Latin, two languages that differ significantly in structure.
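As a rough illustration of the kind of HPSG notion the tutorial introduces, the following Python sketch shows unification of feature structures over a small type hierarchy. It is not taken from the Grammar Matrix or from any HPSG system: the type names, the feature names (AGR, PER, NUM), and the functions `ancestors`, `unify_types`, and `unify` are all hypothetical, and the sketch assumes a single-inheritance hierarchy and ignores structure sharing (reentrancy), which real HPSG grammars rely on.

```python
from typing import Optional, Union

# Hypothetical toy type hierarchy: each type maps to its single parent.
PARENT = {
    "number": None, "sg": "number", "pl": "number",
    "person": None, "1st": "person", "3rd": "person",
}

# A feature structure is either an atomic type name or a feature-value mapping.
FS = Union[str, dict]


def ancestors(t: Optional[str]) -> list:
    """Return t followed by all of its supertypes, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def unify_types(a: str, b: str) -> Optional[str]:
    """Return the more specific of two compatible types, or None on a clash."""
    if b in ancestors(a):
        return a
    if a in ancestors(b):
        return b
    return None


def unify(x: FS, y: FS) -> Optional[FS]:
    """Unify two feature structures; return None if they are incompatible."""
    if isinstance(x, str) and isinstance(y, str):
        return unify_types(x, y)
    if isinstance(x, dict) and isinstance(y, dict):
        result = dict(x)  # copy so that the inputs are left unchanged
        for feature, value in y.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # a clash anywhere makes the whole unification fail
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return None  # an atomic type cannot unify with a complex structure


# "sheep" is underspecified for number; "sleeps" requires a singular subject.
sheep = {"AGR": {"PER": "3rd", "NUM": "number"}}
they = {"AGR": {"PER": "3rd", "NUM": "pl"}}
sleeps_subject = {"AGR": {"NUM": "sg"}}

print(unify(sheep, sleeps_subject))  # {'AGR': {'PER': '3rd', 'NUM': 'sg'}}
print(unify(they, sleeps_subject))   # None
```

In this sketch, agreement falls out of unification: the underspecified value `number` is narrowed to `sg`, while the incompatible values `pl` and `sg` make the combination fail.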

The tutorial will conclude with a presentation of applications and tools for using the ERG. To follow along, we recommend installing the LKB-FOS parser integrated with the Emacs editor beforehand and gaining some basic familiarity with both systems, although this is not strictly necessary.



  • Leonel Figueiredo de Alencar - Full Professor at the Federal University of Ceará and Visiting Professor at the School of Applied Mathematics at Fundação Getulio Vargas

  • Alexandre Rademaker - Professor at the School of Applied Mathematics at Fundação Getulio Vargas and Researcher at IBM Research





Please register through the link below:


Site: http://arademaker.github.io/blog/2021/04/05/grammar-matrix.html



  • 12.04.2021 15:00 - 16:30 (LF de Alencar): Fundamental linguistic concepts: constituent structure, X-bar theory, universal grammar, grammatical relations, morphosyntactic categories, control, raising, etc. Elementary notions of HPSG: typed feature structures, type hierarchy, unification, etc. Mini-grammar English 1.

  • 19.04.2021 15:00 - 16:30 (LF de Alencar): Mini-grammar English 2. Mini-grammar Latin 1.

  • 26.04.2021 15:00 - 16:30 (LF de Alencar): Mini-grammar English 3. Mini-grammar Latin 2.

  • 03.05.2021 15:00 - 16:30 (LF de Alencar): Mini-grammar English 4. Mini-grammar Latin 3. Limitations of the Grammar Matrix and how to get around them. Concrete examples of manual modifications to the TDL code.

  • 10.05.2021 15:00 - 16:30 (A. Rademaker): English Resource Grammar, applications and usage tools.