Supporting the digital sustainability of the Hungarian language

Funding institution: Hungarian Academy of Sciences

ID: TMNYNP 3
Domestic tender; institutional tender

The current project comprises research and development activities in four subprojects, in which the Research Institute for Linguistics, recognized as a Center of Excellence by the Hungarian Academy of Sciences, plays a leading and distinctive role. The subprojects are as follows:

  1. Hungarian National Corpus 3.0
  2. Spelling Advisory Portal 2.0
  3. Digitization of the archival collection of the Hungarian Language Great Dictionary
  4. Corpus and Knowledge Base of Ob-Ugric Languages

The Research Institute has always considered it part of its mission to support the Hungarian language and its use with modern technological tools. With the support of the Hungarian Academy of Sciences, the Institute's Language Technology Research Group has created several linguistic resources and digital tools that have proven popular with both professional researchers and the general public. As part of this project, we aim to renew the Hungarian National Corpus and the Spelling Advisory Portal, available at spelling.mta.hu.

In addition to these two subprojects, which continue work previously initiated within the Research Institute, an important part of the project is the complete digitization of the archival collection of the Hungarian Language Great Dictionary. This work primarily serves the protection of cultural heritage, but it also opens up opportunities for further lexicological research. Once published on the internet, the archival collection will be accessible to both lay and professional audiences, and lexicographers working on the Great Dictionary project will have easier access to the material than before.

The fourth subproject builds on a comprehensive description of the Uralic languages, The Oxford Guide to the Uralic Languages (Bakró-Nagy et al. 2022), in which researchers from the Research Institute present the Ob-Ugric languages. This description lays a solid foundation for the planned development work.

All four subprojects are carried out under the direction of researchers from the Research Institute for Linguistics by an interdisciplinary team of linguists and language technology experts, as well as computer scientists, software developers, and web developers, with the support of the planned investments outlined in the table below. The work is planned to take 48 months in total.