Work packages

The project aims to meet the need for computational tools and services in the field of language technologies for the Slovenian language, for use by research organizations, companies, and the general public.

The end products will help people communicate, collaborate, do business, share knowledge, and participate in social and political discourse in a user-friendly way, and will contribute to overcoming linguistic hurdles.

Language resources

In the first work package, we will upgrade Slovenian text corpora and the lexicon of word forms. We will renew training datasets and procedures for automatic linguistic annotation of modern Slovene. The results will be refreshed and expanded language resources, available both to the user community and for machine learning purposes. The developed procedures and tools will make future updates of Slovenian corpora faster and easier.

Speech technologies

We plan to create a speech database, which will be the foundation for developing a general speech recognizer; support tools and procedures for further developing robust general and specialized recognizers; and a web portal containing the support tools and recognition models. Additionally, we will prepare guidelines for the creation of a real-time recognizer for the education domain.

Semantic resources and technologies

Objectives include the creation of a central open-access digital dictionary database that combines different types of linguistic data for Slovene; the automatic creation of a semantic network; the development of resources and tools for word sense disambiguation and semantic shift recognition; the development of tools for automatic summarization and question answering; and the creation of corpora for performing semantic analyses.

Machine translation

The goals of this work package include the deployment of a reference machine translation (MT) engine; the development of support tools and the definition of evaluation methods; the testing of alternative neural machine translation (NMT) frameworks; the development of NMT models and their upgrades in light of future updates to the corpus of translations; the development of a web portal that will host the translation service; the preparation of a long-term plan for developing an MT engine for the education domain; a plan for further development of the general MT engine; and the collection of texts for the corpus of translations.

Terminology portal

This work package consists of the creation of a terminology portal with an accompanying search engine capable of searching through terminological resources; an online concordancer for the analysis of specialized corpora; tools for extracting term candidates; an online editor for terminological resources; guidelines and instructions for compiling terminological resources, with sample databases; and the deployment of a terminology consultancy centre, which will help users resolve queries and will publish the answers. We will also prepare a plan for the future development and upgrading of the terminology portal after the completion of the project.

Maintenance of the infrastructure centre for language resources and technologies

The goals are to provide existing and upgraded CLARIN.SI infrastructure services; to ensure the development and maintenance of XML schemas and the distribution of language resources and tools; to ensure the acquisition of existing language resources; and to inform end users about the project results.

About the project

With the project titled Development of Slovene in a Digital Environment (DSDE), which is financed by the Slovenian Ministry of Culture and the European Regional Development Fund, Slovenia has recognized the importance of developing modern language technologies for the Slovene language.

Concept and implementation: ENKI, d.o.o.