Terminology portal

Work package 5

In the fifth work package, we will create a terminology portal that will feature various terminological resources, an openly accessible tool for extracting terms from specialized corpora, and the server infrastructure needed to create new terminological resources. The portal will also host a consultancy centre for terminological issues. Guidelines with sample databases and clear instructions for producing terminological resources independently will be made available as well.

Goals

  • We will create a terminology portal with a comprehensive search engine covering all existing Slovene terminological resources and all resources developed within the project.
  • We will develop a tool for extracting term candidates from standard written texts. It will be possible to import this tool into an editor and use it to create a terminological resource.
  • We will develop an editor for creating terminological resources, together with guidelines and instructions that will allow any user to create their own resources.
  • We will set up a consultancy centre for resolving terminology-related questions and keep all answers published and up to date.
  • We will prepare a plan for the further development and upgrading of the terminology portal after the project ends.

Terminology portal

The terminology portal and its functionalities will be developed as an independent, openly accessible web service that will include as many resources as possible.

The main components of the terminology portal are a search engine for all integrated resources, a term candidate extractor for specialized corpora, a concordancer for specialized texts, a term annotator for scientific texts, a terminology resource editor, a terminology consultancy, and auxiliary modules.

The installation package will also allow users to install their own instance of the portal. Such instances will be able to connect with one another, forming a “conglomerate” of portals within which data can be exchanged.

The terminology portal will be designed and developed to provide the best possible user experience, taking into account users with special needs, e.g. blind and visually impaired users.

Term search

Searching the terminological resources is a basic function of the portal and the one most users will rely on. The ease and speed of the search and the appropriate display of the results are therefore of utmost importance.

The search process will have multiple stages:

  • search: the user enters a string in the search field;
  • display of the results with basic data;
  • display of the entire dictionary entry, either on the terminology portal (for all purchased and newly created resources) or on the original website, e.g. Terminologišče or Termania (for resources that have not been purchased).
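The three search stages can be sketched roughly as follows. This is only an illustration under stated assumptions: the names Resource, Hit, search and entry_location are hypothetical, and a simple case-insensitive substring match stands in for whatever matching the portal will actually use. The point is the routing in the last stage: hosted resources are displayed on the portal, while the others point back to their original website.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    hosted_on_portal: bool  # purchased or newly created resources are shown on the portal
    external_site: str = ""  # original website for resources that are not hosted


@dataclass
class Hit:
    term: str
    resource: Resource
    summary: str  # basic data shown in the result list


def search(query: str, index: list[Hit]) -> list[Hit]:
    """Stages 1 and 2: match the search string and return hits with basic data."""
    needle = query.strip().lower()
    return [hit for hit in index if needle in hit.term.lower()]


def entry_location(hit: Hit) -> tuple[str, str]:
    """Stage 3: decide where the full dictionary entry is displayed."""
    if hit.resource.hosted_on_portal:
        return ("portal", hit.resource.name)
    return ("external", hit.resource.external_site)


if __name__ == "__main__":
    taxes = Resource("Tax dictionary", hosted_on_portal=True)
    other = Resource("External glossary", hosted_on_portal=False, external_site="Termania")
    index = [Hit("davek", taxes, "tax"), Hit("davčni zavezanec", other, "taxpayer")]
    for hit in search("dav", index):
        print(hit.term, hit.summary, entry_location(hit))
```

Substring matching keeps the sketch short; a production search would more likely match on lemmas so that inflected forms of a Slovene term are found as well.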

Term candidates extractor

The foundation for building each dictionary is a headword list, i.e. a list of entries and terms. Specialized corpora are available for an ever-increasing number of scientific fields, and extraction tools will make it possible to quickly extract term candidates from them; terminologists and experts will then process these candidates. This simplifies and shortens the initial phase of creating a terminological dictionary.

To work, the extractor needs basic language technologies for Slovene, namely tokenization, lemmatization and morphosyntactic annotation. For the statistical evaluation of termhood, lemmatized n-gram frequency lists from a reference corpus are needed; in our case, this will be Gigafida 2.0.
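The text does not say which termhood measure will be used. Purely as an illustration, the sketch below assumes a simple relative-frequency ratio: an n-gram scores highly when it is much more frequent in the specialized corpus than in the reference corpus. The input is assumed to be already lemmatized, the reference counts stand in for the Gigafida 2.0 n-gram lists, and the function names ngrams and rank_candidates are hypothetical.

```python
from collections import Counter


def ngrams(lemmas: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a lemmatized token sequence."""
    return [tuple(lemmas[i:i + n]) for i in range(len(lemmas) - n + 1)]


def rank_candidates(domain_lemmas, ref_freq, ref_total, n=2, min_count=2):
    """Rank n-gram term candidates by a relative-frequency ratio.

    domain_lemmas: lemmatized tokens of the specialized corpus
    ref_freq:      n-gram -> count in the reference corpus (assumed Gigafida 2.0 list)
    ref_total:     total number of n-grams in the reference corpus
    """
    counts = Counter(ngrams(domain_lemmas, n))
    domain_total = sum(counts.values())
    scores = {}
    for gram, count in counts.items():
        if count < min_count:  # ignore candidates that are too rare to judge
            continue
        rel_domain = count / domain_total
        # Adding 1 keeps n-grams unseen in the reference corpus from dividing by zero.
        rel_ref = (ref_freq.get(gram, 0) + 1) / (ref_total + 1)
        scores[gram] = rel_domain / rel_ref
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A real extractor would typically also filter candidates by morphosyntactic patterns (for example adjective + noun), which is where the morphosyntactic annotation mentioned above comes in.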

The extraction module will contain two basic tools. The first will extract term candidates as a list that users can process further, and the second will highlight terms in texts. When using the automatic terminology extraction system, users will be able to include publicly available external resources from the national open access infrastructure or use only their own texts.
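The second tool, highlighting terms in a text, can be sketched as a longest-match lookup over lemmas, so that inflected forms are still recognized. This is a minimal sketch assuming the text has already been tokenized and lemmatized into (word form, lemma) pairs; the function name highlight, the [[...]] markup and the max_len limit are hypothetical choices for illustration.

```python
def highlight(tokens, terms, max_len=4):
    """Mark term occurrences in a lemmatized text.

    tokens:  list of (word_form, lemma) pairs
    terms:   set of lemma tuples, e.g. accepted term candidates
    max_len: longest term (in tokens) to look for
    Returns the text with each matched span wrapped in [[...]].
    """
    out = []
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest possible span first so multi-word terms win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            lemmas = tuple(lemma for _, lemma in tokens[i:i + n])
            if lemmas in terms:
                match = n
                break
        if match:
            out.append("[[" + " ".join(form for form, _ in tokens[i:i + match]) + "]]")
            i += match
        else:
            out.append(tokens[i][0])
            i += 1
    return " ".join(out)
```

For example, with the term set {("davčen", "zavezanec"), ("napoved",)}, the sentence "Davčni zavezanec odda napoved ." would come back as "[[Davčni zavezanec]] odda [[napoved]] .".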

The module for term candidate extraction will be available on the terminology portal, while the extractor itself will be available either as an online service or as a local download from GitHub, including the code and all instructions.

Terminological resources creation editor

To facilitate compiling and editing terminology, we will develop an online editor for creating terminological resources. Our aim is for all dictionaries to have the same entry structure. To achieve this, we will first convert all the acquired existing dictionaries into a single format using a single procedure, taking care to lose as little information as possible. For newly created dictionaries, the portal will offer several templates, ranging from a simple template with basic elements to a more complex one that also includes many elements common in dictionaries.
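One way to convert acquired dictionaries "losing as little information as possible" is to map every known source field to the unified structure and keep anything unmapped in a separate slot rather than discarding it. The sketch below assumes that approach; the Entry fields, the FIELD_MAP mapping and the eq_ prefix for foreign-language equivalents are all hypothetical and do not describe the project's actual entry format.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    headword: str
    definition: str = ""
    equivalents: dict[str, list[str]] = field(default_factory=dict)  # language code -> terms
    extra: dict[str, str] = field(default_factory=dict)  # source fields with no unified slot


# Hypothetical field names of one acquired dictionary, mapped to the unified structure.
FIELD_MAP = {"term": "headword", "def": "definition"}
EQUIVALENT_PREFIX = "eq_"  # e.g. "eq_en" holds an English equivalent


def convert(record: dict[str, str]) -> Entry:
    """Convert one source record into the unified entry without discarding data."""
    entry = Entry(headword="")
    for key, value in record.items():
        if key in FIELD_MAP:
            setattr(entry, FIELD_MAP[key], value)
        elif key.startswith(EQUIVALENT_PREFIX):
            lang = key[len(EQUIVALENT_PREFIX):]
            entry.equivalents.setdefault(lang, []).append(value)
        else:
            entry.extra[key] = value  # keep unmapped data instead of dropping it
    if not entry.headword:
        raise ValueError("record has no headword")
    return entry
```

Each source dictionary would get its own mapping; the extra slot makes it easy to review afterwards which information did not yet have a place in the unified structure.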

In designing and creating the editor, we will focus on users who are not professional lexicographers; we will therefore provide a tool that is easy to use but still offers most of the features needed to compile and edit a dictionary.

The terminological resources can later be integrated into other language tools and services, e.g. a machine translation engine, other terminology management tools, or a speech recognizer. Work on the resources will also support cooperation between users with different roles and allow sharing multimedia data and links to external sources. The application, which will be accompanied by detailed documentation, will comply with modern terminological standards.
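Cooperation between users with different roles is commonly implemented as role-based permissions attached to each resource. The sketch below assumes such a model; the four roles, their permission sets and the names TermResource and allowed are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    CONTRIBUTOR = "contributor"
    EDITOR = "editor"
    ADMIN = "admin"


# Hypothetical permission sets; each role extends the previous one.
PERMISSIONS: dict[Role, set[str]] = {
    Role.VIEWER: {"read"},
    Role.CONTRIBUTOR: {"read", "comment", "propose_entry"},
    Role.EDITOR: {"read", "comment", "propose_entry", "edit_entry", "publish"},
    Role.ADMIN: {"read", "comment", "propose_entry", "edit_entry", "publish", "manage_members"},
}


@dataclass
class TermResource:
    name: str
    members: dict[str, Role] = field(default_factory=dict)  # user name -> role

    def allowed(self, user: str, action: str) -> bool:
        """True if the user is a member whose role permits the action."""
        role = self.members.get(user)
        return role is not None and action in PERMISSIONS[role]
```

Keeping roles per resource rather than per user lets the same person be an editor of one dictionary and only a viewer of another.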

Consultancy on terminology issues

The terminology consultancy will allow users to submit terminology-related questions via a web interface that encourages them to formulate the question as precisely as possible, indicating the source, context, possible foreign-language equivalents, etc. We will provide the infrastructure to run the service as smoothly as possible: a web interface with an archive of already resolved questions. The questions will be answered by terminologists, who will also consult external experts if necessary. The Terminology Consultancy Centre, which already operates successfully on the SASA (ISJFR ZRC SAZU) Terminology Section website, will also be integrated into the terminology portal.
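The question form and the archive of resolved questions can be modelled with a small data structure. This is a minimal sketch assuming that source, context and foreign-language equivalents are optional fields the interface prompts for when they are empty; the names TermQuestion, missing_details and search_archive are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermQuestion:
    question: str
    source: str = ""  # where the term was encountered
    context: str = ""  # sentence or passage using the term
    foreign_equivalents: list[str] = field(default_factory=list)
    answer: str = ""  # filled in by a terminologist

    def missing_details(self) -> list[str]:
        """Optional fields left empty, so the form can prompt the user to add them."""
        missing = []
        if not self.source.strip():
            missing.append("source")
        if not self.context.strip():
            missing.append("context")
        if not self.foreign_equivalents:
            missing.append("foreign_equivalents")
        return missing


def search_archive(archive: list[TermQuestion], keyword: str) -> list[TermQuestion]:
    """Return already answered questions whose text mentions the keyword."""
    kw = keyword.lower()
    return [q for q in archive if q.answer and kw in q.question.lower()]
```

Prompting for missing details before submission, rather than rejecting the question, matches the aim of encouraging users to be precise without making the form a barrier.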

Terminology resources and sample dataset production guidelines

The guidelines, with a short and understandable description of all the portal’s options, will help terminology users at all stages of resource development. These users include field experts, translators, proofreaders and others who want to create their own terminological resource. We will also take into account users with special needs. The instructions will be illustrated with practical examples and sample datasets: a terminological dictionary in the field of taxation, a sample collection of dictionary entries in the field of computer science, and a specialized corpus of linguistic texts. We will also provide instructions for using the application programming interfaces.

Learn more about other Work packages

Maintenance of the infrastructure centre for language resources and technologies

Language resources

Speech technologies

© 2020. All rights reserved

Concept and implementation: ENKI, d.o.o.