The repository enables the safe long-term storage of language resources and tools. It is the second repository in Slovenia to obtain the Core Trust Seal certificate, and it is also certified as a CLARIN type B centre. It currently holds more than 200 entries, 140 of which include data for the Slovenian language, which is crucial for computational linguistics.
Work on the repository comprises software and hardware maintenance, ensuring the uninterrupted operation of the system, and editorial work on new entries. Within the project, the editorial process will be extended to the validation of the data itself, as the data format will have to conform to the developed schemas. In addition to this formal validation, resources will also need to be evaluated qualitatively.
Authorized editors will check that the entries submitted by authors comply with the requirements of the repository, in terms of the completeness and consistency of their metadata and their compliance with open standards and good practice in data formatting. This evaluation will determine whether an entry is accepted into the repository. If issues are found, the entry will either be returned to the authors with detailed guidelines for its improvement, or the technical deficiencies will be resolved by CLARIN.SI staff in agreement with the authors.
CLARIN.SI also offers two online concordancers, i.e. powerful corpus analysis tools that are primarily useful for linguists. They currently provide access to 75 corpora in 27 languages, totalling over 15 billion words. All corpora to be made publicly available in the repository will additionally be converted to the vertical format, which requires developing the corresponding conversions. This format then serves as the basis for including the corpora in the CLARIN.SI concordancers, which makes them accessible to linguists for corpus analysis.
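To illustrate what such a conversion produces, here is a minimal sketch that renders a tokenised, annotated document in a vertical layout of the kind read by corpus query systems such as (No)Sketch Engine. In this layout each token sits on its own line with its positional attributes separated by tabs, and structures such as documents and sentences are marked by XML-like tags on separate lines. The input representation (sentences as lists of word, lemma and tag triples), the function name `to_vertical` and the chosen attributes are assumptions for illustration only; the actual conversions and attribute sets used for CLARIN.SI corpora may differ.

```python
from html import escape

# Hypothetical input shape: each token is a (word, lemma, tag) triple.
Token = tuple[str, str, str]


def to_vertical(doc_id: str, sentences: list[list[Token]]) -> str:
    """Render one document in a one-token-per-line (vertical) layout.

    Assumed conventions for this sketch: a <doc> structure with an id
    attribute, <s> structures for sentences, and three tab-separated
    positional attributes per token.
    """
    lines = [f'<doc id="{escape(doc_id, quote=True)}">']
    for sentence in sentences:
        lines.append("<s>")
        for word, lemma, tag in sentence:
            # Escape &, < and > so token values are not mistaken for markup.
            lines.append("\t".join(escape(v, quote=False) for v in (word, lemma, tag)))
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative Slovenian sentence with Universal Dependencies UPOS tags.
    example = [[("Jezik", "jezik", "NOUN"), ("je", "biti", "AUX"),
                ("živ", "živ", "ADJ"), (".", ".", "PUNCT")]]
    print(to_vertical("doc1", example), end="")
```

A real conversion would typically also carry document-level metadata as attributes of the `<doc>` structure and possibly further structures such as paragraphs, since these are what allow concordancer users to filter and subcorpus their queries.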
In addition, we will train new collaborators for this work, and the staff will be available to answer users' questions related to the project.