The acoustic normalizer pre-processes the audio signal to remove additive noise, i.e., non-speech sound elements that can disrupt recognizer training. It can increase the robustness of the recognizer by making it less dependent on the purity of the speech signal. It will be created using approaches based on digital signal processing and/or deep neural networks. The dataset will consist of a speech corpus, which will also be developed in this work package of the project.
The syntactic normalizer can be used in both text pre-processing and post-processing. Syntactic normalization transforms a text into a single canonical form, which the text may not have had before. The recognizer treats numbers, dates, acronyms, and abbreviations as non-standard words, whose pronunciation varies with context. The transcriptions in the spoken language database will be formatted so that numbers and dates are spelled out and the text is free of acronyms and abbreviations. This means that the recognizer will also return results in this format. The Gigafida 2.0 corpus, which is the intended source for the language model, will need to be adapted to match the format of the recognizer results. Depending on the domain of the recognizer, the normalizer may also be useful in post-processing, where it could, for example, convert spelled-out numbers and dates into a numeric format. The syntactic normalizer will be created using a rule-based approach, deep neural networks, or a combination of both. The updated Gigafida 2.0 corpus is expected to be used as the dataset.
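The rule-based variant of the pre-processing step described above could look roughly like the following minimal sketch. It is an illustration only: the lookup tables, the function names `spell_number` and `normalize`, and the restriction to integers from 0 to 999 are assumptions, and English is used purely for readability. A real normalizer would need rules for the project's target language, including dates and context-dependent forms.

```python
import re

# Hypothetical lookup tables, for illustration only.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
ABBREVIATIONS = {"dr.": "doctor", "approx.": "approximately", "km": "kilometers"}


def spell_number(n: int) -> str:
    """Spell out an integer in the range 0..999 (assumed limit of this sketch)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only supports 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head + (" " + spell_number(rest) if rest else "")


def normalize(text: str) -> str:
    """Lowercase the text, expand known abbreviations, spell out short numbers."""
    tokens = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            tokens.append(spell_number(int(token)))
        else:
            tokens.append(token)
    return " ".join(tokens)


print(normalize("Dr. Novak ran 42 km"))
```

Post-processing would apply the inverse mapping, from spelled-out forms back to digits, which is typically harder because the spelled-out form spans several tokens.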
A punctuator is a tool for placing basic punctuation marks in texts returned by a recognizer. A typical recognizer analyses an acoustic signal, recognizes phonemes, and then composes words from them. However, it cannot compose words into larger units such as sentences and clauses, so the semantic value of its transcripts is lower. The punctuator can enrich the recognized words with basic punctuation marks, such as commas, periods, question marks, and exclamation points, thus facilitating the semantic processing of the transcriptions. The punctuator is expected to be trained using deep neural networks. The Gigafida 2.0 corpus is expected to be used as the dataset for the punctuator as well.
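One common way to obtain training data for such a punctuator from a punctuated text corpus is to strip the punctuation and keep it as a per-word label. The sketch below shows that idea under stated assumptions: the label names, the function names `make_training_pairs` and `restore`, and the lowercasing of the input are all hypothetical choices, not the project's actual data format. The neural model itself is not shown.

```python
import re

# Hypothetical label set covering the four marks named in the text.
PUNCTUATION = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAMATION"}


def make_training_pairs(text: str) -> list[tuple[str, str]]:
    """Turn punctuated text into (word, label) pairs.

    The label is the punctuation mark that follows the word, or "O" if none.
    """
    pairs: list[tuple[str, str]] = []
    for token in re.findall(r"[^\s,.?!]+|[,.?!]", text.lower()):
        if token in PUNCTUATION:
            # Attach the mark to the preceding word; ignore repeated marks.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], PUNCTUATION[token])
        else:
            pairs.append((token, "O"))
    return pairs


def restore(pairs: list[tuple[str, str]]) -> str:
    """Rebuild punctuated text from (word, label) pairs, e.g. model predictions."""
    symbols = {label: mark for mark, label in PUNCTUATION.items()}
    return " ".join(word + symbols.get(label, "") for word, label in pairs)


pairs = make_training_pairs("Hello, how are you? Fine.")
print(pairs)
print(restore(pairs))
```

At inference time, a trained model would predict the label for each recognized word, and a function like `restore` would insert the corresponding marks.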
A phonemizer is a tool for converting a grapheme transcription into its corresponding phonemic transcription. It can serve as a method for adding missing words to a pronunciation dictionary during the recognition process. For a typical speech recognizer, the pronunciation dictionary is fundamental, as the recognizer only recognizes words that are in the dictionary. During speech recognition we often come across words that are not yet in the dictionary, so we need a process that allows them to be added, either manually or automatically. The phonemizer is expected to be built from heuristic rules, a model learned using neural networks, or a combination of both.
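The heuristic-rule variant, combined with automatic insertion of out-of-dictionary words, might be sketched as follows. Everything here is an assumption made for illustration: the rule table `G2P_RULES`, the phoneme symbols, the default of mapping an unlisted letter to itself, and the function names `phonemize` and `lookup_or_add`. Real rules depend on the target language and are usually context-sensitive.

```python
# Hypothetical grapheme-to-phoneme rules; multi-phoneme outputs are space-separated.
G2P_RULES = {"sh": "S", "ch": "tS", "th": "T", "c": "k", "x": "k s"}


def phonemize(word: str) -> list[str]:
    """Convert a word to phonemes by greedy longest-match over G2P_RULES."""
    word = word.lower()
    max_len = max(len(grapheme) for grapheme in G2P_RULES)
    phonemes: list[str] = []
    i = 0
    while i < len(word):
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if chunk in G2P_RULES:
                phonemes.extend(G2P_RULES[chunk].split())
                i += len(chunk)
                break
        else:
            # Assumed default: a letter without a rule maps to itself.
            phonemes.append(word[i])
            i += 1
    return phonemes


def lookup_or_add(word: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Return the pronunciation, adding the word to the lexicon if it is missing."""
    key = word.lower()
    if key not in lexicon:
        lexicon[key] = phonemize(key)
    return lexicon[key]


lexicon = {"ship": ["S", "i", "p"]}
print(lookup_or_add("Chip", lexicon))
print(sorted(lexicon))
```

A neural model could replace `phonemize` behind the same `lookup_or_add` interface, with the rules kept as a fallback.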