Research

The general objective of the proposal is to advance the frontiers of knowledge in the area of Slovak spoken language analysis, recognition and synthesis. The project focuses on research into speech technologies for voice-operated telecommunication systems and services, with potential use of the results in other areas such as automatic speech transcription, searching in speech and audio database records, speech-to-speech translation, the semantic web, etc. The goals of the project, which are aimed strictly at Slovak language processing, are split into three thematic areas and can be summarized as follows:

  1. The first thematic area is Automatic Speech Recognition, with the following objectives:
    1. Design and testing of robust acoustic models for telephone applications on the combined SpeechDat-Sk and MobileDat-Sk speech corpora. The usefulness of various state-of-the-art approaches to feature extraction and to acoustic model training and adaptation will be studied with particular regard to robustness in telecommunication applications.
    2. Development (gathering and processing) of a speech corpus of broadcast news and talk show recordings (at least 150 hours) and a text corpus (at least 150M words), collected from electronic resources, particularly the Internet.
    3. Design of the first large-vocabulary continuous speech recognition system for Slovak, which will be based on (i) new acoustic models built from a corpus of broadcast recordings; (ii) new stochastic language models built from a newly collected text corpus; and (iii) a transcribed lexicon built from both the text and speech corpora.
    4. Study of selected state-of-the-art approaches to acoustic and language model training with particular regard to the Slovak language (pronunciation, phoneme/grapheme approach, morphology, etc.).
  2. The second thematic area is Speech Synthesis, where the goals are:
    1. Design of a new method of concatenative synthesis from a limited number of elements, based on a new, enlarged set of elements (allophones, diphones, triphones, etc.), a new model of prosody and sophisticated concatenation rules, which will make it possible to achieve naturalness similar to that of corpus-based synthesis from a large corpus.
    2. Design and testing of a new method of corpus synthesis with greater independence from the input text (unlimited domain). The large volume of speech material in the corpus will ensure a high level of naturalness, with the prosody model applied already in the unit selection stage and with minimal signal post-processing.
    3. Modeling of suprasegmental effects of the Slovak language in order to propose a more elaborate model of Slovak prosody for speech synthesis.
    4. Research and modeling of expressive phenomena in speech (personality, mood and emotions) and exploration of the possibilities of expressive synthesis in both "concatenative" and corpus-based synthesis.
    5. Design of a set of word-level and sentence-level subjective tests of speech quality and intelligibility in Slovak, as well as tests for diagnosing synthesized speech at the segmental and suprasegmental levels. The possibility of objectively testing synthesized speech quality using indices generally used for measuring natural speech quality (PESQ, Speech Intelligibility Index, etc.) will also be investigated.
  3. The third thematic area is Dialogue Modeling and Management, with the following objectives:
    1. Application of dialogue description languages (DDLs, such as VoiceXML and SALT) and of techniques based on plan-based and form-filling approaches, software agents and new, more elaborate syntactic and semantic parsing techniques, with a focus on solving selected problems of challenging mixed-initiative dialogue.
    2. Study of new approaches to the testing and evaluation of spoken language dialogue systems, based on automatic and user-centered testing/evaluation, and their experimental verification.