Speech to Text is the automatic conversion of spoken language into written text. In the direct variant, the computer writes out what a person says as they speak. In the indirect variant, the computer first analyzes what was said and then converts it into written form. To be able to teach a computer Romansh at all, Radiotelevisiun Svizra Rumantscha (RTR) first had to gather data that could be fed into the system. For this purpose, the media company used dictionaries from Lia Rumantscha (LR) and publicly available texts such as newspaper articles.
The language department at LR supported the project on language-specific questions. For Rumantsch Grischun and for each idiom, RTR is providing approximately 30 hours of audio material with the corresponding transcription. In total, 180 hours were split into small chunks by RTR, in collaboration with students from the University of Education in Graubünden, so that the text could be assigned exactly to the corresponding audio sequences.
Digitization foundations
With this material, the experts at Recapp IT AG are teaching the computer Romansh. So far, the computer has learned Rumantsch Grischun, Sursilvan, and Vallader. Puter will follow this summer, and in 2022 RTR will complete the project with Sutsilvan. “Speech to Text” is the technological basis for the digitization of the Romansh language. In the future, it will be possible to transcribe municipal assemblies or debates in the Grand Council. In addition, RTR has already started the Text to Text project, which deals with the automatic translation of Romansh texts into German. (red)