WebMAUS: Automatic Segmentation and Labelling of Speech Signals over the Web


The web application WebMAUS lets the user automatically align speech recordings with their corresponding text. The user uploads two input files: a media file containing a recorded speech signal and a file containing a textual encoding of the words spoken in the recording. If the latter is plain text, its contents are text-normalized and tokenized into a chain of words. The application then produces a phonological pronunciation encoding of the content in SAMPA, which essentially reflects the standard citation pronunciation. From this phonological form, a machine-learned expert system creates a statistically weighted graph of all possible realisations (pronunciation variants) within the selected language. Finally, this graph is aligned to the speech signal using standard techniques from automatic speech recognition. The result is an orthographic and a phonetic alignment (segmentation and labelling) of the recorded speech, which is rendered into the desired target format (BPF, Emu, TextGrid) and returned to the user via the web browser.
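The final alignment step can be illustrated with a deliberately simplified sketch of Viterbi forced alignment, a standard speech-recognition technique. This is not WebMAUS's implementation: it assumes a single linear chain of phones (no pronunciation-variant graph, no variant weights), one state per phone, and a hypothetical `scores` matrix of per-frame log-likelihoods standing in for real acoustic-model output. The function name `viterbi_align` is illustrative only.

```python
import math
from typing import List, Tuple


def viterbi_align(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Force-align a fixed phone chain to a sequence of frames.

    scores[t][n] is a (hypothetical) log-likelihood that frame t belongs
    to the n-th phone of the chain. Phones must appear strictly in order
    (left to right, no skips) and each must cover at least one frame.
    Returns one (start_frame, end_frame_exclusive) pair per phone.
    """
    T, N = len(scores), len(scores[0])
    if T < N:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    best = [[NEG] * N for _ in range(T)]  # best path score ending in (t, n)
    back = [[0] * N for _ in range(T)]    # 0 = stayed in phone n, 1 = entered from n-1
    best[0][0] = scores[0][0]             # the first frame must belong to the first phone
    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1][n]
            advance = best[t - 1][n - 1] if n > 0 else NEG
            if advance > stay:
                best[t][n], back[t][n] = advance + scores[t][n], 1
            else:
                best[t][n], back[t][n] = stay + scores[t][n], 0
    # Backtrack from the last frame, which must lie in the last phone.
    starts, ends = [0] * N, [0] * N
    n = N - 1
    ends[n] = T
    for t in range(T - 1, 0, -1):
        if back[t][n] == 1:   # phone n began at frame t
            starts[n] = t
            n -= 1
            ends[n] = t
    starts[0] = 0
    return list(zip(starts, ends))
```

For example, with three phones and six frames whose made-up scores favour phone 0 in frames 0 to 1, phone 1 in frames 2 to 4 and phone 2 in frame 5, the sketch returns `[(0, 2), (2, 5), (5, 6)]`. Multiplying the frame indices by an assumed frame shift (for example 10 ms) yields segment boundaries in seconds, which is the kind of information a segmentation file such as a TextGrid records.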

Tool type

Tool for creating own tools and resources

Tool task

speech recognition, phonetic segmentation & labelling, time-alignment

Key words

multilingual, service, speech processing, web-application

Research domain

Phonetics, Phonology, Speech Recognition




CLARIN centre

Bayerisches Archiv für Sprachsignale, München

Contact person

Thomas Kisler



Similar to