GlobalPhone

GlobalPhone is a multilingual text and speech corpus for the development and evaluation of large speech processing systems in the most widespread languages of the world. It is designed to be uniform across languages with respect to the amount of text and audio data per language, the audio data quality (microphone, noise, channel), the collection scenario (task, setup, speaking style, etc.), and the transcription conventions. It therefore supplies an excellent basis for research in the areas of (1) multilingual speech recognition, (2) rapid deployment of speech processing systems to new languages, (3) language and speaker identification tasks, (4) monolingual speech recognition in a large variety of languages, and (5) comparisons across major languages based on text and speech data. To date, the GlobalPhone corpus covers 19 languages: Arabic, Bulgarian, Chinese (Mandarin and Shanghainese), Croatian, Czech, French, German, Hausa, Japanese, Korean, Portuguese, Polish, Russian, Spanish, Swedish, Tamil, Thai, Turkish, and Vietnamese. In each language, about 100 adult speakers were recorded with close-speaking microphones while reading about 100 sentences each. The entire corpus contains over 300 hours of speech spoken by more than 1,500 native adult speakers.

To date, GlobalPhone covers the following 19 languages:

Arabic (Tunisian and Palestinian)
Bulgarian
Chinese (Mandarin and Shanghai dialect)
Croatian
Czech
French
German
Hausa
Japanese
Korean
Portuguese (Brazil)
Polish
Russian
Spanish (Costa Rica)
Swedish
Tamil
Thai
Turkish
Vietnamese

More information can be found on our publications page and on the GlobalPhone webpage!

Updated by: Eric Schädler

GlobalPhone (since 1995)