Name: Open Speech Corpora download for Linux
Brand: OnWorks
SKU: c577acc25ee5facd688e3862ebcfd897
Availability: OnlineOnly
Rating: 4.56 (2322 reviews)

Open Speech Corpora is a Linux app whose latest release can be downloaded as open-speech-corporasourcecode.zip. It can be run online on OnWorks, a free hosting provider for workstations.

Download Open Speech Corpora and run it online with OnWorks for free.

Follow these instructions in order to run this app:

- 1. Download this application to your PC.

- 2. Open our file manager https://www.onworks.net/myfiles.php?username=XXXXX with the username that you want.

- 3. Upload this application to that file manager.

- 4. Start the OnWorks Linux online, Windows online, or macOS online emulator from this website.

- 5. From the OnWorks Linux OS you have just started, go to our file manager https://www.onworks.net/myfiles.php?username=XXXXX with the username that you want.

- 6. Download the application, install it and run it.

DESCRIPTION

Open Speech Corpora is a curated catalog of speech datasets intended to support research and development in automatic speech recognition, text-to-speech, and other speech technologies. The repository is organized as a set of tables that list corpora along with their languages, total hours, number of speakers, download links, and licenses, giving practitioners a quick way to find data that matches their needs. It emphasizes free and truly “open” datasets, favoring those released under Creative Commons or community-friendly data licenses, though it also lists corpora that are accessible for research and many commercial uses. The catalog covers well-known resources such as Mozilla Common Voice, Yesno, LJ Speech and numerous Nordic and parliamentary speech corpora, along with their license variants like CC-0 and CC-BY. It is actively maintained as a community resource: users are encouraged to propose new corpora via issues, and there is a backlog of datasets waiting to be integrated.

Features

Centralized catalog of speech corpora for ASR, TTS and related tasks
Detailed metadata including language, duration, speakers, download links and licenses
Emphasis on free and open datasets suitable for research and many commercial uses
Coverage of popular corpora like Common Voice, LJ Speech and multiple Nordic resources
Community-driven updates via issues and pull requests to keep the list evolving
License-based grouping (CC-0, CC-BY and more) to simplify compliance and dataset selection
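
The license-based grouping and metadata lookup described above can be pictured with a minimal Python sketch. It assumes a hypothetical `Corpus` record whose field names mirror the metadata listed here (language, duration, speakers, license); the repository itself is organized as tables, so this is not its actual schema. The rows are placeholders for illustration, not real catalog entries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """One catalog row. Field names are assumptions, not the repository's schema."""
    name: str
    language: str
    hours: float
    speakers: int
    license: str


# Placeholder rows for illustration only; these are not real catalog entries.
CATALOG = [
    Corpus("Example Corpus A", "en", 10.0, 5, "CC-0"),
    Corpus("Example Corpus B", "sv", 25.5, 40, "CC-BY"),
    Corpus("Example Corpus C", "en", 3.0, 1, "CC-BY"),
]


def group_by_license(corpora):
    """Return a dict mapping each license string to its corpora, in input order."""
    groups = defaultdict(list)
    for corpus in corpora:
        groups[corpus.license].append(corpus)
    return dict(groups)


def find(corpora, language, allowed_licenses, min_hours=0.0):
    """Select corpora in one language, under an allowed license, with enough audio."""
    return [
        corpus
        for corpus in corpora
        if corpus.language == language
        and corpus.license in allowed_licenses
        and corpus.hours >= min_hours
    ]


if __name__ == "__main__":
    for license_name, rows in sorted(group_by_license(CATALOG).items()):
        print(license_name, [corpus.name for corpus in rows])
    matches = find(CATALOG, "en", {"CC-0", "CC-BY"}, min_hours=5.0)
    print([corpus.name for corpus in matches])
```

Filtering on an explicit set of allowed licenses keeps the compliance decision visible in the call itself, which is the kind of selection the license columns are meant to support.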
