Name: Step-Audio-EditX download for Linux
Brand: OnWorks
SKU: b6f56ec8ea1389c34a1455e70c613478
Availability: OnlineOnly
Rating: 4.31 (2318 reviews)

This is the Linux app named Step-Audio-EditX, whose latest release can be downloaded as Step-Audio-EditXsourcecode.zip. It can be run online on OnWorks, a free hosting provider for workstations.

Download this app, Step-Audio-EditX, and run it online with OnWorks for free.

Follow these instructions in order to run this app:

- 1. Download this application to your PC.

- 2. Open our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.

- 3. Upload this application to that file manager.

- 4. Start the OnWorks Linux online, Windows online emulator, or macOS online emulator from this website.

- 5. From the OnWorks Linux OS you have just started, go to our file manager at https://www.onworks.net/myfiles.php?username=XXXXX, using the username that you want.

- 6. Download the application, install it, and run it.
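
Steps 2 and 5 both use the same file manager address with your chosen username in place of XXXXX. As a minimal sketch, assuming the username is passed as an ordinary query parameter exactly as the URL above suggests, the address could be built like this. The helper name `file_manager_url` is hypothetical and not part of any OnWorks tooling.

```python
from urllib.parse import urlencode

# Base address of the OnWorks file manager, copied from the steps above.
FILE_MANAGER_BASE = "https://www.onworks.net/myfiles.php"


def file_manager_url(username: str) -> str:
    """Return the file manager URL for a username (hypothetical helper)."""
    if not username:
        raise ValueError("username must not be empty")
    # urlencode escapes characters that are not safe in a query string.
    return f"{FILE_MANAGER_BASE}?{urlencode({'username': username})}"


if __name__ == "__main__":
    print(file_manager_url("alice"))
```

Escaping the username through `urlencode` keeps the link valid even if the chosen name contains spaces or symbols.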


DESCRIPTION

Step-Audio-EditX is an open-source, 3-billion-parameter audio model from StepFun AI designed to make expressive, precise editing of speech and audio as easy as editing text. Rather than treating audio editing as low-level waveform manipulation, the model converts speech into a sequence of discrete “audio tokens” using a dual-codebook tokenizer that combines a linguistic token stream with a semantic (prosody/emotion/style) token stream, which turns audio editing into high-level token operations. This lets users modify not only what is said (the text) but also how it is said: emotion, tone, speaking style, prosody, accent, and even paralinguistic cues. Because the model is trained with a “large-margin learning” objective over many synthesized and natural speech samples, it gains robust control over expressive attributes and can perform iterative editing: for example, you could record a line, then ask the model to “make it sadder,” “speak slower,” or “change accent to X.”
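
The idea of two token streams edited over several rounds can be pictured with a small toy. This is only a conceptual sketch and not the Step-Audio-EditX API: `TokenizedSpeech`, `EditSession`, `toy_edit`, and the vocabulary size are all hypothetical, and the placeholder edit function merely shifts numbers where the real model would generate new audio tokens.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

TOY_VOCAB = 100  # made-up vocabulary size, for illustration only


@dataclass(frozen=True)
class TokenizedSpeech:
    """Toy stand-in for speech encoded as two parallel token streams."""
    linguistic: Tuple[int, ...]  # what is said
    semantic: Tuple[int, ...]    # how it is said (prosody/emotion/style)


EditFn = Callable[[TokenizedSpeech, str], TokenizedSpeech]


@dataclass
class EditSession:
    """Applies edits one round at a time and records each result."""
    current: TokenizedSpeech
    history: List[Tuple[str, TokenizedSpeech]] = field(default_factory=list)

    def apply(self, instruction: str, edit_fn: EditFn) -> TokenizedSpeech:
        # Each round starts from the output of the previous round.
        result = edit_fn(self.current, instruction)
        self.history.append((instruction, result))
        self.current = result
        return result


def toy_edit(speech: TokenizedSpeech, instruction: str) -> TokenizedSpeech:
    """Placeholder for the model: changes the style stream, keeps the words."""
    shift = len(instruction)
    new_semantic = tuple((t + shift) % TOY_VOCAB for t in speech.semantic)
    return TokenizedSpeech(speech.linguistic, new_semantic)


if __name__ == "__main__":
    session = EditSession(TokenizedSpeech((1, 2, 3), (10, 20, 30)))
    session.apply("make it sadder", toy_edit)
    session.apply("speak slower", toy_edit)
    print(session.current)
```

Keeping the history of intermediate results mirrors the iterative workflow described above, where each instruction refines the output of the previous one.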

Features

Token-based audio editing: converts speech to discrete token streams for high-level, language-like editing operations on audio
Dual-codebook tokenizer design: separates linguistic content from prosody/style, enabling control over both what is said and how it is said
Expressive editing: allows modifying emotion, tone, accent, speaking style, prosody, pacing, and other vocal attributes without re-recording
Iterative editing workflow: supports multiple rounds of edits, e.g. change style, then adjust emotion, then pace
Zero-shot TTS: generates speech directly from text plus optional style/emotion instructions, in a controlled expressive voice
Open-source model and code under a permissive license, enabling integration, customization, and use in research, creative workflows, or production

Programming Language

Python
