🔊 Audio
Sound effects (AudioLDM2), background music and loops (MusicGen), and expressive NPC voice synthesis (Bark).
Quick Reference
| Command | Description | Model |
|---|---|---|
| `assgen gen audio sfx generate` | Generate a sound effect from a text description (AudioLDM2) | audioldm2 |
| `assgen gen audio sfx edit` | Edit or process an existing sound effect | — |
| `assgen gen audio sfx library` | Browse the local generated SFX library | — |
| `assgen gen audio music compose` | Compose a music track from a text prompt (MusicGen) | musicgen-large |
| `assgen gen audio music loop` | Generate a seamlessly looping music track | musicgen-stereo-medium |
| `assgen gen audio music adaptive` | Generate adaptive music stems for different gameplay moods | musicgen-stereo-large |
| `assgen gen audio voice tts` | Convert text to speech with optional emotion (Bark) | bark |
| `assgen gen audio voice clone` | Clone a voice from an audio sample and synthesise new speech | XTTS-v2 |
| `assgen gen audio voice dialog` | Generate a batch of voiced NPC dialog lines from a script file | — |
| `assgen gen audio process normalize` | Normalize audio to a target LUFS level or peak 0 dBFS | (algorithmic) |
| `assgen gen audio process trim-silence` | Strip leading and trailing silence from an audio file | — |
| `assgen gen audio process loop-optimize` | Find zero-crossing loop points for seamless audio looping | — |
| `assgen gen audio process convert` | Convert audio between formats (WAV/OGG/MP3/FLAC) | (algorithmic) |
| `assgen gen audio process downmix` | Downmix stereo to mono (or upmix mono to stereo) | (algorithmic) |
| `assgen gen audio process resample` | Change the sample rate of an audio file | (algorithmic) |
| `assgen gen audio process waveform` | Generate a waveform PNG preview of an audio file | (algorithmic) |
assgen gen audio sfx generate
Generate a sound effect from a text description (AudioLDM2).
AI Model
AudioLDM2
cvssp/audioldm2
AudioLDM2 — trained on general audio and sound effects via latent diffusion. Replaces facebook/audiogen-medium (removed from transformers 5.x; audiocraft broken on Python 3.13+). Requires diffusers.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `PROMPT` (required) | TEXT | — | Sound description, e.g. 'laser gun firing' |
| `--duration` | FLOAT | 2.0 | Target duration in seconds |
| `--variations` | INTEGER | 1 | Number of variants to generate |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
| `--model-id` | TEXT | — | Override HF model (validated by server) |
Examples
assgen gen audio sfx generate "laser gun firing, futuristic" --wait
assgen gen audio sfx generate "heavy footsteps on gravel" -d 3.0 --wait
assgen gen audio sfx generate "explosion, distant, muffled" -n 3 --wait
assgen gen audio sfx generate "UI button click, satisfying" -d 0.5 --wait
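When many effects are needed, the invocations above can be scripted. The following is a minimal sketch, assuming `assgen` is available on `PATH`; the helper names `sfx_command` and `run_all` and the job list are illustrative and not part of assgen. Only flags documented in the parameter table are used.

```python
import shlex
import subprocess

def sfx_command(prompt, duration=None, variations=None, wait=True):
    """Build the argv list for `assgen gen audio sfx generate` (hypothetical helper)."""
    argv = ["assgen", "gen", "audio", "sfx", "generate", prompt]
    if duration is not None:
        argv += ["--duration", str(duration)]
    if variations is not None:
        argv += ["--variations", str(variations)]
    if wait:
        argv.append("--wait")
    return argv

# Illustrative job list: (prompt, duration in seconds, number of variations).
JOBS = [
    ("laser gun firing, futuristic", None, None),
    ("heavy footsteps on gravel", 3.0, None),
    ("explosion, distant, muffled", None, 3),
]

def run_all(dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for prompt, duration, variations in JOBS:
        argv = sfx_command(prompt, duration, variations)
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)

if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain commas or quotes.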
assgen gen audio sfx edit
Edit or process an existing sound effect.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Input audio file to edit |
| `--operation` | TEXT | pitch | One of: pitch, reverb, speed, layer, normalize |
| `--value` | TEXT | — | Operation parameter |
| `--secondary` | TEXT | — | Second audio for layer op |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
assgen gen audio sfx library
Browse the local generated SFX library.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `QUERY` | TEXT | — | Search query for local SFX library |
assgen gen audio music compose
Compose a music track from a text prompt (MusicGen).
AI Model
MusicGen Large
facebook/musicgen-large
3.3B params (~6.6 GB fp16); best single-stem music quality that fits in 12 GB
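The fp16 figure above follows from two bytes per parameter. A small worked check, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting weights only; activations and framework overhead come on top and are not estimated here. The function name is illustrative.

```python
BYTES_PER_PARAM_FP16 = 2  # half precision stores each weight in 16 bits

def fp16_weight_gb(num_params: float) -> float:
    """Approximate size of the model weights in decimal GB (weights only)."""
    return num_params * BYTES_PER_PARAM_FP16 / 1e9

if __name__ == "__main__":
    # 3.3B parameters, as stated for facebook/musicgen-large
    print(f"{fp16_weight_gb(3.3e9):.1f} GB")
```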
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `PROMPT` (required) | TEXT | — | Music description, e.g. 'epic orchestral battle theme' |
| `--duration` | FLOAT | 15.0 | Track length in seconds |
| `--bpm` | INTEGER | — | Beats per minute |
| `--key` | TEXT | — | Musical key, e.g. 'C minor' |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
Examples
assgen gen audio music compose "epic orchestral battle music, dramatic" --wait
assgen gen audio music compose "ambient forest, birds, peaceful" -d 30 --wait
assgen gen audio music compose "upbeat 8-bit chiptune, adventure" --genre chiptune --wait
assgen gen audio music compose "tense stealth theme, low bass" -d 60 --wait
assgen gen audio music loop
Generate a seamlessly looping music track.
AI Model
MusicGen Stereo Medium
facebook/musicgen-stereo-medium
Stereo output essential for loop playback; 1.5B params; prompt for 'seamless loop'
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `PROMPT` (required) | TEXT | — | Loop description, e.g. 'calm forest ambient loop' |
| `--duration` | FLOAT | 30.0 | Loop duration in seconds |
| `--variations` | INTEGER | 1 | Number of loop variants |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
assgen gen audio music adaptive
Generate adaptive music stems for different gameplay moods.
AI Model
MusicGen Stereo Large
facebook/musicgen-stereo-large
Best quality stereo; use continuation mode to generate adaptive stinger/transition layers
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `THEME` (required) | TEXT | — | Base theme description |
| `--moods` | TEXT | calm,tense,combat,victory | Comma-separated mood states to generate stems for |
| `--duration` | FLOAT | 30.0 | Duration in seconds |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
assgen gen audio voice tts
Convert text to speech with optional emotion (Bark).
AI Model
Bark
suno/bark
Highly expressive TTS with non-verbal sounds (laughs, sighs); fits in 12 GB fp16
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `TEXT` (required) | TEXT | — | Text to synthesise |
| `--emotion` | TEXT | — | Emotion tag: neutral, angry, happy, sad, fearful |
| `--speaker` | TEXT | — | Speaker preset, e.g. 'v2/en_speaker_6' |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
Examples
assgen gen audio voice tts "Hello adventurer, welcome to my shop!" --wait
assgen gen audio voice tts "I will have my revenge!" --preset v2/en_speaker_6 --wait
assgen gen audio voice tts "The ancient tome speaks of dark prophecy..." --preset v2/en_speaker_9 --wait
assgen gen audio voice clone
Clone a voice from an audio sample and synthesise new speech.
AI Model
XTTS-v2
coqui-ai/XTTS-v2
Coqui XTTS-v2 — voice cloning from a 6-second reference clip; 17 languages; actively maintained; better voice quality for game character voices than OpenVoice. Requires coqui-tts package.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `SAMPLE` (required) | TEXT | — | Path to reference audio sample (≥5 seconds) |
| `TEXT` (required) | TEXT | — | Text for the cloned voice to speak |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
assgen gen audio voice dialog
Generate a batch of voiced NPC dialog lines from a script file.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `SCRIPT_FILE` (required) | TEXT | — | JSON or plain-text file with dialog lines |
| `--speaker` | TEXT | — | Speaker preset or voice sample |
| `--emotion` | TEXT | — | Emotion tag |
| `--output-dir` | TEXT | — | Directory for output files |
| `--wait` | BOOLEAN | — | Block until the job completes and stream live progress |
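The script file layout is not specified on this page. The sketch below assumes the plain-text variant holds one dialog line per text line, which is an assumption to verify against your assgen version; the helpers `write_dialog_script` and `dialog_command` are hypothetical names, and only flags from the table above are used.

```python
import tempfile
from pathlib import Path

def write_dialog_script(lines, path):
    """Write one dialog line per text line (ASSUMED plain-text layout)."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)

def dialog_command(script_file, output_dir, speaker=None, wait=True):
    """Build the argv list for `assgen gen audio voice dialog` (hypothetical helper)."""
    argv = ["assgen", "gen", "audio", "voice", "dialog", str(script_file),
            "--output-dir", str(output_dir)]
    if speaker:
        argv += ["--speaker", speaker]
    if wait:
        argv.append("--wait")
    return argv

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "shopkeeper.txt"
        count = write_dialog_script(
            ["Hello adventurer, welcome to my shop!", "", "Come back soon."],
            script,
        )
        print(count, "lines written")
        print(dialog_command(script, Path(tmp) / "voiced", speaker="v2/en_speaker_6"))
```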
assgen gen audio process normalize
Normalize audio to a target LUFS level or peak 0 dBFS.
Algorithmic — no AI model required
This command uses CPU-based algorithms. No model download or GPU required.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to normalize |
| `--lufs` | FLOAT | -14.0 | Target LUFS level |
| `--mode` | TEXT | lufs | Normalization mode: lufs or peak |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
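To illustrate what peak mode does conceptually, here is a minimal sketch of peak normalization on float samples in the range [-1.0, 1.0]. It is not assgen's implementation, and the function names are illustrative. LUFS mode is not shown: integrated loudness requires the K-weighting and gating defined in ITU-R BS.1770, which goes well beyond a single gain calculation on the peak.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS; full scale (|s| == 1.0) is 0 dBFS."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0 else 20 * math.log10(peak)

def normalize_peak(samples, target_dbfs=0.0):
    """Scale samples so the loudest one sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]

if __name__ == "__main__":
    quiet = [0.25, -0.5, 0.1]
    loud = normalize_peak(quiet)
    print(round(peak_dbfs(quiet), 2), "dBFS before")
    print(round(peak_dbfs(loud), 2), "dBFS after")
```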
assgen gen audio process trim-silence
Strip leading and trailing silence from an audio file.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to trim |
| `--threshold-db` | FLOAT | -50.0 | Silence threshold in dBFS (default -50) |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
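The role of the dBFS threshold can be sketched as follows, assuming float samples in [-1.0, 1.0] and a simple per-sample comparison. This is an illustration of the idea only; the actual command may use windowed energy or another detector, and `trim_silence` here is a hypothetical function name.

```python
def trim_silence(samples, threshold_db=-50.0):
    """Drop leading and trailing samples whose magnitude is at or below the threshold."""
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    # Keep everything between the first and last loud sample, including
    # any quiet passages in the middle.
    return samples[loud[0]: loud[-1] + 1]

if __name__ == "__main__":
    clip = [0.0, 0.001, 0.5, 0.0, -0.3, 0.002, 0.0]
    print(trim_silence(clip))
```

At -50 dBFS the linear threshold is roughly 0.00316, so the values 0.001 and 0.002 at the edges count as silence while the interior zero is preserved.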
assgen gen audio process loop-optimize
Find zero-crossing loop points for seamless audio looping.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to optimize for looping |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
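Cutting a loop at zero crossings avoids the click caused by a jump in amplitude at the seam. The sketch below shows one simple way to pick such points on a mono sample list; it is an assumption about the general technique, not a description of how assgen selects them, and the `start_hint` and `end_hint` parameters are hypothetical.

```python
def zero_crossings(samples):
    """Indices i where the signal rises from negative (i-1) to non-negative (i)."""
    return [i for i in range(1, len(samples)) if samples[i - 1] < 0 <= samples[i]]

def loop_points(samples, start_hint, end_hint):
    """Pick rising zero crossings nearest to the hinted start and end indices.

    Using crossings with the same direction at both ends keeps the slope
    consistent across the seam. Returns (start, end) or None.
    """
    crossings = zero_crossings(samples)
    if len(crossings) < 2:
        return None
    start = min(crossings, key=lambda i: abs(i - start_hint))
    later = [i for i in crossings if i > start]
    if not later:
        return None
    end = min(later, key=lambda i: abs(i - end_hint))
    return start, end

if __name__ == "__main__":
    square = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1]
    points = loop_points(square, start_hint=3, end_hint=9)
    print(points)
    if points:
        start, end = points
        print(square[start:end])  # the region that would be repeated
```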
assgen gen audio process convert
Convert audio between formats (WAV/OGG/MP3/FLAC).
Algorithmic — no AI model required
This command uses CPU-based algorithms. No model download or GPU required.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to convert |
| `--format` | TEXT | ogg | Target format: wav, ogg, mp3, flac |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
assgen gen audio process downmix
Downmix stereo to mono (or upmix mono to stereo).
Algorithmic — no AI model required
This command uses CPU-based algorithms. No model download or GPU required.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to downmix |
| `--channels` | INTEGER | 1 | Target channel count: 1 (mono) or 2 (stereo) |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
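As a rough illustration of the two directions, the sketch below averages left and right for a downmix and duplicates the mono channel for an upmix. Averaging is one common choice because it cannot exceed full scale; whether assgen uses it or a different gain law is not stated here, so treat this as an assumption. Function names are illustrative.

```python
def downmix_to_mono(frames):
    """frames: list of (left, right) pairs -> list of mono samples (average)."""
    return [(left + right) / 2 for left, right in frames]

def upmix_to_stereo(samples):
    """Duplicate each mono sample into an identical (left, right) pair."""
    return [(s, s) for s in samples]

if __name__ == "__main__":
    stereo = [(1.0, 0.0), (0.5, 0.5), (-1.0, 1.0)]
    mono = downmix_to_mono(stereo)
    print(mono)
    print(upmix_to_stereo(mono))
```

Note that fully out-of-phase content, such as the last frame above, cancels to silence when averaged.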
assgen gen audio process resample
Change the sample rate of an audio file.
Algorithmic — no AI model required
This command uses CPU-based algorithms. No model download or GPU required.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to resample |
| `--rate` | INTEGER | 48000 | Target sample rate in Hz |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
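To show what changing the sample rate involves, here is a deliberately simple linear-interpolation resampler for a mono sample list. It is a teaching sketch only: production resamplers use band-limited filtering to avoid aliasing when downsampling, and nothing here describes the algorithm assgen actually uses.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        if lo >= len(samples) - 1:
            out.append(samples[-1])  # past the last interval: hold the final value
            continue
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out

if __name__ == "__main__":
    ramp = [0.0, 1.0, 2.0, 3.0]
    print(resample_linear(ramp, src_rate=2, dst_rate=4))  # twice as many samples
    print(resample_linear(ramp, src_rate=4, dst_rate=2))  # half as many samples
```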
assgen gen audio process waveform
Generate a waveform PNG preview of an audio file.
Algorithmic — no AI model required
This command uses CPU-based algorithms. No model download or GPU required.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `INPUT_FILE` (required) | TEXT | — | Audio file to visualize |
| `--width` | INTEGER | 1200 | Output image width in pixels |
| `--height` | INTEGER | 200 | Output image height in pixels |
| `--color` | TEXT | #00ff88 | Waveform colour (hex, e.g. '#00ff88') |
| `--output` | TEXT | — | Output file or directory path |
| `--wait` | BOOLEAN | — | Block until the job completes |
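A waveform preview is typically drawn by reducing the audio to one minimum and maximum value per pixel column and then mapping those amplitudes to pixel rows. The sketch below shows that reduction for mono float samples in [-1.0, 1.0]; it is an assumption about the usual approach rather than assgen's code, it stops short of writing a PNG, and both function names are illustrative.

```python
def waveform_columns(samples, width):
    """Return one (min, max) amplitude pair per pixel column."""
    if width <= 0 or not samples:
        return []
    n = len(samples)
    columns = []
    for x in range(width):
        start = x * n // width
        end = max(start + 1, (x + 1) * n // width)  # at least one sample per column
        chunk = samples[start:end]
        columns.append((min(chunk), max(chunk)))
    return columns

def to_pixel_rows(columns, height):
    """Map amplitudes to (top_row, bottom_row); row 0 is the top of the image."""
    mid = (height - 1) / 2
    return [(round(mid - hi * mid), round(mid - lo * mid)) for lo, hi in columns]

if __name__ == "__main__":
    audio = [0.0, 0.5, -0.5, 1.0, -1.0, 0.2]
    cols = waveform_columns(audio, width=3)
    print(cols)
    print(to_pixel_rows(cols, height=201))
```

Each resulting row pair describes a vertical line to fill in the chosen colour for that column.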