
🔊 Audio

Sound effects (AudioLDM2), background music and loops (MusicGen), expressive NPC voice synthesis (Bark), voice cloning (XTTS-v2), and algorithmic audio processing.

Quick Reference

| Command | Description | Model |
|---|---|---|
| assgen gen audio sfx generate | Generate a sound effect from a text description (AudioLDM2) | audioldm2 |
| assgen gen audio sfx edit | Edit or process an existing sound effect | |
| assgen gen audio sfx library | Browse the local generated SFX library | |
| assgen gen audio music compose | Compose a music track from a text prompt (MusicGen) | musicgen-large |
| assgen gen audio music loop | Generate a seamlessly looping music track | musicgen-stereo-medium |
| assgen gen audio music adaptive | Generate adaptive music stems for different gameplay moods | musicgen-stereo-large |
| assgen gen audio voice tts | Convert text to speech with optional emotion (Bark) | bark |
| assgen gen audio voice clone | Clone a voice from an audio sample and synthesise new speech | XTTS-v2 |
| assgen gen audio voice dialog | Generate a batch of voiced NPC dialog lines from a script file | |
| assgen gen audio process normalize | Normalize audio to a target LUFS level or peak 0 dBFS | algorithmic |
| assgen gen audio process trim-silence | Strip leading and trailing silence from an audio file | |
| assgen gen audio process loop-optimize | Find zero-crossing loop points for seamless audio looping | |
| assgen gen audio process convert | Convert audio between formats (WAV/OGG/MP3/FLAC) | algorithmic |
| assgen gen audio process downmix | Downmix stereo to mono (or upmix mono to stereo) | algorithmic |
| assgen gen audio process resample | Change the sample rate of an audio file | algorithmic |
| assgen gen audio process waveform | Generate a waveform PNG preview of an audio file | algorithmic |

assgen gen audio sfx generate

Generate a sound effect from a text description (AudioLDM2).


AI Model

AudioLDM2 cvssp/audioldm2

AudioLDM2 is a latent-diffusion model trained on general audio and sound effects. It replaces facebook/audiogen-medium, which was removed from transformers 5.x (audiocraft is also broken on Python 3.13+). Requires the diffusers package.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| PROMPT (required) | TEXT | | Sound description, e.g. 'laser gun firing' |
| --duration | FLOAT | 2.0 | Target duration in seconds |
| --variations | INTEGER | 1 | Number of variants to generate |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |
| --model-id | TEXT | | Override the Hugging Face model ID (validated by the server) |

Examples

assgen gen audio sfx generate "laser gun firing, futuristic" --wait
assgen gen audio sfx generate "heavy footsteps on gravel" -d 3.0 --wait
assgen gen audio sfx generate "explosion, distant, muffled" -n 3 --wait
assgen gen audio sfx generate "UI button click, satisfying" -d 0.5 --wait
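The AI Model note says this command runs AudioLDM2 through diffusers. The sketch below shows roughly what a local equivalent could look like with diffusers' AudioLDM2Pipeline. It assumes --duration maps to the pipeline's audio_length_in_s argument and --variations to num_waveforms_per_prompt; that mapping, the helper names build_sfx_kwargs and generate_sfx, and the sfx_N.wav file names are hypothetical, not assgen's actual code.

```python
"""Hypothetical sketch of text-to-SFX with AudioLDM2 via diffusers (not assgen's code)."""
import wave

MODEL_ID = "cvssp/audioldm2"
SAMPLE_RATE = 16_000  # AudioLDM2 outputs 16 kHz mono audio


def build_sfx_kwargs(prompt: str, duration: float = 2.0, variations: int = 1) -> dict:
    """Map CLI-style options onto AudioLDM2Pipeline call arguments (assumed mapping)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if variations < 1:
        raise ValueError("variations must be at least 1")
    return {
        "prompt": prompt,
        "audio_length_in_s": duration,
        "num_waveforms_per_prompt": variations,
    }


def generate_sfx(prompt: str, duration: float = 2.0, variations: int = 1) -> list:
    """Run AudioLDM2 and write one 16-bit mono WAV per variation.

    Requires torch and diffusers; the model weights are downloaded on first use.
    """
    import numpy as np
    import torch
    from diffusers import AudioLDM2Pipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = AudioLDM2Pipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
    audios = pipe(**build_sfx_kwargs(prompt, duration, variations)).audios

    paths = []
    for i, audio in enumerate(audios):
        # Float samples in [-1, 1] -> little-endian signed 16-bit PCM.
        pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
        path = f"sfx_{i}.wav"
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm.tobytes())
        paths.append(path)
    return paths
```

Calling generate_sfx("heavy footsteps on gravel", duration=3.0) would roughly mirror the second CLI example, minus the job queue and server-side model validation.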

assgen gen audio sfx edit

Edit or process an existing sound effect.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Input audio file to edit |
| --operation | TEXT | pitch | One of: pitch, reverb, speed, layer, normalize |
| --value | TEXT | | Operation parameter |
| --secondary | TEXT | | Second audio file for the layer operation |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |
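The parameter list does not say how the layer operation combines --secondary with the input. One common approach, sketched here as an assumption rather than assgen's documented behaviour, is to sum the two signals sample by sample, pad the shorter one with silence, and scale the mix down only if it would clip. Signals are plain lists of mono float samples in [-1, 1], and layer is a hypothetical helper.

```python
from typing import List


def layer(primary: List[float], secondary: List[float]) -> List[float]:
    """Mix two mono float signals, padding the shorter one with silence.

    Hypothetical illustration of a 'layer' operation, not assgen's implementation.
    """
    length = max(len(primary), len(secondary))
    a = primary + [0.0] * (length - len(primary))
    b = secondary + [0.0] * (length - len(secondary))
    mixed = [x + y for x, y in zip(a, b)]
    peak = max((abs(s) for s in mixed), default=0.0)
    # Scale down only when the sum would exceed full scale (|sample| > 1.0).
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Scaling by the peak preserves the balance between the two layers; a limiter would keep more loudness at the cost of reshaping transients.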

assgen gen audio sfx library

Browse the local generated SFX library.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| QUERY | TEXT | | Search query for local SFX library |

assgen gen audio music compose

Compose a music track from a text prompt (MusicGen).


AI Model

MusicGen Large facebook/musicgen-large

3.3B params (~6.6 GB in fp16); the best single-stem music quality that fits in 12 GB of VRAM

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| PROMPT (required) | TEXT | | Music description, e.g. 'epic orchestral battle theme' |
| --duration | FLOAT | 15.0 | Track length in seconds |
| --bpm | INTEGER | | Beats per minute |
| --key | TEXT | | Musical key, e.g. 'C minor' |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |

Examples

assgen gen audio music compose "epic orchestral battle music, dramatic" --wait
assgen gen audio music compose "ambient forest, birds, peaceful" -d 30 --wait
assgen gen audio music compose "upbeat 8-bit chiptune, adventure" --wait
assgen gen audio music compose "tense stealth theme, low bass" -d 60 --wait
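MusicGen is available in Hugging Face transformers as MusicgenForConditionalGeneration. The sketch below assumes two things the table does not state: that --bpm and --key are folded into the text prompt, and that --duration is converted into the max_new_tokens budget MusicGen uses to control output length (its audio tokenizer runs at 50 frames per second). build_prompt, tokens_for_duration and compose are hypothetical helpers, not assgen's code.

```python
"""Hypothetical sketch of text-to-music with MusicGen via transformers (not assgen's code)."""
from typing import Optional

MODEL_ID = "facebook/musicgen-large"
FRAME_RATE = 50  # MusicGen's audio tokenizer produces 50 frames per second of audio


def build_prompt(prompt: str, bpm: Optional[int] = None, key: Optional[str] = None) -> str:
    """Fold --bpm and --key into the text prompt (an assumed strategy)."""
    parts = [prompt]
    if bpm is not None:
        parts.append(f"{bpm} bpm")
    if key:
        parts.append(f"in {key}")
    return ", ".join(parts)


def tokens_for_duration(seconds: float, frame_rate: int = FRAME_RATE) -> int:
    """MusicGen's output length is set by max_new_tokens: one token step per audio frame."""
    return max(1, round(seconds * frame_rate))


def compose(prompt: str, duration: float = 15.0,
            bpm: Optional[int] = None, key: Optional[str] = None):
    """Return (samples, sampling_rate). Needs torch and transformers."""
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = MusicgenForConditionalGeneration.from_pretrained(MODEL_ID)
    inputs = processor(text=[build_prompt(prompt, bpm, key)], padding=True, return_tensors="pt")
    audio = model.generate(**inputs, max_new_tokens=tokens_for_duration(duration))
    rate = model.config.audio_encoder.sampling_rate  # 32 kHz for MusicGen
    return audio[0, 0].cpu().numpy(), rate
```

With the default --duration of 15.0, this requests 750 new tokens.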

assgen gen audio music loop

Generate a seamlessly looping music track.

AI Model

MusicGen Stereo Medium facebook/musicgen-stereo-medium

1.5B params; stereo output is essential for loop playback; include 'seamless loop' in the prompt

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| PROMPT (required) | TEXT | | Loop description, e.g. 'calm forest ambient loop' |
| --duration | FLOAT | 30.0 | Loop duration in seconds |
| --variations | INTEGER | 1 | Number of loop variants |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |
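Prompting for 'seamless loop' helps, but a generative model cannot guarantee a click-free seam. A common post-processing step, shown here as an assumption rather than assgen's documented behaviour, is to crossfade the end of the clip into its start. The sketch works on one channel of float samples (apply it to each channel of a stereo file), and crossfade_loop is a hypothetical helper.

```python
from typing import List


def crossfade_loop(samples: List[float], fade: int) -> List[float]:
    """Blend the last `fade` samples into the first `fade` so the clip loops without a seam.

    The result is `fade` samples shorter than the input. On wrap-around, its final
    sample is followed by what was originally the sample right after it, so the
    waveform stays continuous across the loop point.
    """
    n = len(samples)
    if not 0 < fade <= n // 2:
        raise ValueError("fade must be between 1 and half the clip length")
    body = samples[: n - fade]
    tail = samples[n - fade :]
    out = list(body)
    for i in range(fade):
        w = i / fade  # linear fade-in weight for the head; the tail fades out
        out[i] = body[i] * w + tail[i] * (1.0 - w)
    return out
```

A linear (equal-gain) fade suits material whose head and tail are similar; equal-power curves are often preferred when they are uncorrelated.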

assgen gen audio music adaptive

Generate adaptive music stems for different gameplay moods.

AI Model

MusicGen Stereo Large facebook/musicgen-stereo-large

Best quality stereo; use continuation mode to generate adaptive stinger/transition layers

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| THEME (required) | TEXT | | Base theme description |
| --moods | TEXT | calm,tense,combat,victory | Comma-separated mood states to generate stems for |
| --duration | FLOAT | 30.0 | Duration in seconds |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |
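The --moods option is a comma-separated list, and each mood yields one stem built on the same THEME. The sketch below shows one plausible way to turn those inputs into per-mood prompts. The prompt wording, the lowercasing and de-duplication, and the helper name mood_prompts are all assumptions. The continuation mode mentioned in the model note, where MusicGen is also conditioned on existing audio, needs the model itself and is not shown.

```python
from typing import List, Tuple

DEFAULT_MOODS = "calm,tense,combat,victory"  # documented default for --moods


def mood_prompts(theme: str, moods: str = DEFAULT_MOODS) -> List[Tuple[str, str]]:
    """Split the --moods value and pair each mood with a text prompt.

    The prompt wording is a hypothetical illustration; assgen may phrase it differently.
    """
    names = [m.strip().lower() for m in moods.split(",") if m.strip()]
    unique: List[str] = []
    for name in names:
        if name not in unique:  # drop duplicates but keep the given order
            unique.append(name)
    return [(name, f"{theme}, {name} mood") for name in unique]
```

Keeping the theme text identical across stems is what makes the stems feel like variations of one piece rather than unrelated tracks.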

assgen gen audio voice tts

Convert text to speech with optional emotion (Bark).


AI Model

Bark suno/bark

Highly expressive TTS with non-verbal sounds (laughs, sighs); fits in 12 GB fp16

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| TEXT (required) | TEXT | | Text to synthesise |
| --emotion | TEXT | | Emotion tag: neutral, angry, happy, sad, or fearful |
| --speaker | TEXT | | Speaker preset, e.g. 'v2/en_speaker_6' |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |

Examples

assgen gen audio voice tts "Hello adventurer, welcome to my shop!" --wait
assgen gen audio voice tts "I will have my revenge!" --speaker v2/en_speaker_6 --wait
assgen gen audio voice tts "The ancient tome speaks of dark prophecy..." --speaker v2/en_speaker_9 --wait
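Bark is available in transformers as BarkModel, and --speaker values such as v2/en_speaker_6 are Bark voice presets. How assgen applies --emotion is not documented, so the sketch below only validates it against the five listed tags. The helper names check_emotion and speak are hypothetical. Bark also reads non-verbal cues such as [laughs] or [sighs] written inline in the text.

```python
from typing import Optional

EMOTIONS = ("neutral", "angry", "happy", "sad", "fearful")  # tags listed for --emotion


def check_emotion(emotion: Optional[str]) -> Optional[str]:
    """Normalise an --emotion value and reject tags outside the documented set."""
    if emotion is None:
        return None
    tag = emotion.strip().lower()
    if tag not in EMOTIONS:
        raise ValueError(f"unknown emotion {emotion!r}; expected one of {', '.join(EMOTIONS)}")
    return tag


def speak(text: str, speaker: str = "v2/en_speaker_6", emotion: Optional[str] = None):
    """Synthesise `text` with Bark; returns (samples, sample_rate). Needs torch + transformers."""
    from transformers import AutoProcessor, BarkModel

    check_emotion(emotion)  # how assgen turns the tag into Bark conditioning is undocumented
    processor = AutoProcessor.from_pretrained("suno/bark")
    model = BarkModel.from_pretrained("suno/bark")
    inputs = processor(text, voice_preset=speaker)
    audio = model.generate(**inputs)
    return audio.cpu().numpy().squeeze(), model.generation_config.sample_rate
```

For example, speak("Ha! [laughs] You actually fell for it.", speaker="v2/en_speaker_9") asks Bark for an inline laugh.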

assgen gen audio voice clone

Clone a voice from an audio sample and synthesise new speech.

AI Model

XTTS-v2 coqui-ai/XTTS-v2

Coqui XTTS-v2: voice cloning from a 6-second reference clip in 17 languages. Actively maintained, with better voice quality for game character voices than OpenVoice. Requires the coqui-tts package.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| SAMPLE (required) | TEXT | | Path to reference audio sample (≥5 seconds) |
| TEXT (required) | TEXT | | Text for the cloned voice to speak |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |
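The coqui-tts package exposes XTTS-v2 through TTS.api.TTS. The sketch below checks the reference clip against the documented 5-second minimum and then calls tts_to_file. The helper names, the default language of 'en', and the WAV-only duration check are assumptions for illustration, not how assgen is known to work. The WAV-only limit exists because Python's wave module cannot read MP3 or OGG.

```python
import wave

MIN_SECONDS = 5.0  # from the SAMPLE parameter description (>= 5 seconds)


def clip_seconds(path: str) -> float:
    """Duration of a WAV file in seconds (the stdlib wave module reads PCM WAV only)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clone_voice(sample: str, text: str, out_path: str = "cloned.wav",
                language: str = "en") -> str:
    """Speak `text` in the voice of `sample` with XTTS-v2. Needs the coqui-tts package."""
    if clip_seconds(sample) < MIN_SECONDS:
        raise ValueError(f"reference clip should be at least {MIN_SECONDS:g} seconds long")
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=sample, language=language, file_path=out_path)
    return out_path
```

Checking the clip length before loading the model avoids a multi-gigabyte download just to reject a too-short sample.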

assgen gen audio voice dialog

Generate a batch of voiced NPC dialog lines from a script file.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| SCRIPT_FILE (required) | TEXT | | JSON or plain-text file with dialog lines |
| --speaker | TEXT | | Speaker preset or voice sample |
| --emotion | TEXT | | |
| --output-dir | TEXT | | Directory for output files |
| --wait | BOOLEAN | | Block until the job completes and stream live progress |
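The script schema is not documented beyond 'JSON or plain-text'. The sketch below assumes one of two inputs: a JSON array whose items are either strings or objects with a "text" field, or plain text with one line per entry, where blank lines and lines starting with # are skipped. Both the schema and the zero-padded line_NNN.wav naming are hypothetical.

```python
import json
from typing import List


def parse_script(content: str) -> List[str]:
    """Return dialog lines in order from a JSON array or plain-text script (assumed schema)."""
    stripped = content.lstrip()
    if stripped.startswith("["):
        lines = []
        for entry in json.loads(stripped):
            # Accept bare strings or objects carrying a "text" field.
            text = entry["text"] if isinstance(entry, dict) else str(entry)
            if text.strip():
                lines.append(text.strip())
        return lines
    # Plain text: one line per entry; skip blanks and '#' comments.
    return [
        line.strip()
        for line in content.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def output_names(lines: List[str], prefix: str = "line") -> List[str]:
    """Zero-padded file names so the outputs sort in script order."""
    width = max(3, len(str(len(lines))))
    return [f"{prefix}_{i:0{width}d}.wav" for i in range(1, len(lines) + 1)]
```

Zero-padding matters for batch output: without it, line_10.wav sorts before line_2.wav in most file browsers and game-engine importers.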

assgen gen audio process normalize

Normalize audio to a target LUFS level or peak 0 dBFS.

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to normalize |
| --lufs | FLOAT | -14.0 | Target LUFS level |
| --mode | TEXT | lufs | Normalization mode: lufs or peak |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
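Both modes come down to applying a single gain. Peak mode scales the file so its largest absolute sample reaches the target level (0 dBFS, a linear value of 1.0). LUFS mode first needs the integrated loudness measured per ITU-R BS.1770, which libraries such as pyloudnorm implement, and then applies the dB difference to the target. The sketch below uses plain float lists and hypothetical helper names, and shows only the gain arithmetic. It omits the loudness measurement and any true-peak limiting a production normalizer would add after a loudness boost.

```python
from typing import List


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def peak_normalize(samples: List[float], target_dbfs: float = 0.0) -> List[float]:
    """Scale so the largest absolute sample sits at target_dbfs (0 dBFS = 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # digital silence: nothing to scale
    gain = db_to_gain(target_dbfs) / peak
    return [s * gain for s in samples]


def lufs_gain(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Linear gain that moves a measured integrated loudness onto the target."""
    return db_to_gain(target_lufs - measured_lufs)
```

For example, a file measured at -20 LUFS needs +6 dB to reach the default -14 LUFS, which is a linear gain of about 2.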

assgen gen audio process trim-silence

Strip leading and trailing silence from an audio file.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to trim |
| --threshold-db | FLOAT | -50.0 | Silence threshold in dBFS |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
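--threshold-db is a level in dBFS, so it has to be converted to a linear amplitude (10^(dB/20); -50 dBFS is about 0.0032) before samples can be compared against it. The sketch trims to the first and last sample above that level. Checking individual samples is a simplifying assumption: real implementations often compare short-window RMS levels so that a single stray click does not count as sound. trim_silence is a hypothetical helper.

```python
from typing import List


def trim_silence(samples: List[float], threshold_db: float = -50.0) -> List[float]:
    """Drop leading/trailing samples whose magnitude is at or below the dBFS threshold."""
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole file is below the threshold
    # Quiet samples between the first and last loud ones are kept.
    return samples[loud[0] : loud[-1] + 1]
```

Only the ends are trimmed; quiet passages inside the sound are left intact.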

assgen gen audio process loop-optimize

Find zero-crossing loop points for seamless audio looping.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to optimize for looping |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
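A zero crossing is a point where the waveform changes sign. Starting and ending a loop at crossings that go in the same direction (here, negative to non-negative) keeps the jump at the wrap-around small, which is what avoids an audible click. The sketch simply takes the first and last rising crossings in the file. That selection rule and the helper names are assumptions; assgen may search differently, for example near a requested loop length.

```python
from typing import List, Optional, Tuple


def rising_zero_crossings(samples: List[float]) -> List[int]:
    """Indices i where the signal goes from negative (at i-1) to non-negative (at i)."""
    return [i for i in range(1, len(samples)) if samples[i - 1] < 0.0 <= samples[i]]


def loop_points(samples: List[float]) -> Optional[Tuple[int, int]]:
    """Pick (start, end) so that samples[start:end] loops between matching rising crossings."""
    crossings = rising_zero_crossings(samples)
    if len(crossings) < 2:
        return None  # not enough crossings to form a loop
    return crossings[0], crossings[-1]
```

The loop region is samples[start:end]: its last sample is negative and its first is non-negative, so playback wraps with the same upward slope it had at the original crossing.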

assgen gen audio process convert

Convert audio between formats (WAV/OGG/MP3/FLAC).

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to convert |
| --format | TEXT | ogg | Target format: wav, ogg, mp3, or flac |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
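WAV is the only one of the four target formats that Python's standard library can write, through the wave module. OGG, MP3 and FLAC need an external encoder such as libsndfile or ffmpeg. The sketch below covers the WAV path only, converting float samples to 16-bit PCM. The helper names and the 16-bit mono output are assumptions, not a description of what assgen does internally.

```python
import struct
import wave
from typing import List


def float_to_pcm16(samples: List[float]) -> bytes:
    """Clip to [-1, 1], scale to signed 16-bit, and pack little-endian as WAV expects."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_wav(path: str, samples: List[float], rate: int = 48000) -> None:
    """Write mono float samples as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(float_to_pcm16(samples))
```

Clipping before scaling matters: a float sample of 2.0 would otherwise overflow the 16-bit range and raise struct.error.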

assgen gen audio process downmix

Downmix stereo to mono (or upmix mono to stereo).

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to downmix |
| --channels | INTEGER | 1 | Target channel count: 1 (mono) or 2 (stereo) |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
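The simplest downmix averages the two channels, and the simplest upmix copies the mono signal into both. Whether assgen uses plain averaging or a different pan law is not documented, so treat this sketch and its helper names as assumptions.

```python
from typing import List


def downmix(left: List[float], right: List[float]) -> List[float]:
    """Stereo to mono by averaging the channels sample by sample."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Halving the sum keeps identical full-scale channels from clipping.
    return [(a + b) / 2 for a, b in zip(left, right)]


def upmix(mono: List[float]) -> List[List[float]]:
    """Mono to stereo by copying the signal into both channels."""
    return [list(mono), list(mono)]
```

For example, downmix([1.0, 0.5], [1.0, -0.5]) gives [1.0, 0.0]: the second sample is out of phase between the channels and cancels, which is why wide stereo mixes can lose elements when folded to mono.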

assgen gen audio process resample

Change the sample rate of an audio file.

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to resample |
| --rate | INTEGER | 48000 | Target sample rate in Hz |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
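Changing the sample rate means estimating the signal at new time positions: output sample n sits at n * src_rate / dst_rate on the input timeline. The sketch below uses linear interpolation between neighbouring input samples. It is a deliberately naive stand-in, not assgen's implementation. Because it has no anti-aliasing filter, downsampling can fold high frequencies back into the audible range; production resamplers (for example scipy.signal.resample_poly) use band-limited filtering instead.

```python
from typing import List


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate  # position of output sample n on the input timeline
        i = int(pos)
        frac = pos - i
        cur = samples[min(i, last)]
        nxt = samples[min(i + 1, last)]  # hold the final sample past the end
        out.append(cur + (nxt - cur) * frac)
    return out
```

Converting 441 samples from 44100 Hz to the default 48000 Hz yields 480 samples, preserving the duration of 10 ms.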

assgen gen audio process waveform

Generate a waveform PNG preview of an audio file.

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| INPUT_FILE (required) | TEXT | | Audio file to visualize |
| --width | INTEGER | 1200 | Output image width in pixels |
| --height | INTEGER | 200 | Output image height in pixels |
| --color | TEXT | #00ff88 | Waveform colour (hex, e.g. '#00ff88') |
| --output | TEXT | | Output file or directory path |
| --wait | BOOLEAN | | Block until the job completes |
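A waveform preview is commonly drawn by splitting the audio into one bucket per pixel column (--width, 1200 by default) and drawing a vertical line from each bucket's minimum to its maximum amplitude in the --color colour. The sketch below covers those steps for mono float samples in [-1, 1]. Writing the actual PNG is omitted, and the helper names are hypothetical rather than assgen's.

```python
from typing import List, Tuple


def parse_hex_colour(colour: str) -> Tuple[int, int, int]:
    """Turn '#RRGGBB' into an (r, g, b) tuple of 0-255 ints."""
    value = colour.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {colour!r}")
    return tuple(int(value[i : i + 2], 16) for i in (0, 2, 4))


def column_extents(samples: List[float], width: int) -> List[Tuple[float, float]]:
    """(min, max) of the samples that fall under each of `width` pixel columns."""
    n = len(samples)
    cols = []
    for x in range(width):
        start = x * n // width
        end = max(start + 1, (x + 1) * n // width)
        chunk = samples[start:end] or [0.0]  # empty input draws a flat line
        cols.append((min(chunk), max(chunk)))
    return cols


def to_rows(extent: Tuple[float, float], height: int) -> Tuple[int, int]:
    """Map a (min, max) amplitude pair in [-1, 1] to (top, bottom) pixel rows."""
    lo, hi = extent
    top = round((1.0 - hi) / 2 * (height - 1))  # +1.0 maps to row 0
    bottom = round((1.0 - lo) / 2 * (height - 1))  # -1.0 maps to the last row
    return top, bottom
```

Drawing min-to-max spans instead of single points keeps short transients visible even when thousands of samples share one pixel column.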