🔊 Audio

Sound effects (AudioLDM2), background music and loops (MusicGen), and expressive NPC voice synthesis (Bark).

Quick Reference

Command	Description	Model
`assgen gen audio sfx generate`	Generate a sound effect from a text description (AudioLDM2)	audioldm2
`assgen gen audio sfx edit`	Edit or process an existing sound effect	—
`assgen gen audio sfx library`	Browse the local generated SFX library	—
`assgen gen audio music compose`	Compose a music track from a text prompt (MusicGen)	musicgen-large
`assgen gen audio music loop`	Generate a seamlessly looping music track	musicgen-stereo-medium
`assgen gen audio music adaptive`	Generate adaptive music stems for different gameplay moods	musicgen-stereo-large
`assgen gen audio voice tts`	Convert text to speech with optional emotion (Bark)	bark
`assgen gen audio voice clone`	Clone a voice from an audio sample and synthesise new speech	XTTS-v2
`assgen gen audio voice dialog`	Generate a batch of voiced NPC dialog lines from a script file	—
`assgen gen audio process normalize`	Normalize audio to a target LUFS level or peak 0 dBFS	(algorithmic)
`assgen gen audio process trim-silence`	Strip leading and trailing silence from an audio file	—
`assgen gen audio process loop-optimize`	Find zero-crossing loop points for seamless audio looping	—
`assgen gen audio process convert`	Convert audio between formats (WAV/OGG/MP3/FLAC)	(algorithmic)
`assgen gen audio process downmix`	Downmix stereo to mono (or upmix mono to stereo)	(algorithmic)
`assgen gen audio process resample`	Change the sample rate of an audio file	(algorithmic)
`assgen gen audio process waveform`	Generate a waveform PNG preview of an audio file	(algorithmic)

`assgen gen audio sfx generate`

Generate a sound effect from a text description (AudioLDM2).

AI Model

AudioLDM2 cvssp/audioldm2

AudioLDM2 is a latent diffusion model trained on general audio and sound effects. It replaces facebook/audiogen-medium (removed from transformers 5.x; audiocraft is broken on Python 3.13+). Requires the diffusers package.

Parameters

Parameter	Type	Default	Description
`PROMPT` (required)	`TEXT`	`—`	Sound description, e.g. 'laser gun firing'
`--duration`	`FLOAT`	`2.0`	Target duration in seconds
`--variations`	`INTEGER`	`1`	Number of variants to generate
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress
`--model-id`	`TEXT`	`—`	Override HF model (validated by server)

Examples

assgen gen audio sfx generate "laser gun firing, futuristic" --wait
assgen gen audio sfx generate "heavy footsteps on gravel" -d 3.0 --wait
assgen gen audio sfx generate "explosion, distant, muffled" -n 3 --wait
assgen gen audio sfx generate "UI button click, satisfying" -d 0.5 --wait
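For batch work, invocations like the ones above can be assembled programmatically. The following is a minimal sketch that uses only the flags listed in the Parameters table; `build_sfx_command` is a hypothetical helper, not part of assgen, and the script only prints the commands rather than running them.

```python
import shlex


def build_sfx_command(prompt, duration=None, variations=None, output=None, wait=True):
    """Return the argv list for one `assgen gen audio sfx generate` call.

    Hypothetical helper: it only assembles arguments from the documented
    flags (--duration, --variations, --output, --wait).
    """
    argv = ["assgen", "gen", "audio", "sfx", "generate", prompt]
    if duration is not None:
        argv += ["--duration", str(duration)]
    if variations is not None:
        argv += ["--variations", str(variations)]
    if output is not None:
        argv += ["--output", output]
    if wait:
        argv.append("--wait")
    return argv


# Prompts taken from the examples above, mapped to a target duration.
batch = {
    "heavy footsteps on gravel": 3.0,
    "UI button click, satisfying": 0.5,
}

for prompt, seconds in batch.items():
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(build_sfx_command(prompt, duration=seconds)))
```

Passing the resulting list to `subprocess.run` would execute each job; that step is left out here so the sketch stays free of side effects.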

`assgen gen audio sfx edit`

Edit or process an existing sound effect.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Input audio file to edit
`--operation`	`TEXT`	`pitch`	pitch | reverb | speed | layer | normalize
`--value`	`TEXT`	`—`	Operation parameter
`--secondary`	`TEXT`	`—`	Second audio for layer op
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

`assgen gen audio sfx library`

Browse the local generated SFX library.

Parameters

Parameter	Type	Default	Description
`QUERY`	`TEXT`	`—`	Search query for local SFX library

`assgen gen audio music compose`

Compose a music track from a text prompt (MusicGen).

AI Model

MusicGen Large facebook/musicgen-large

3.3B params (~6.6 GB fp16); best single-stem music quality that fits in 12 GB
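The ~6.6 GB figure follows from two bytes per parameter in fp16. A rough sketch of that arithmetic is below; it counts weights only, so activations and framework overhead (not quantified in this page) come on top, and `fp16_weight_gb` is an illustrative helper name.

```python
def fp16_weight_gb(params_billion, bytes_per_param=2):
    """Approximate weight memory in GB, taking 1 GB as 1e9 bytes.

    Weights only: activations, caches and framework overhead are not included.
    """
    return params_billion * bytes_per_param


# MusicGen Large: 3.3B parameters at 2 bytes each.
print(f"{fp16_weight_gb(3.3):.1f} GB")
```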

Parameters

Parameter	Type	Default	Description
`PROMPT` (required)	`TEXT`	`—`	Music description, e.g. 'epic orchestral battle theme'
`--duration`	`FLOAT`	`15.0`	Track length in seconds
`--bpm`	`INTEGER`	`—`	Beats per minute
`--key`	`TEXT`	`—`	Musical key, e.g. 'C minor'
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

Examples

assgen gen audio music compose "epic orchestral battle music, dramatic" --wait
assgen gen audio music compose "ambient forest, birds, peaceful" -d 30 --wait
assgen gen audio music compose "upbeat 8-bit chiptune, adventure" --wait
assgen gen audio music compose "tense stealth theme, low bass" -d 60 --wait

`assgen gen audio music loop`

Generate a seamlessly looping music track.

AI Model

MusicGen Stereo Medium facebook/musicgen-stereo-medium

Stereo output, which matters for loop playback; 1.5B params. Include 'seamless loop' in the prompt.

Parameters

Parameter	Type	Default	Description
`PROMPT` (required)	`TEXT`	`—`	Loop description, e.g. 'calm forest ambient loop'
`--duration`	`FLOAT`	`30.0`	Loop duration in seconds
`--variations`	`INTEGER`	`1`	Number of loop variants
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

`assgen gen audio music adaptive`

Generate adaptive music stems for different gameplay moods.

AI Model

MusicGen Stereo Large facebook/musicgen-stereo-large

Best-quality stereo model; use continuation mode to generate adaptive stinger and transition layers.

Parameters

Parameter	Type	Default	Description
`THEME` (required)	`TEXT`	`—`	Base theme description
`--moods`	`TEXT`	`calm,tense,combat,victory`	Comma-separated mood states to generate stems for
`--duration`	`FLOAT`	`30.0`	Duration in seconds
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

`assgen gen audio voice tts`

Convert text to speech with optional emotion (Bark).

AI Model

Bark suno/bark

Highly expressive TTS with non-verbal sounds (laughs, sighs); fits in 12 GB fp16

Parameters

Parameter	Type	Default	Description
`TEXT` (required)	`TEXT`	`—`	Text to synthesise
`--emotion`	`TEXT`	`—`	Emotion tag: neutral angry happy sad fearful
`--speaker`	`TEXT`	`—`	Speaker preset, e.g. 'v2/en_speaker_6'
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

Examples

assgen gen audio voice tts "Hello adventurer, welcome to my shop!" --wait
assgen gen audio voice tts "I will have my revenge!" --speaker v2/en_speaker_6 --wait
assgen gen audio voice tts "The ancient tome speaks of dark prophecy..." --speaker v2/en_speaker_9 --wait

`assgen gen audio voice clone`

Clone a voice from an audio sample and synthesise new speech.

AI Model

XTTS-v2 coqui-ai/XTTS-v2

Coqui XTTS-v2: voice cloning from a 6-second reference clip; 17 languages; actively maintained; better voice quality for game character voices than OpenVoice. Requires the coqui-tts package.

Parameters

Parameter	Type	Default	Description
`SAMPLE` (required)	`TEXT`	`—`	Path to reference audio sample (≥5 seconds)
`TEXT` (required)	`TEXT`	`—`	Text for the cloned voice to speak
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

`assgen gen audio voice dialog`

Generate a batch of voiced NPC dialog lines from a script file.

Parameters

Parameter	Type	Default	Description
`SCRIPT_FILE` (required)	`TEXT`	`—`	JSON or plain-text file with dialog lines
`--speaker`	`TEXT`	`—`	Speaker preset or voice sample
`--emotion`	`TEXT`	`—`	Emotion tag: neutral angry happy sad fearful
`--output-dir`	`TEXT`	`—`	Directory for output files
`--wait`	`BOOLEAN`	`—`	Block until the job completes and stream live progress

`assgen gen audio process normalize`

Normalize audio to a target LUFS level or peak 0 dBFS.

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to normalize
`--lufs`	`FLOAT`	`-14.0`	Target LUFS level
`--mode`	`TEXT`	`lufs`	Normalization mode: lufs \| peak
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes
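The two modes come down to simple gain arithmetic once the current level is known. The sketch below is an illustration, not assgen's implementation: it assumes float samples in the range [-1.0, 1.0], takes the measured LUFS value as given (measuring loudness per ITU-R BS.1770 is out of scope), and uses hypothetical function names.

```python
def db_to_linear(db):
    """Convert a decibel value to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def lufs_gain_db(measured_lufs, target_lufs=-14.0):
    """Gain in dB needed to move a measured loudness to the target."""
    return target_lufs - measured_lufs


def peak_normalize(samples, target_dbfs=0.0):
    """Scale samples so the largest absolute value sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    scale = db_to_linear(target_dbfs) / peak
    return [s * scale for s in samples]


# A clip measured at -20 LUFS needs +6 dB to reach the -14 LUFS default.
gain = lufs_gain_db(-20.0)
print(gain, db_to_linear(gain))

# Peak mode: the loudest sample ends up at full scale (0 dBFS).
print(peak_normalize([0.25, -0.5]))
```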

`assgen gen audio process trim-silence`

Strip leading and trailing silence from an audio file.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to trim
`--threshold-db`	`FLOAT`	`-50.0`	Silence threshold in dBFS (default -50)
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes
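To make the `--threshold-db` value concrete: -50 dBFS corresponds to a linear amplitude of roughly 0.003. A minimal sketch of threshold-based trimming follows, assuming float samples in [-1.0, 1.0] and a per-sample comparison; the actual tool may use windowed energy instead, and `trim_silence` here is an illustrative name.

```python
def trim_silence(samples, threshold_db=-50.0):
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    threshold = 10.0 ** (threshold_db / 20.0)  # dBFS -> linear amplitude

    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1

    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1

    # Quiet samples in the middle of the clip are kept untouched.
    return samples[start:end]


clip = [0.0, 0.001, 0.2, -0.3, 0.0005, 0.0]
print(trim_silence(clip))
```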

`assgen gen audio process loop-optimize`

Find zero-crossing loop points for seamless audio looping.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to optimize for looping
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes
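Cutting a loop at zero crossings avoids the click that a sudden jump in amplitude produces at the seam. The sketch below shows one simple way to pick such points, assuming mono float samples; it selects the first and last rising crossings, which is an assumption for illustration and not necessarily the strategy assgen uses.

```python
def rising_zero_crossings(samples):
    """Indices i where the signal goes from negative to non-negative."""
    return [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0.0 <= samples[i]
    ]


def pick_loop_points(samples):
    """Return (start, end) so that samples[start:end] loops without a jump.

    Both points are rising crossings, so the last sample of the loop is
    negative and the first is non-negative, matching direction at the seam.
    """
    crossings = rising_zero_crossings(samples)
    if len(crossings) < 2:
        return None  # not enough crossings to form a loop
    return crossings[0], crossings[-1]


signal = [0.5, -0.5, -0.1, 0.2, 0.6, -0.4, -0.2, 0.1, 0.3]
points = pick_loop_points(signal)
print(points)
if points is not None:
    start, end = points
    print(signal[start:end])
```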

`assgen gen audio process convert`

Convert audio between formats (WAV/OGG/MP3/FLAC).

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to convert
`--format`	`TEXT`	`ogg`	Target format: wav ogg mp3 flac
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes

`assgen gen audio process downmix`

Downmix stereo to mono (or upmix mono to stereo).

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to downmix
`--channels`	`INTEGER`	`1`	Target channel count: 1 (mono) or 2 (stereo)
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes
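As an illustration of what the two directions mean, here is a minimal sketch assuming stereo audio is held as a list of (left, right) float pairs. Averaging the channels is one common downmix choice; whether assgen averages or applies a different weighting is not stated here, and the function names are hypothetical.

```python
def downmix_to_mono(frames):
    """Average left and right channels into a single mono channel."""
    return [(left + right) / 2.0 for left, right in frames]


def upmix_to_stereo(mono):
    """Duplicate a mono channel into identical left and right channels."""
    return [(sample, sample) for sample in mono]


stereo = [(1.0, 0.0), (0.5, -0.5)]
print(downmix_to_mono(stereo))
print(upmix_to_stereo([0.25]))
```

Note that upmixing this way adds no spatial information: both channels carry the same signal.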

`assgen gen audio process resample`

Change the sample rate of an audio file.

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to resample
`--rate`	`INTEGER`	`48000`	Target sample rate in Hz
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes
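Resampling changes the number of samples in proportion to the rate ratio. The sketch below uses linear interpolation purely to show that length and position arithmetic; production resamplers typically use band-limited filtering to avoid aliasing, and nothing here should be read as the method assgen applies.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample mono float samples by linear interpolation (illustrative only)."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source signal
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # past the last interval: hold final value
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# Doubling the rate doubles the sample count.
print(resample_linear([0.0, 1.0, 0.0], 24000, 48000))
```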

`assgen gen audio process waveform`

Generate a waveform PNG preview of an audio file.

Algorithmic — no AI model required

This command uses CPU-based algorithms. No model download or GPU required.

Parameters

Parameter	Type	Default	Description
`INPUT_FILE` (required)	`TEXT`	`—`	Audio file to visualize
`--width`	`INTEGER`	`1200`	Output image width in pixels
`--height`	`INTEGER`	`200`	Output image height in pixels
`--color`	`TEXT`	`#00ff88`	Waveform colour (hex, e.g. '#00ff88')
`--output`	`TEXT`	`—`	Output file or directory path
`--wait`	`BOOLEAN`	`—`	Block until the job completes
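A waveform preview of a given `--width` is usually drawn by reducing the audio to one minimum and maximum value per pixel column. The following sketch shows that reduction step only, assuming mono float samples; drawing the PNG is omitted, and `waveform_columns` is a hypothetical name rather than an assgen function.

```python
def waveform_columns(samples, width=1200):
    """Return one (min, max) pair per pixel column of the preview."""
    if not samples or width <= 0:
        return []
    n = len(samples)
    columns = []
    for x in range(width):
        start = x * n // width
        # Guarantee at least one sample per column when n < width.
        end = max(start + 1, (x + 1) * n // width)
        chunk = samples[start:end]
        columns.append((min(chunk), max(chunk)))
    return columns


print(waveform_columns([0.1, -0.2, 0.3, -0.4], width=2))
```

Each pair would then be drawn as a vertical line between the two values, scaled to the image height.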
