stemit-cli
v1.0.7
Published
CLI tool to split audio into stems (vocals/drums/bass/other), analyze BPM & key, and mute/solo tracks
Downloads
790
Maintainers
Readme
stemit
Separate any song into its stems — vocals, drums, bass, guitar, piano, and more. Analyze BPM & musical key. Mute or solo any instrument. Works with YouTube URLs or local files.
_ _ _
___| |_ ___ _ __ ___ (_) |_
/ __| __/ _ \ '_ ` _ \| | __|
\__ \ || __/ | | | | | | |_
|___/\__\___|_| |_| |_|_|\__|
stem separator · BPM · key · mixTable of Contents
- stemit
What it does
stemit takes a song (from a YouTube URL or a local audio file) and:
- Downloads it via
yt-dlpif you pass a URL - Separates it into individual instrument stems using Demucs (Facebook Research's state-of-the-art source separation model)
- Analyzes the BPM and musical key of the result
- Mixes stems back together with specific instruments muted or soloed
- Exports everything as WAV or MP3
You get back clean, studio-quality isolated tracks you can drop straight into your DAW.
Prerequisites
stemit requires three system-level dependencies. Everything else (including Demucs itself) is installed automatically on first run.
| Dependency | Minimum Version | Install |
|---|---|---|
| Node.js | 18+ | https://nodejs.org |
| Python | 3.9+ | https://www.python.org/downloads/ |
| ffmpeg | any | brew install ffmpeg / sudo apt install ffmpeg / winget install ffmpeg |
Note: Demucs (the AI model) and all Python dependencies are installed automatically into a private virtualenv at
~/.stemit/venvthe first time you run anystemitcommand. You do not need topip installanything yourself.
System requirements disclaimer: stemit depends on local CPU performance, OS audio tooling, network reliability (for URL downloads), and third-party binaries (
ffmpeg, Python packages, model downloads). Performance and output quality can vary across machines and environments.If you run into setup issues, command failures, or unexpected output, please open an issue with your OS, Node/Python versions, command used, and error logs so I can help quickly.
Installation
Using npx (no install required)
npx stemit-cli split "https://www.youtube.com/watch?v=..."Global install
npm install -g stemit-cli
stemit split "https://www.youtube.com/watch?v=..."Local install (in a project)
npm install stemit-cli
npx stemit split ./my-song.wavQuick Start
# Split a YouTube video into stems
stemit split "https://www.youtube.com/watch?v=oTS0LLXaLF0"
# Split a local file
stemit split ./my-song.wav
# Get 6 stems (adds guitar + piano)
stemit split ./my-song.wav --model htdemucs_6s
# Produce an instrumental (no vocals)
stemit split ./my-song.wav --mute vocals
# Isolate just the drums
stemit split ./my-song.wav --solo drums
# Export everything as MP3
stemit split ./my-song.wav --format mp3Usage
Basic split
stemit split <input> [options]<input> can be:
- A YouTube URL —
stemit split "https://www.youtube.com/watch?v=..." - A local file path —
stemit split ./song.wavorstemit split ./song.mp3
The command will:
- Check dependencies (and auto-install Demucs if needed)
- Download the audio if a URL was provided
- Run Demucs stem separation with a live progress bar
- Analyze BPM and key of the result
- Print a summary box with all output file paths
6-stem split (guitar + piano)
The htdemucs_6s model separates 6 instruments instead of the default 4, adding guitar and piano tracks:
stemit split ./my-song.wav --model htdemucs_6sOutput stems: vocals, drums, bass, guitar, piano, other
Caveat: This model works best on Western pop/rock. For songs without guitar or piano those tracks will be mostly silent, and their content stays in
other.
Mute a stem
Muting removes one instrument and mixes the rest back together. Useful for creating instrumentals or karaoke tracks.
# Remove vocals → instrumental
stemit split ./my-song.wav --mute vocals
# Remove drums → no-drums mix
stemit split ./my-song.wav --mute drumsValid values: vocals, drums, bass, guitar, piano, other
The mixed output is saved as mute-<stem>.wav inside the song's stem folder, alongside the individual stem files.
Solo a stem
Soloing keeps only one instrument and discards the rest. No mixing needed — the file is simply copied.
# Isolate only vocals
stemit split ./my-song.wav --solo vocals
# Isolate only drums
stemit split ./my-song.wav --solo drumsValid values: vocals, drums, bass, guitar, piano, other
The output is saved as solo-<stem>.wav inside the song's stem folder.
Export as MP3
By default all output files are WAV. Pass --format mp3 to convert everything to MP3 after splitting:
stemit split ./my-song.wav --format mp3This converts all stem files (and the mixed output, if --mute or --solo was used) to MP3 using ffmpeg at variable bitrate (-q:a 0, highest quality).
Skip BPM/key analysis
BPM and key detection uses essentia.js (WASM) and adds a few seconds. Skip it with --no-analyze:
stemit split ./my-song.wav --no-analyzeAll options
stemit split <input> [options]
Options:
--model <name> Demucs model to use (default: "htdemucs_ft")
htdemucs | htdemucs_ft | htdemucs_6s | mdx | mdx_extra | mdx_extra_q
--out <dir> Output directory (default: "./stemit-output")
--mute <stem> Mute one stem and mix the rest (vocals|drums|bass|guitar|piano|other)
--solo <stem> Keep only one stem (vocals|drums|bass|guitar|piano|other)
--format <fmt> Output format: wav or mp3 (default: "wav")
--no-analyze Skip BPM and key detection
-h, --help Show help
-V, --version Show versionAvailable Models
| Model | Stems | Notes |
|---|---|---|
| htdemucs_ft | vocals, drums, bass, other | Default. Fine-tuned hybrid transformer — best quality for 4 stems |
| htdemucs | vocals, drums, bass, other | Slightly faster than _ft, marginally lower quality |
| htdemucs_6s | vocals, drums, bass, guitar, piano, other | 6-stem model — adds guitar and piano separation |
| mdx_extra | vocals, drums, bass, other | Alternative architecture, good for vocals |
| mdx | vocals, drums, bass, other | Faster MDX variant |
| mdx_extra_q | vocals, drums, bass, other | Quantized MDX — smallest model, fastest inference |
Which model should I use?
- Default (
htdemucs_ft) — best all-round quality, use this unless you have a reason not to htdemucs_6s— when you specifically need guitar or piano isolatedmdx_extra— when vocals quality is the prioritymdx_extra_q— when speed matters more than quality (e.g. batch processing)
Speed note: Demucs runs on CPU by default. Processing time is roughly 2–5× the song length on modern hardware (e.g. a 4-minute song takes 8–20 minutes). The mdx_extra_q model is significantly faster.
Separation Accuracy
stemit uses Demucs — one of the highest-rated open-source source separation models, consistently ranking at the top of the Music Demixing Challenge leaderboards.
What to expect:
| Genre / Instrument | Typical Quality |
|---|---|
| Vocals (pop/rock) | Excellent — clean isolation with minimal bleed |
| Drums | Very good — kick, snare, and cymbals well preserved |
| Bass | Good — works best when bass is prominent in the mix |
| Guitar / Piano (htdemucs_6s) | Moderate — depends heavily on how prominent the instrument is |
| Electronic / heavily layered music | Lower — harder to separate tightly mixed synths |
| Vocals (rap/spoken word) | Good — works well when vocals are dry or lightly processed |
Factors that affect quality:
- Production style — heavily compressed or layered mixes are harder to separate
- Frequency overlap — instruments sharing the same frequency range (e.g. bass guitar and kick drum) bleed into each other
- Reverb / effects — wet, heavily reverbed sources are harder to isolate cleanly
- Model choice —
htdemucs_ftgives the best overall quality;mdx_extrais specifically tuned for vocals
Benchmark scores (SDR — Signal-to-Distortion Ratio, higher is better):
The htdemucs_ft model achieves approximately:
| Stem | SDR | |---|---| | Vocals | ~8.4 dB | | Drums | ~8.6 dB | | Bass | ~8.8 dB | | Other | ~5.8 dB |
SDR is a standard metric for source separation. Scores above 6 dB are considered good; above 8 dB is excellent. For reference, an SDR of 0 means the output is no better than silence.
These scores are competitive with commercial stem separation tools and are state-of-the-art for open-source models. Results on real-world music may vary.
Output Structure
stemit-output/
└── htdemucs_ft/
└── My Song Title/
├── vocals.wav
├── drums.wav
├── bass.wav
├── other.wav
├── mute-vocals.wav ← produced when --mute vocals is used
└── solo-drums.wav ← produced when --solo drums is usedAll files are grouped under --out/<model>/<song-name>/ (default: ./stemit-output).
How It Works
┌─────────────┐ yt-dlp ┌────────────┐
│ YouTube URL │ ─────────────▶ │ audio.wav │
└─────────────┘ └─────┬──────┘
│
local file ─────────┘
│
▼
┌─────────────────────┐
│ Demucs (Python) │
│ htdemucs_ft model │
│ ~/.stemit/venv │
└──────────┬──────────┘
│
┌──────────────────────┼──────────────────────┐
▼ ▼ ▼ ▼ ▼
vocals.wav drums.wav bass.wav guitar.wav other.wav
│
▼
┌─────────────────────┐
│ essentia.js WASM │
│ BPM + Key detect │
└─────────────────────┘
optional:
stems ──▶ ffmpeg amix ──▶ mute-vocals.wav
stems ──▶ ffmpeg copy ──▶ solo-drums.wav
*.wav ──▶ ffmpeg ──▶ *.mp3Tech stack:
| Component | Library | Notes |
|---|---|---|
| CLI framework | commander | Parses flags and subcommands |
| YouTube download | youtube-dl-exec | Wraps yt-dlp, auto-downloads binary |
| Stem separation | Demucs via child_process.spawn | Runs in managed Python venv |
| Stem mixing | ffmpeg amix filter | normalize=0 to preserve volume |
| BPM + Key | essentia.js | WASM, runs in Node |
| Audio decode | audio-decode | Decodes WAV/MP3 to float32 for essentia |
| WAV→MP3 | ffmpeg | -q:a 0 (VBR highest quality) |
| Progress bars | cli-progress | Live chunk-by-chunk demucs progress |
| Spinners | ora | Per-step feedback |
| Summary panel | boxen + chalk | Formatted terminal output |
| ASCII banner | figlet | Startup art |
First-Run Setup
The first time you run stemit split, it will:
- Verify Python 3.9+ and ffmpeg are installed
- Create a Python virtualenv at
~/.stemit/venv - Install Demucs and its dependencies into that venv (
pip install demucs soundfile torchcodec)
This one-time setup takes 2–5 minutes depending on your internet connection (Demucs pulls in PyTorch, which is a large download). After that, the venv is reused on every run.
The AI model weights (~300 MB per model) are downloaded by Demucs on first use of each model and cached in ~/.cache/torch/hub/.
Troubleshooting
demucs exited with code 1
Run with a local file first to rule out download issues. The full Demucs error is printed below the message. Common causes:
- Out of memory — Demucs needs ~4 GB RAM for
htdemucs_ft. Try--model mdx_extra_qwhich is lighter. - Corrupt audio file — ensure the file plays correctly before passing it to stemit.
- Python version conflict — stemit uses its own venv at
~/.stemit/venv, but if you see import errors, try deleting the venv and re-running:rm -rf ~/.stemit/venv
ImportError: TorchCodec is required
This happens with torchaudio >= 2.5 on a fresh install if torchcodec wasn't installed. stemit handles this automatically during setup, but if you hit it manually:
~/.stemit/venv/bin/pip install torchcodecffmpeg not found
Install ffmpeg for your platform:
- macOS:
brew install ffmpeg - Linux:
sudo apt install ffmpeg - Windows:
winget install ffmpeg
python3 not found / python not found
Install Python 3.9+ from https://www.python.org/downloads/ and ensure it's on your PATH.
On Windows, the installer adds python (not python3) to PATH — stemit checks both.
Progress bar cycles multiple times
This is normal. Demucs splits audio into overlapping chunks and processes them sequentially. Each chunk shows its own 0→100% progress. The label shows chunk N/M so you can track overall progress.
Slow processing
Demucs runs on CPU by default — this is expected. A 4-minute song takes 8–20 minutes on modern hardware. There is no GPU acceleration option in stemit currently (Demucs supports CUDA if you have an NVIDIA GPU and install the CUDA version of PyTorch manually into ~/.stemit/venv).
Resetting the venv
If something goes wrong with the Python environment, delete the venv and re-run:
rm -rf ~/.stemit/venv
stemit split ./any-file.wavThis re-creates the venv and re-installs Demucs from scratch.
License
MIT © saravanaraja25
