stemit-cli

v1.0.7

Published 2 months ago

CLI tool to split audio into stems (vocals/drums/bass/other), analyze BPM & key, and mute/solo tracks

saravanaraja25

audio stem demucs music bpm key vocals split remix cli

stemit

Separate any song into its stems — vocals, drums, bass, guitar, piano, and more. Analyze BPM & musical key. Mute or solo any instrument. Works with YouTube URLs or local files.

      _                 _ _
  ___| |_ ___ _ __ ___ (_) |_
 / __| __/ _ \ '_ ` _ \| | __|
 \__ \ ||  __/ | | | | | | |_
 |___/\__\___|_| |_| |_|_|\__|
  stem separator · BPM · key · mix

What it does

stemit takes a song (from a YouTube URL or a local audio file) and:

Downloads it via yt-dlp if you pass a URL
Separates it into individual instrument stems using Demucs (Facebook Research's state-of-the-art source separation model)
Analyzes the BPM and musical key of the result
Mixes stems back together with specific instruments muted or soloed
Exports everything as WAV or MP3

You get back isolated instrument tracks you can drop straight into your DAW.

Prerequisites

stemit requires three system-level dependencies. Everything else (including Demucs itself) is installed automatically on first run.

| Dependency | Minimum Version | Install |
|---|---|---|
| Node.js | 18+ | https://nodejs.org |
| Python | 3.9+ | https://www.python.org/downloads/ |
| ffmpeg | any | brew install ffmpeg / sudo apt install ffmpeg / winget install ffmpeg |

Note: Demucs and all of its Python dependencies are installed automatically into a private virtualenv at ~/.stemit/venv the first time you run stemit split. The model weights are downloaded separately, on first use of each model. You do not need to pip install anything yourself.

System requirements disclaimer: stemit depends on local CPU performance, OS audio tooling, network reliability (for URL downloads), and third-party binaries (ffmpeg, Python packages, model downloads). Performance and output quality can vary across machines and environments.
If you run into setup issues, command failures, or unexpected output, please open an issue with your OS, Node/Python versions, command used, and error logs so I can help quickly.

Installation

Using npx (no install required)

npx stemit-cli split "https://www.youtube.com/watch?v=..."

Global install

npm install -g stemit-cli
stemit split "https://www.youtube.com/watch?v=..."

Local install (in a project)

npm install stemit-cli
npx stemit split ./my-song.wav

Quick Start

# Split a YouTube video into stems
stemit split "https://www.youtube.com/watch?v=oTS0LLXaLF0"

# Split a local file
stemit split ./my-song.wav

# Get 6 stems (adds guitar + piano)
stemit split ./my-song.wav --model htdemucs_6s

# Produce an instrumental (no vocals)
stemit split ./my-song.wav --mute vocals

# Isolate just the drums
stemit split ./my-song.wav --solo drums

# Export everything as MP3
stemit split ./my-song.wav --format mp3

Usage

Basic split

stemit split <input> [options]

<input> can be:

A YouTube URL — stemit split "https://www.youtube.com/watch?v=..."
A local file path — stemit split ./song.wav or stemit split ./song.mp3

The command will:

Check dependencies (and auto-install Demucs if needed)
Download the audio if a URL was provided
Run Demucs stem separation with a live progress bar
Analyze BPM and key of the result
Print a summary box with all output file paths

6-stem split (guitar + piano)

The htdemucs_6s model separates 6 instruments instead of the default 4, adding guitar and piano tracks:

stemit split ./my-song.wav --model htdemucs_6s

Output stems: vocals, drums, bass, guitar, piano, other

Caveat: This model works best on Western pop/rock. For songs without guitar or piano, those stems will be mostly silent, and any other instruments stay in other.

Mute a stem

Muting removes one instrument and mixes the rest back together. Useful for creating instrumentals or karaoke tracks.

# Remove vocals → instrumental
stemit split ./my-song.wav --mute vocals

# Remove drums → no-drums mix
stemit split ./my-song.wav --mute drums

Valid values: vocals, drums, bass, guitar, piano, other (guitar and piano require --model htdemucs_6s)

The mixed output is saved as mute-<stem>.wav inside the song's stem folder, alongside the individual stem files.
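The mute step comes down to a single ffmpeg call that feeds every remaining stem into the amix filter (with normalize=0, as listed in the tech stack). A minimal sketch of how those arguments could be assembled, assuming stems named <stem>.wav in the song folder; buildMuteArgs is a hypothetical helper, not stemit's actual code:

```typescript
// Hypothetical sketch, not stemit's source: builds the ffmpeg arguments for
// "mute one stem, mix the rest" using the amix filter with normalize=0.
function buildMuteArgs(stemDir: string, stems: readonly string[], muted: string): string[] {
  if (!stems.includes(muted)) {
    throw new Error(`Cannot mute "${muted}": expected one of ${stems.join(", ")}`);
  }
  const kept = stems.filter((stem) => stem !== muted);
  const args: string[] = ["-y"]; // overwrite an existing mute-<stem>.wav
  for (const stem of kept) {
    args.push("-i", `${stemDir}/${stem}.wav`); // one input per remaining stem
  }
  // normalize=0 sums the inputs as-is instead of scaling them down,
  // so the remaining stems keep the level they had in the original mix.
  args.push("-filter_complex", `amix=inputs=${kept.length}:normalize=0`);
  args.push(`${stemDir}/mute-${muted}.wav`);
  return args;
}

const muteArgs = buildMuteArgs(
  "stemit-output/htdemucs_ft/My Song Title",
  ["vocals", "drums", "bass", "other"],
  "vocals",
);
// Pass the array itself to child_process.spawn("ffmpeg", muteArgs);
// joining it here is for display only.
console.log(["ffmpeg", ...muteArgs].join(" "));
```

Keeping the arguments as an array rather than one shell string avoids quoting problems with song titles that contain spaces.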

Solo a stem

Soloing produces a track containing only one instrument. No mixing is needed: the stem file is simply copied, and the other individual stems are still saved alongside it.

# Isolate only vocals
stemit split ./my-song.wav --solo vocals

# Isolate only drums
stemit split ./my-song.wav --solo drums

Valid values: vocals, drums, bass, guitar, piano, other (guitar and piano require --model htdemucs_6s)

The output is saved as solo-<stem>.wav inside the song's stem folder.
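Both --mute and --solo list guitar and piano as valid values, but only htdemucs_6s produces those stems. A minimal sketch of a pre-flight check based on the model table; checkStem and stemsForModel are hypothetical helpers, and stemit's real validation may behave differently:

```typescript
// Hypothetical pre-flight check, not stemit's source. Stem lists follow the
// model table: only htdemucs_6s separates guitar and piano.
const FOUR_STEMS: readonly string[] = ["vocals", "drums", "bass", "other"];
const SIX_STEMS: readonly string[] = ["vocals", "drums", "bass", "guitar", "piano", "other"];

function stemsForModel(model: string): readonly string[] {
  return model === "htdemucs_6s" ? SIX_STEMS : FOUR_STEMS;
}

/** Returns the stem unchanged if `model` produces it, otherwise throws. */
function checkStem(model: string, stem: string): string {
  const available = stemsForModel(model);
  if (!available.includes(stem)) {
    throw new Error(`${model} does not produce "${stem}"; choose one of: ${available.join(", ")}`);
  }
  return stem;
}

checkStem("htdemucs_6s", "piano"); // accepted
try {
  checkStem("htdemucs_ft", "guitar");
} catch (err) {
  // prints: htdemucs_ft does not produce "guitar"; choose one of: vocals, drums, bass, other
  console.log((err as Error).message);
}
```

Failing before Demucs runs saves a multi-minute separation that would end with a missing stem file.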

Export as MP3

By default all output files are WAV. Pass --format mp3 to convert everything to MP3 after splitting:

stemit split ./my-song.wav --format mp3

This converts all stem files (plus the mute-/solo- output, if --mute or --solo was used) to MP3 using ffmpeg at variable bitrate (-q:a 0, the highest-quality VBR setting).
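Per file, the conversion could look like the sketch below. It assumes the libmp3lame encoder (the README only specifies -q:a 0), and buildMp3Args is a hypothetical name:

```typescript
// Hypothetical sketch, not stemit's source: the WAV -> MP3 step as ffmpeg
// arguments. With libmp3lame, -q:a 0 selects LAME's highest-quality VBR
// setting (V0, roughly 220-260 kbit/s).
function buildMp3Args(wavPath: string): { args: string[]; mp3Path: string } {
  if (!/\.wav$/i.test(wavPath)) {
    throw new Error(`Expected a .wav file, got ${wavPath}`);
  }
  const mp3Path = wavPath.replace(/\.wav$/i, ".mp3");
  const args = ["-y", "-i", wavPath, "-codec:a", "libmp3lame", "-q:a", "0", mp3Path];
  return { args, mp3Path };
}

// One conversion per file in the song folder (stems plus any mute-/solo- file).
const songDir = "stemit-output/htdemucs_ft/My Song Title";
for (const file of ["vocals.wav", "drums.wav", "bass.wav", "other.wav"]) {
  const { args } = buildMp3Args(`${songDir}/${file}`);
  console.log(["ffmpeg", ...args].join(" ")); // display only; spawn with the array
}
```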

Skip BPM/key analysis

BPM and key detection uses essentia.js (WASM) and adds a few seconds. Skip it with --no-analyze:

stemit split ./my-song.wav --no-analyze

All options

stemit split <input> [options]

Options:
  --model <name>   Demucs model to use (default: "htdemucs_ft")
                   htdemucs | htdemucs_ft | htdemucs_6s | mdx | mdx_extra | mdx_extra_q
  --out <dir>      Output directory (default: "./stemit-output")
  --mute <stem>    Mute one stem and mix the rest (vocals|drums|bass|guitar|piano|other)
  --solo <stem>    Keep only one stem (vocals|drums|bass|guitar|piano|other)
  --format <fmt>   Output format: wav or mp3 (default: "wav")
  --no-analyze     Skip BPM and key detection
  -h, --help       Show help
  -V, --version    Show version
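To make the defaults above concrete, here is a sketch of how parsed flags might be resolved into a typed options object. It assumes the flags arrive as a plain object; resolveSplitOptions and SplitOptions are hypothetical names, not stemit's internals:

```typescript
// Hypothetical sketch, not stemit's source: applies the documented defaults of
// `stemit split` to parsed flags. With commander, a `--no-analyze` option
// yields `analyze: true` by default and `analyze: false` when the flag is passed.
const MODELS = ["htdemucs", "htdemucs_ft", "htdemucs_6s", "mdx", "mdx_extra", "mdx_extra_q"] as const;
type ModelName = (typeof MODELS)[number];
type OutputFormat = "wav" | "mp3";

interface SplitOptions {
  model: ModelName;
  out: string;
  mute?: string;
  solo?: string;
  format: OutputFormat;
  analyze: boolean;
}

function isModelName(value: string): value is ModelName {
  return (MODELS as readonly string[]).includes(value);
}

function isOutputFormat(value: string): value is OutputFormat {
  return value === "wav" || value === "mp3";
}

function resolveSplitOptions(raw: Record<string, unknown>): SplitOptions {
  const model = String(raw.model ?? "htdemucs_ft");
  if (!isModelName(model)) {
    throw new Error(`Unknown model "${model}". Valid: ${MODELS.join(" | ")}`);
  }
  const format = String(raw.format ?? "wav");
  if (!isOutputFormat(format)) {
    throw new Error(`--format must be wav or mp3, got "${format}"`);
  }
  return {
    model,
    out: String(raw.out ?? "./stemit-output"),
    mute: typeof raw.mute === "string" ? raw.mute : undefined,
    solo: typeof raw.solo === "string" ? raw.solo : undefined,
    format,
    analyze: raw.analyze !== false, // undefined (flag absent) means "analyze"
  };
}

console.log(resolveSplitOptions({ format: "mp3", analyze: false }));
```

With commander, raw would typically be the options object passed to the action handler.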

Available Models

| Model | Stems | Notes |
|---|---|---|
| htdemucs_ft | vocals, drums, bass, other | Default. Fine-tuned hybrid transformer — best quality for 4 stems |
| htdemucs | vocals, drums, bass, other | About 4× faster than htdemucs_ft (which runs four fine-tuned models), marginally lower quality |
| htdemucs_6s | vocals, drums, bass, guitar, piano, other | 6-stem model — adds guitar and piano separation |
| mdx_extra | vocals, drums, bass, other | Alternative architecture, good for vocals |
| mdx | vocals, drums, bass, other | Faster MDX variant |
| mdx_extra_q | vocals, drums, bass, other | Quantized MDX — smallest model, fastest inference |

Which model should I use?

Default (htdemucs_ft) — best all-round quality, use this unless you have a reason not to
htdemucs_6s — when you specifically need guitar or piano isolated
mdx_extra — when vocal quality is the priority
mdx_extra_q — when speed matters more than quality (e.g. batch processing)

Speed note: Demucs runs on CPU by default. Processing time is roughly 2–5× the song length on modern hardware (e.g. a 4-minute song takes 8–20 minutes). The mdx_extra_q model is significantly faster.
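The rule of thumb above is simple multiplication, which also makes batch jobs easy to budget. A tiny sketch using only the README's own 2× to 5× range (estimateMinutes is a hypothetical helper; actual times depend on your CPU and the model):

```typescript
// Rule-of-thumb arithmetic from the speed note: CPU processing takes
// roughly 2x to 5x the audio's duration. Not a benchmark.
const LOW_FACTOR = 2;
const HIGH_FACTOR = 5;

function estimateMinutes(songMinutes: number[]): { low: number; high: number } {
  const total = songMinutes.reduce((sum, m) => sum + m, 0);
  return { low: total * LOW_FACTOR, high: total * HIGH_FACTOR };
}

const single = estimateMinutes([4]);
console.log(`One 4-minute song: ${single.low}-${single.high} min`); // One 4-minute song: 8-20 min

const album = estimateMinutes([3.5, 4, 4.5]);
console.log(`Three songs, 12 min of audio: ${album.low}-${album.high} min`); // Three songs, 12 min of audio: 24-60 min
```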

Separation Accuracy

stemit uses Demucs — one of the highest-rated open-source source separation models, consistently ranking at the top of the Music Demixing Challenge leaderboards.

What to expect:

| Genre / Instrument | Typical Quality |
|---|---|
| Vocals (pop/rock) | Excellent — clean isolation with minimal bleed |
| Drums | Very good — kick, snare, and cymbals well preserved |
| Bass | Good — works best when bass is prominent in the mix |
| Guitar / Piano (htdemucs_6s) | Moderate — depends heavily on how prominent the instrument is |
| Electronic / heavily layered music | Lower — harder to separate tightly mixed synths |
| Vocals (rap/spoken word) | Good — works well when vocals are dry or lightly processed |

Factors that affect quality:

Production style — heavily compressed or layered mixes are harder to separate
Frequency overlap — instruments sharing the same frequency range (e.g. bass guitar and kick drum) bleed into each other
Reverb / effects — wet, heavily reverbed sources are harder to isolate cleanly
Model choice — htdemucs_ft gives the best overall quality; mdx_extra is specifically tuned for vocals

Benchmark scores (SDR — Signal-to-Distortion Ratio, higher is better):

The htdemucs_ft model achieves approximately:

| Stem | SDR |
|---|---|
| Vocals | ~8.4 dB |
| Drums | ~8.6 dB |
| Bass | ~8.8 dB |
| Other | ~5.8 dB |

SDR is a standard metric for source separation. Scores above 6 dB are considered good; above 8 dB is excellent. For reference, an SDR of 0 means the output is no better than silence.

These scores are competitive with commercial stem separation tools and are state-of-the-art for open-source models. Results on real-world music may vary.
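SDR can be computed directly from a reference stem s and an estimate ŝ. The sketch below uses the simple signal-level definition SDR = 10·log10(Σs² / Σ(s − ŝ)²). Published Demucs scores may be computed with a different variant (for example BSS Eval via museval), so this function's output is not directly comparable to the table above. It does show why silence scores 0 dB: with ŝ = 0 the error energy equals the signal energy.

```typescript
// SDR in dB, simple signal-level form:
//   SDR = 10 * log10( sum(s^2) / sum((s - s_hat)^2) )
// where s is the true stem and s_hat is the separated estimate.
// eps avoids division by zero for silent references or perfect estimates.
function sdr(reference: Float32Array, estimate: Float32Array, eps = 1e-8): number {
  if (reference.length !== estimate.length) {
    throw new Error("reference and estimate must have the same length");
  }
  let signalEnergy = 0;
  let errorEnergy = 0;
  for (let i = 0; i < reference.length; i++) {
    const diff = reference[i] - estimate[i];
    signalEnergy += reference[i] * reference[i];
    errorEnergy += diff * diff;
  }
  return 10 * Math.log10((signalEnergy + eps) / (errorEnergy + eps));
}

// A synthetic "stem": one second of a 440 Hz sine at 44.1 kHz.
const stem = Float32Array.from({ length: 44100 }, (_, i) => Math.sin((2 * Math.PI * 440 * i) / 44100));
const silence = new Float32Array(stem.length);
const closeEstimate = stem.map((x) => x * 0.9); // error energy = 1% of signal energy

console.log(sdr(stem, silence).toFixed(1));       // 0.0  (silence scores 0 dB)
console.log(sdr(stem, closeEstimate).toFixed(1)); // 20.0 (10 * log10(100))
```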

Output Structure

stemit-output/
└── htdemucs_ft/
    └── My Song Title/
        ├── vocals.wav
        ├── drums.wav
        ├── bass.wav
        ├── other.wav
        ├── mute-vocals.wav     ← produced when --mute vocals is used
        └── solo-drums.wav      ← produced when --solo drums is used

All files are written to <out>/<model>/<song-name>/, where <out> is the --out directory (default: ./stemit-output).
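Because the layout follows a fixed pattern, every output path can be derived from the options. A sketch with hypothetical helper names, using forward slashes for brevity (real code would use node:path):

```typescript
// Hypothetical helpers, not stemit's source: reproduce the documented layout
// <out>/<model>/<song-name>/. Trailing slashes on <out> are trimmed.
function songFolder(out: string, model: string, songName: string): string {
  return `${out.replace(/\/+$/, "")}/${model}/${songName}`;
}

function stemFile(out: string, model: string, songName: string, stem: string): string {
  return `${songFolder(out, model, songName)}/${stem}.wav`;
}

function mixFile(out: string, model: string, songName: string, mode: "mute" | "solo", stem: string): string {
  return `${songFolder(out, model, songName)}/${mode}-${stem}.wav`;
}

console.log(stemFile("./stemit-output", "htdemucs_ft", "My Song Title", "vocals"));
// ./stemit-output/htdemucs_ft/My Song Title/vocals.wav
console.log(mixFile("./stemit-output", "htdemucs_ft", "My Song Title", "solo", "drums"));
// ./stemit-output/htdemucs_ft/My Song Title/solo-drums.wav
```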

How It Works

┌─────────────┐    yt-dlp      ┌────────────┐
│  YouTube URL │ ─────────────▶ │  audio.wav │
└─────────────┘                └─────┬──────┘
                                     │
                 local file ─────────┘
                                     │
                                     ▼
                          ┌─────────────────────┐
                          │   Demucs (Python)    │
                          │   htdemucs_ft model  │
                          │   ~/.stemit/venv     │
                          └──────────┬──────────┘
                                     │
              ┌─────────────┬────────┴────────┬─────────────┐
              ▼             ▼                 ▼             ▼
         vocals.wav     drums.wav         bass.wav      other.wav
              │
              ▼
   ┌─────────────────────┐
   │   essentia.js WASM  │
   │   BPM + Key detect  │
   └─────────────────────┘

  optional:
   stems ──▶ ffmpeg amix ──▶ mute-vocals.wav
   stems ──▶ ffmpeg copy ──▶ solo-drums.wav
   *.wav  ──▶ ffmpeg      ──▶ *.mp3

Tech stack:

| Component | Library | Notes |
|---|---|---|
| CLI framework | commander | Parses flags and subcommands |
| YouTube download | youtube-dl-exec | Wraps yt-dlp, auto-downloads binary |
| Stem separation | Demucs via child_process.spawn | Runs in managed Python venv |
| Stem mixing | ffmpeg amix filter | normalize=0 to preserve volume |
| BPM + Key | essentia.js | WASM, runs in Node |
| Audio decode | audio-decode | Decodes WAV/MP3 to float32 for essentia |
| WAV→MP3 | ffmpeg | -q:a 0 (VBR highest quality) |
| Progress bars | cli-progress | Live chunk-by-chunk demucs progress |
| Spinners | ora | Per-step feedback |
| Summary panel | boxen + chalk | Formatted terminal output |
| ASCII banner | figlet | Startup art |
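Between the audio-decode and essentia.js steps sits one small but necessary operation: stereo audio usually has to be reduced to a single channel, because essentia's rhythm and key extractors expect a mono signal. A sketch, assuming the decoder returns an AudioBuffer-like object with numberOfChannels and getChannelData (the DecodedAudio interface below is an assumption, not audio-decode's published type):

```typescript
// Assumption: the decoder returns an AudioBuffer-like object. This interface
// lists only the members used here; it is not audio-decode's published type.
interface DecodedAudio {
  numberOfChannels: number;
  getChannelData(channel: number): Float32Array;
}

/** Average all channels into a single mono signal. */
function toMono(audio: DecodedAudio): Float32Array {
  const channelCount = audio.numberOfChannels;
  if (channelCount < 1) throw new Error("decoded audio has no channels");
  if (channelCount === 1) return audio.getChannelData(0); // already mono
  const mono = new Float32Array(audio.getChannelData(0).length);
  for (let c = 0; c < channelCount; c++) {
    const data = audio.getChannelData(c);
    for (let i = 0; i < mono.length; i++) {
      mono[i] += data[i] / channelCount; // running average across channels
    }
  }
  return mono;
}
```

Averaging (rather than summing) keeps the mono signal in the same -1 to 1 range as the input channels.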

First-Run Setup

The first time you run stemit split, it will:

Verify Python 3.9+ and ffmpeg are installed
Create a Python virtualenv at ~/.stemit/venv
Install Demucs and its dependencies into that venv (pip install demucs soundfile torchcodec)

This one-time setup takes 2–5 minutes depending on your internet connection (Demucs pulls in PyTorch, which is a large download). After that, the venv is reused on every run.

The AI model weights (~300 MB per model) are downloaded by Demucs on first use of each model and cached in ~/.cache/torch/hub/.
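The setup steps above map onto two commands. The sketch below spells them out and also shows where a venv keeps its executables on each platform: bin/ on macOS and Linux, but Scripts\ (with .exe names) on Windows, which is where the Windows equivalent of ~/.stemit/venv/bin/pip lives. venvTool and setupCommands are hypothetical helpers, and running pip through python -m pip is a choice of this sketch:

```typescript
// Sketch of the first-run setup as command lines. Directory layout follows
// Python's venv module (bin/ on macOS/Linux, Scripts\ on Windows); the
// helper names are hypothetical, not stemit's source.
function venvTool(venvDir: string, tool: "python" | "pip", platform: string): string {
  return platform === "win32"
    ? `${venvDir}\\Scripts\\${tool}.exe`
    : `${venvDir}/bin/${tool}`;
}

function setupCommands(systemPython: string, venvDir: string, platform: string): string[][] {
  const venvPython = venvTool(venvDir, "python", platform);
  return [
    [systemPython, "-m", "venv", venvDir], // create the private venv
    [venvPython, "-m", "pip", "install", "demucs", "soundfile", "torchcodec"],
  ];
}

for (const cmd of setupCommands("python3", "/home/me/.stemit/venv", "linux")) {
  console.log(cmd.join(" "));
}
// python3 -m venv /home/me/.stemit/venv
// /home/me/.stemit/venv/bin/python -m pip install demucs soundfile torchcodec
```

In Node, platform would come from process.platform, which reports "win32" on Windows.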

Troubleshooting

`demucs exited with code 1`

Run with a local file first to rule out download issues. The full Demucs error is printed below the message. Common causes:

Out of memory — Demucs needs ~4 GB RAM for htdemucs_ft. Try --model mdx_extra_q, which is lighter.
Corrupt audio file — ensure the file plays correctly before passing it to stemit.
Python version conflict — stemit uses its own venv at ~/.stemit/venv, but if you see import errors, try deleting the venv and re-running: rm -rf ~/.stemit/venv

`ImportError: TorchCodec is required`

This happens with torchaudio >= 2.5 on a fresh install if torchcodec wasn't installed. stemit installs torchcodec automatically during setup, but if you still hit this error, install it manually:

~/.stemit/venv/bin/pip install torchcodec

`ffmpeg not found`

Install ffmpeg for your platform:

macOS: brew install ffmpeg
Linux: sudo apt install ffmpeg
Windows: winget install ffmpeg

`python3 not found` / `python not found`

Install Python 3.9+ from https://www.python.org/downloads/ and ensure it's on your PATH.

On Windows, the installer adds python (not python3) to PATH — stemit checks both.

Progress bar cycles multiple times

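This is normal. Demucs processes the song in overlapping chunks, and models that combine several networks (such as the default htdemucs_ft, which uses four fine-tuned models) run one full pass per network, so the bar restarts from 0% for each pass. The label shows chunk N/M so you can track overall progress.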
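One way to detect those restarts from Demucs's stderr is to watch for the percentage dropping. A sketch, assuming the standard tqdm prefix (for example " 42%|"); PassTracker is a hypothetical class, and a real implementation would also buffer partial lines split across stderr chunks:

```typescript
// Hypothetical sketch, not stemit's source. tqdm progress lines start with a
// prefix like " 42%|". A drop in the percentage means a new 0 -> 100% pass.
class PassTracker {
  passes = 0;
  private lastPercent = -1;

  /** Feed a chunk of stderr text; returns the last percentage found, if any. */
  feed(text: string): number | undefined {
    const pattern = /(\d{1,3})%\|/g;
    let latest: number | undefined;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const percent = Number(match[1]);
      if (this.passes === 0 || percent < this.lastPercent) {
        this.passes++; // first bar seen, or the bar restarted
      }
      this.lastPercent = percent;
      latest = percent;
    }
    return latest;
  }
}

const tracker = new PassTracker();
tracker.feed(" 10%|█         |\r 55%|█████▌    |\r100%|██████████|");
tracker.feed("  0%|          |\r 30%|███       |");
console.log(`pass ${tracker.passes}`); // pass 2
```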

Slow processing

Demucs runs on CPU by default — this is expected. A 4-minute song takes 8–20 minutes on modern hardware. There is no GPU acceleration option in stemit currently (Demucs supports CUDA if you have an NVIDIA GPU and install the CUDA version of PyTorch manually into ~/.stemit/venv).

Resetting the venv

If something goes wrong with the Python environment, delete the venv and re-run:

rm -rf ~/.stemit/venv
stemit split ./any-file.wav

This re-creates the venv and re-installs Demucs from scratch.