vidilearn

v1.1.0

Production-grade global content extraction agent for YouTube and Web

Downloads

337

Vidilearn

Teach your AI using YouTube videos and the web

Production-grade content extraction agent for YouTube and the web. Extract transcripts, clean articles, and structured metadata locally — zero API keys, with automatic Playwright fallback for dynamic sites.

Overview

Vidilearn is a modern developer-first CLI and MCP server designed for AI agents, RAG pipelines, automation systems, Codex CLI workflows, Gemini CLI integrations, and educational tooling.

Extract structured knowledge from YouTube videos and web articles directly into your AI systems.

No API keys. No bloated setup. No vendor lock-in.


Features

YouTube

  • Transcript extraction
  • Subtitle download with multi-language support (--lang, --list-langs)
  • Chapter & timestamp extraction
  • Description and metadata parsing
  • Batch playlist extraction
  • Live/premiere video detection
  • Streaming transcript output for long videos

Web

  • Clean article extraction from static pages
  • Automatic Playwright fallback for JS-rendered / dynamic sites
  • Structured JSON output in the same style as YouTube output (the body text is under clean_text instead of transcript; see Example JSON Output)

AI-native

  • Native MCP server mode — expose extraction as tools, not just CLI output
  • Local embedding generation via @xenova/transformers — no external embedding API needed
  • AI-ready structured JSON output across every command

Engineering

  • Zero API keys required
  • Lightweight by default — Playwright loads lazily, only when the fallback path is triggered
  • Automation-friendly, scriptable, pipeable

Installation

Global Installation

npm install -g vidilearn

Verify Installation

vidilearn --help

Quick Start

Extract a YouTube video

vidilearn extract "https://youtube.com/watch?v=VIDEO_ID"

Extract a web article

vidilearn extract "https://example.com/some-article"

Vidilearn auto-detects YouTube vs. general web URLs and routes to the correct extractor.
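The routing decision can be pictured as a host check on the parsed URL. The Python sketch below is a hypothetical illustration, not vidilearn's actual detection code: the host list and the helper names is_youtube_url and pick_extractor are assumptions.

```python
from urllib.parse import urlparse

# Hypothetical host list; vidilearn's real detection rules may differ.
YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "music.youtube.com",
    "youtu.be",
}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL's host is one of the assumed YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def pick_extractor(url: str) -> str:
    """Name of the extractor a router like this would choose (illustrative only)."""
    return "youtube" if is_youtube_url(url) else "web"
```

For example, pick_extractor("https://youtu.be/VIDEO_ID") gives "youtube", while an article URL falls through to "web".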

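Rule of thumb: quote any URL that contains ?, &, or =. Unquoted, & ends the command and sends it to the background (cutting the URL short), and ? is treated as a glob wildcard (zsh, for example, aborts with "no matches found").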
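When calling vidilearn from a script, passing arguments as a list (no shell involved) sidesteps the quoting problem entirely, and shlex can render a safely quoted command when a shell string is unavoidable. This is a minimal Python sketch, assuming vidilearn is on PATH and writes a single JSON document to stdout (as the "> output.json" example under Save output suggests); build_extract_args, shell_command and run_extract are hypothetical helper names.

```python
import json
import shlex
import subprocess


def build_extract_args(url: str, *flags: str) -> list[str]:
    """argv for `vidilearn extract`; no shell parses it, so ?, & and = need no quoting."""
    return ["vidilearn", "extract", url, *flags]


def shell_command(url: str, *flags: str) -> str:
    """The same command as a shell-safe string, e.g. for logs or copy-paste."""
    return shlex.join(build_extract_args(url, *flags))


def run_extract(url: str, *flags: str) -> dict:
    """Run vidilearn and parse stdout as JSON.

    Assumes one JSON document on stdout; modes such as --stream may not fit this.
    """
    result = subprocess.run(
        build_extract_args(url, *flags),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

A call such as run_extract("https://youtube.com/watch?v=VIDEO_ID&t=10", "--transcript") passes the URL through untouched, with no quotes needed.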


CLI Usage

YouTube extraction

vidilearn extract "<youtube-url>"                 # full extraction
vidilearn extract "<youtube-url>" --pretty        # pretty-printed JSON
vidilearn extract "<youtube-url>" --transcript    # transcript only
vidilearn extract "<youtube-url>" --chapters      # chapters only
vidilearn extract "<youtube-url>" --metadata      # metadata only
vidilearn extract "<youtube-url>" --stream        # stream transcript as it's parsed

Subtitle language control

vidilearn extract "<youtube-url>" --list-langs    # list available subtitle languages
vidilearn extract "<youtube-url>" --lang es       # extract Spanish subtitles

Batch playlist extraction

vidilearn extract-playlist "<playlist-url>"
vidilearn extract-playlist "<playlist-url>" --concurrency 5
vidilearn extract-playlist "<playlist-url>" --output-dir ./videos
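The --concurrency flag caps how many videos are processed at once. The Python sketch below shows the general bounded-worker pattern behind such a flag; it is not vidilearn's implementation, and extract_all and extract_one (a stand-in for whatever per-video extraction you plug in) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def extract_all(
    urls: list[str],
    extract_one: Callable[[str], dict],
    concurrency: int = 5,
) -> list[dict]:
    """Run extract_one over urls with at most `concurrency` calls in flight."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, even if calls finish out of order
        return list(pool.map(extract_one, urls))
```

Threads suit this case because the per-video work is mostly waiting on the network; a higher cap finishes sooner but sends more simultaneous requests.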

Web article extraction

vidilearn extract "<article-url>"

Static pages are parsed directly. If the page returns little to no usable content (typical of JS-heavy sites), vidilearn automatically retries using a headless Playwright browser.
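The fallback rule amounts to "try the cheap path first, escalate only when it comes back nearly empty". A minimal Python sketch of that control flow follows. The 50-word threshold and the static_extract and browser_extract callables are assumptions for illustration, not vidilearn's real heuristics. Passing the browser step in as a callable also mirrors the lazy-loading point under Engineering: the expensive path is touched only when it is needed.

```python
from typing import Callable

MIN_WORDS = 50  # hypothetical threshold for "little to no usable content"


def extract_with_fallback(
    url: str,
    static_extract: Callable[[str], str],
    browser_extract: Callable[[str], str],
    min_words: int = MIN_WORDS,
) -> tuple[str, str]:
    """Return (text, method); use the browser only if the static text is too short."""
    text = static_extract(url)
    if len(text.split()) >= min_words:
        return text, "static"
    # The static parse looked empty (typical of JS-rendered pages): retry in a headless browser.
    return browser_extract(url), "browser"
```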

Local embeddings

vidilearn extract "<url>" --embed

Outputs { chunk, embedding } pairs generated locally — ready for ingestion into a vector store, no API key required.
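Once the pairs are saved, a nearest-neighbour lookup over them is plain cosine similarity. This is a dependency-free Python sketch. It assumes the output is a JSON array of objects with chunk and embedding keys (the exact container is an assumption) and that the query vector comes from the same embedding model. The helpers cosine, top_k and load_pairs are hypothetical.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(pairs: list[dict], query_vec: list[float], k: int = 3) -> list[tuple[float, str]]:
    """Rank {chunk, embedding} records by similarity to query_vec, best first."""
    scored = [(cosine(p["embedding"], query_vec), p["chunk"]) for p in pairs]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


def load_pairs(path: str) -> list[dict]:
    """Load pairs saved with e.g. `vidilearn extract "<url>" --embed > pairs.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A linear scan is fine for a few thousand chunks. Beyond that, a vector store does the same ranking with an index.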

Save output

vidilearn extract "<url>" > output.json

Example JSON Output

YouTube

{
  "title": "Build AI Agents",
  "channel": "AI Academy",
  "duration": "12:45",
  "description": "Learn how to build AI agents...",
  "transcript": "...",
  "chapters": [
    { "title": "Introduction", "timestamp": "00:00" },
    { "title": "Agent Architecture", "timestamp": "03:42" }
  ]
}
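In this example, duration and the chapter timestamps are strings in MM:SS form. For seeking, or for attaching time ranges to RAG chunks, it helps to convert them to seconds. A small Python sketch, assuming the strings are MM:SS or H:MM:SS as in the example; to_seconds and chapter_ranges are hypothetical helpers.

```python
def to_seconds(ts: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' to whole seconds."""
    total = 0
    for part in ts.split(":"):
        total = total * 60 + int(part)
    return total


def chapter_ranges(video: dict) -> list[tuple[str, int, int]]:
    """Return (title, start_s, end_s) per chapter; the last one ends at the video's duration."""
    chapters = video.get("chapters", [])
    end_of_video = to_seconds(video["duration"])
    ranges = []
    for i, ch in enumerate(chapters):
        start = to_seconds(ch["timestamp"])
        if i + 1 < len(chapters):
            end = to_seconds(chapters[i + 1]["timestamp"])
        else:
            end = end_of_video
        ranges.append((ch["title"], start, end))
    return ranges
```

Applied to the example above, this yields ("Introduction", 0, 222) and ("Agent Architecture", 222, 765).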

Web article

{
  "title": "Understanding Transformer Architectures",
  "source_url": "https://example.com/transformers",
  "byline": "Jane Doe",
  "published_date": "2026-04-02",
  "clean_text": "...",
  "word_count": 1840
}
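The two examples share a title field but carry the body text under different keys (transcript and clean_text). Before loading both into one index, a thin normalization step keeps downstream code uniform. The Python sketch below relies only on the field names shown in the two examples; the output shape (kind, title, text, source) and the normalize helper are assumptions.

```python
def normalize(record: dict) -> dict:
    """Map either example shape onto one {kind, title, text, source} record."""
    if "transcript" in record:
        # YouTube example: the sample has no URL field, so the channel stands in as the source.
        return {
            "kind": "youtube",
            "title": record.get("title", ""),
            "text": record["transcript"],
            "source": record.get("channel", ""),
        }
    if "clean_text" in record:
        return {
            "kind": "web",
            "title": record.get("title", ""),
            "text": record["clean_text"],
            "source": record.get("source_url", ""),
        }
    raise ValueError("unrecognized record: expected 'transcript' or 'clean_text'")
```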

MCP Server Mode

Run vidilearn as a native MCP server so agents can call extraction directly as a tool, instead of shelling out to the CLI.

vidilearn mcp-server

Exposes extract_youtube and extract_web as MCP tools over stdio transport — compatible with Claude, Gemini CLI, and any MCP-compatible agent framework.
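Over the stdio transport, an MCP client exchanges newline-delimited JSON-RPC 2.0 messages with the server. The sequence is an initialize request, a notifications/initialized notification, then requests such as tools/list and tools/call. The Python sketch below only builds those messages. The protocol version string and the {"url": ...} argument shape for extract_youtube are assumptions, so check the real input schema returned by tools/list first. In practice, an MCP client library or your agent framework handles this exchange for you.

```python
import json

PROTOCOL_VERSION = "2024-11-05"  # assumption: use the version your client and server negotiate


def frame(message: dict) -> str:
    """stdio framing: one JSON object per line, with no embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def session_messages(url: str) -> list[str]:
    """Lines a client would write to the stdin of `vidilearn mcp-server`, in order.

    A real client must read the initialize response before sending the rest.
    """
    return [
        frame({"jsonrpc": "2.0", "id": 1, "method": "initialize",
               "params": {"protocolVersion": PROTOCOL_VERSION,
                          "capabilities": {},
                          "clientInfo": {"name": "example-client", "version": "0.1.0"}}}),
        frame({"jsonrpc": "2.0", "method": "notifications/initialized"}),
        frame({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}),
        # Hypothetical argument shape: confirm it against the tool's inputSchema from tools/list.
        frame({"jsonrpc": "2.0", "id": 3, "method": "tools/call",
               "params": {"name": "extract_youtube", "arguments": {"url": url}}}),
    ]
```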


AI & Agent Workflows

Vidilearn is designed for modern AI ecosystems.

Compatible with:

  • MCP Servers (native)
  • Claude Workflows
  • Gemini CLI
  • Codex CLI
  • OpenAI Agents
  • LangChain / LangGraph
  • CrewAI / AutoGen
  • RAG pipelines
  • Vector databases
  • Local AI systems

Use cases:

  • RAG pipelines — convert long-form videos and articles into searchable knowledge bases
  • AI memory systems — store extracted knowledge into persistent agent memory
  • Educational applications — turn lectures and tutorials into structured AI-readable datasets
  • Autonomous agents — let agents learn directly from YouTube and the web via the MCP server
  • Research systems — extract technical insights from conferences, talks, and long-form articles
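For the RAG use case, long transcripts usually need splitting before indexing. One common approach uses fixed-size word windows with overlap, so that a sentence cut at a boundary still appears whole in one chunk. The Python sketch below illustrates that generic technique, not what --embed does internally. The chunk_words name and the default window and overlap sizes are arbitrary assumptions.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```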

Why Vidilearn?

Most extraction tools require API keys, are bloated, break frequently, or aren't built with AI agents in mind.

Vidilearn focuses on:

  • Developer experience
  • AI-native workflows (CLI and MCP)
  • Structured, consistent outputs across content types
  • Clean CLI ergonomics
  • Production-ready automation

Roadmap

  • Vector database integrations (Pinecone, Weaviate, Qdrant adapters)
  • AI summarization modules
  • Live stream partial-transcript extraction
  • Browser extension companion

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup steps and the PR checklist.


Security

Vidilearn does not require API keys or authentication tokens. Always review extracted content before using it in production AI systems.


License

MIT — see LICENSE.


Support

If Vidilearn helps your workflow, consider sponsoring development ❤️

GitHub Sponsors: https://github.com/sponsors/sarathi-eng


Author

Built by Alfo Tech Industries

© 2026 Alfo Tech Industries. All rights reserved.