
lakecode v0.1.12

Databricks-native AI CLI agent

Downloads: 1,281

Readme

lakecode

Databricks-native AI CLI agent. Talk to your lakehouse — query data, debug jobs, manage permissions, deploy assets — all from your terminal.

Built on the Claude Agent SDK, lakecode wraps every Databricks operation in a safety-classified MCP tool layer with confirmation gates, policy enforcement, and full audit trails.

Quick Start

# Install
npm install -g lakecode

# Authenticate
lakecode auth login --host https://your-workspace.cloud.databricks.com

# Start chatting
lakecode chat
❯ show me the top 10 tables by size in the analytics schema
❯ why did job 12345 fail last night?
❯ /prove main.analytics.daily_revenue
❯ /cost top --days 7

Features

  • Natural language SQL — ask questions, get results with fully-qualified Unity Catalog names
  • 18 deterministic workflows — /debug, /prove, /audit, /cost, /uc and more
  • Safety-first — every tool call classified as READ_ONLY / WRITE_REMOTE / DESTRUCTIVE with confirmation gates
  • Context-aware — workspace profiler injects catalog metadata, warehouse info, and function signatures into every turn
  • Knowledge injection — 25 Databricks skills loaded on-demand via keyword matching from the AI Dev Kit
  • Session management — --continue / --resume <id> to pick up where you left off
  • Mission Control — full-screen TUI dashboard for ops monitoring
  • Policy engine — YAML-based rules for compliance enforcement
  • Evidence packs — every workflow run produces a timestamped audit trail in ~/.lakecode/runs/

Installation

npm install -g lakecode

Requirements: Node.js >= 18, Databricks CLI installed and on PATH.

Authentication

lakecode uses the Databricks CLI's authentication under the hood. Set up once:

# OAuth browser flow (recommended)
lakecode auth login --host https://your-workspace.cloud.databricks.com

# Or use an existing profile from ~/.databrickscfg
lakecode chat --profile STAGING

# Check auth status
lakecode auth status

Supports OAuth (U2M), PAT tokens, and Azure/GCP service principal auth — anything the Databricks CLI supports.

CLI Commands

lakecode chat (default)

Interactive REPL session with the AI agent.

lakecode chat [options]

| Flag | Description |
|------|-------------|
| --profile <name> | Databricks config profile |
| --target <name> | Bundle target (dev/staging/prod) |
| --continue | Resume most recent session |
| --resume <id> | Resume a specific session by ID |
| -p, --prompt <text> | Send initial prompt on startup |
| --approve <level> | Auto-approve: read \| write \| destructive |
| --compliance | Enable compliance mode (policy deny overrides --approve) |
| --dry-run | Show commands without executing |
| --verbose | Show raw CLI commands and LLM traffic |

lakecode run <prompt>

Single-shot execution — run a prompt non-interactively and exit.

lakecode run "list all tables in main.analytics" --output json
lakecode run workflow debug_job --input params.json --output md

| Flag | Description |
|------|-------------|
| --output <format> | text \| json \| stream-json \| md (default: text) |
| --approve <level> | Auto-approve level (default: read) |
| --session-id <id> | Session ID for multi-turn continuity |
| --profile, --verbose | Same as chat |

lakecode mc

Mission Control — full-screen ops dashboard with job monitoring, alerts, and watch subscriptions.

lakecode mc --profile PROD

lakecode auth

Manage Databricks authentication.

lakecode auth status          # Show current auth
lakecode auth profiles        # List all profiles
lakecode auth login           # OAuth browser flow
lakecode auth logout          # Clear cached tokens

lakecode config

Manage lakecode configuration.

lakecode config init          # Interactive setup wizard
lakecode config show          # Print resolved config

Slash Commands

In chat mode, type / to see autocomplete suggestions.

Workspace & Navigation

| Command | Description |
|---------|-------------|
| /help | Show available commands |
| /clear | Clear conversation history |
| /context | Show current context window usage |
| /exit | Exit the session |

Databricks Operations

| Command | Description |
|---------|-------------|
| /debug job <id> | Multi-step job failure diagnosis with root cause analysis |
| /prove <table> | Data quality analysis — row counts, nulls, distributions, anomalies |
| /audit jobs | Comprehensive job audit with risk assessment across all jobs |
| /cost top | Top spend analysis by SKU, identity, and job |
| /cost spike | Cost anomaly detection — find unexpected spending |

Unity Catalog Governance

| Command | Description |
|---------|-------------|
| /uc explain-access <principal> <object> | Privilege graph + effective access explanation |
| /uc diff-grants <object> --to <desired.yml> | Diff current vs desired grants |
| /uc apply-grants --plan <planId> | Execute a reviewed grant plan |
| /uc export-grants <object> | Export current grants as canonical YAML |

Asset Bundle Management

| Command | Description |
|---------|-------------|
| /capture job <id> | Extract job config into a Databricks Asset Bundle |
| /capture pipeline <id> | Extract pipeline config into a bundle |
| /drift detect | Compare bundle definition vs live workspace state |
| /deploy | Deploy bundle with preflight checks and verification |

Monitoring

| Command | Description |
|---------|-------------|
| /watch | Start watching a job/query/table on an interval |
| /watch list | List active watch subscriptions |
| /watch stop | Stop a watch subscription |
| /runs prune | Clean up old evidence pack runs |

MCP Tools

lakecode exposes 7 built-in tools via the Model Context Protocol:

| Tool | Description | Safety |
|------|-------------|--------|
| list_catalogs | List Unity Catalog catalogs | READ_ONLY |
| list_schemas | List schemas in a catalog | READ_ONLY |
| list_tables | List tables in a schema | READ_ONLY |
| describe_table | Column types, properties, storage info | READ_ONLY |
| sql_execute | Run any SQL statement | Dynamic |
| batch_sql | Execute multiple SQL statements or files | Dynamic |
| databricks_cli | Run any Databricks CLI command or REST API call | Dynamic |

Dynamic tools are classified per-invocation based on the SQL statement or CLI command being run.
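The idea behind that dynamic classification can be sketched in a few lines. This is a simplified illustration, not the classifier shipped in src/mcp/: the function name `classifySql` and its keyword lists are assumptions, keying only off the leading SQL keyword.

```typescript
// Illustrative per-invocation classifier: inspect the leading keyword of
// the statement and map it to one of the three safety levels.
type Safety = "READ_ONLY" | "WRITE_REMOTE" | "DESTRUCTIVE";

function classifySql(sql: string): Safety {
  const head = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? "";
  if (["SELECT", "SHOW", "DESCRIBE", "EXPLAIN", "WITH"].includes(head)) {
    return "READ_ONLY";
  }
  if (["DROP", "DELETE", "TRUNCATE"].includes(head)) {
    return "DESTRUCTIVE";
  }
  // CREATE, INSERT, UPDATE, GRANT, etc. mutate remote state.
  return "WRITE_REMOTE";
}
```

The same approach extends to CLI commands, where subcommands like delete classify as DESTRUCTIVE.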

External MCP Server

lakecode can optionally connect to the official databricks-mcp-server for additional tools (dashboards, pipelines, clusters, etc.):

# ~/.lakecode/config.yml
external_mcp:
  enabled: true
  command: uvx
  args: ["databricks-mcp-server@latest"]

Safety Model

Every tool invocation is classified into one of three levels:

| Level | Examples | Behavior |
|-------|----------|----------|
| READ_ONLY | SELECT, SHOW, DESCRIBE, list, get | Auto-approved by default |
| WRITE_REMOTE | CREATE TABLE, INSERT, GRANT, jobs create | Requires confirmation |
| DESTRUCTIVE | DROP TABLE, DELETE, jobs delete, clusters delete | Requires explicit confirmation |

Confirmation Gates

Databricks  ⚠ WRITE_REMOTE — CREATE TABLE main.staging.dim_users ...
            Allow? (y/n/always)

Override with --approve:

  • --approve read — auto-approve READ_ONLY (default)
  • --approve write — auto-approve READ_ONLY + WRITE_REMOTE
  • --approve destructive — auto-approve everything (use with caution)
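Since the three levels form a strict order, the gate reduces to a rank comparison. A minimal sketch of the logic (`autoApproved` is an illustrative name, not lakecode's API):

```typescript
// Illustrative only: --approve sets the highest safety level that is
// auto-approved; anything ranked above it still hits a confirmation gate.
const LEVELS = ["READ_ONLY", "WRITE_REMOTE", "DESTRUCTIVE"] as const;
type Level = (typeof LEVELS)[number];
type Approve = "read" | "write" | "destructive";

const APPROVE_RANK: Record<Approve, number> = { read: 0, write: 1, destructive: 2 };

function autoApproved(level: Level, approve: Approve): boolean {
  return LEVELS.indexOf(level) <= APPROVE_RANK[approve];
}
```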

Policy Engine

Define organization-wide rules in ~/.lakecode/policy.yml:

rules:
  - name: no-production-drops
    match:
      tool: sql_execute
      statement: "DROP.*production\\."
    action: deny
    message: "Dropping production tables is not allowed"

  - name: require-where-on-delete
    match:
      tool: sql_execute
      statement: "^DELETE FROM(?!.*WHERE)"
    action: deny
    message: "DELETE without WHERE clause is not allowed"
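Conceptually, each rule scopes a regex to a tool: a call is denied if any rule's tool matches and its statement pattern matches. A sketch of how such rules might be evaluated (the `firstDeny` helper is hypothetical, not the shipped engine):

```typescript
// Hypothetical evaluator mirroring the YAML fields above.
interface PolicyRule {
  name: string;
  match: { tool: string; statement: string };
  action: "deny";
  message: string;
}

function firstDeny(
  rules: PolicyRule[],
  tool: string,
  statement: string
): PolicyRule | undefined {
  return rules.find(
    (r) =>
      r.match.tool === tool &&
      new RegExp(r.match.statement, "i").test(statement)
  );
}

const rules: PolicyRule[] = [
  {
    name: "no-production-drops",
    match: { tool: "sql_execute", statement: "DROP.*production\\." },
    action: "deny",
    message: "Dropping production tables is not allowed",
  },
];
```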

Enable compliance mode to make policy denials non-overridable:

lakecode chat --compliance

Workflows

Workflows are deterministic multi-step pipelines that combine API calls, SQL queries, and LLM analysis. Each produces a timestamped evidence pack in ~/.lakecode/runs/.
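The engine's shape can be sketched roughly as follows. This is a simplified illustration, not the code in src/workflows/: `Step`, `runWorkflow`, and the evidence record layout are assumptions.

```typescript
// Sketch: run steps in order, let later steps read earlier outputs from
// ctx, and record every result so it can be written out as an evidence pack.
interface Step {
  id: string;
  run: (ctx: Record<string, unknown>) => unknown;
}

function runWorkflow(steps: Step[], ctx: Record<string, unknown>) {
  const evidence: { step: string; output: unknown; at: string }[] = [];
  for (const step of steps) {
    const output = step.run(ctx);
    ctx[step.id] = output; // later steps can consume earlier outputs
    evidence.push({ step: step.id, output, at: new Date().toISOString() });
  }
  return evidence;
}
```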

| ID | Name | Steps |
|----|------|-------|
| debug_job | Debug Job | Fetch config → get runs → get output → get logs → LLM diagnosis |
| prove_table | Prove Table | Describe → sample → stats → nulls → duplicates → LLM assessment |
| audit_jobs | Audit Jobs | List jobs → get runs → check schedules → LLM risk analysis |
| cost_top | Cost Top | Query billing → group by SKU/identity/job → LLM insights |
| cost_spike | Cost Spike | Query billing history → detect anomalies → LLM explanation |
| genai_cost_agent | GenAI Cost Agent | Multi-turn tool-use loop over 10 GenAI cost functions |
| uc_explain_access | UC Explain Access | Build privilege graph → compute effective access → LLM summary |
| uc_diff_grants | UC Diff Grants | Fetch current → load desired → compute diff → generate plan |
| uc_apply_grants | UC Apply Grants | Load plan → snapshot before → execute → snapshot after |
| capture_job | Capture Job | Fetch config → generate bundle YAML → write files |
| capture_pipeline | Capture Pipeline | Fetch pipeline → generate bundle YAML → write files |
| drift_detect | Drift Detect | Read bundle → fetch live → diff → report |
| bundle_deploy | Bundle Deploy | Preflight checks → deploy → verify |
| job_status | Job Status | Fetch job config → get recent runs |
| job_logs | Run Logs | Fetch run output → LLM diagnosis |
| run_job | Run Job | Trigger run → poll for completion |
| deploy_file | Deploy File | Validate → import to workspace |
| query_history | Query History | Resolve run → fetch SQL history |

Running workflows programmatically

# Via slash command
/debug job 12345

# Via CLI
lakecode run workflow debug_job --input '{"job_id": "12345"}' --output json

Configuration

lakecode reads config from (in order of precedence):

  1. CLI flags
  2. .lakecode/config.yml (project-level)
  3. ~/.lakecode/config.yml (global)
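For flat key-value settings, this precedence amounts to a merge where later sources win. A sketch under that assumption (`resolveConfig` is a made-up name, not lakecode's loader, which validates with a Zod schema):

```typescript
// Illustrative precedence merge: later spreads override earlier ones,
// matching the order above (global < project < CLI flags).
type Config = Record<string, unknown>;

function resolveConfig(globalCfg: Config, projectCfg: Config, flags: Config): Config {
  return { ...globalCfg, ...projectCfg, ...flags };
}
```

For example, a --profile flag overrides the profile from the global file, while a project-level target still applies.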

Full config reference

# ~/.lakecode/config.yml

databricks:
  profile: DEFAULT                    # Databricks CLI profile
  target: dev                         # Bundle target
  warehouse_id: abc123               # Default SQL warehouse
  default_catalog: main              # Default catalog
  default_schema: default            # Default schema

agent:
  max_turns: 50                      # Max agent loop iterations
  max_tokens_per_response: 32768     # Max tokens per LLM response
  temperature: 0                     # LLM temperature
  context_window_budget: 100000      # Budget for auto-compaction
  system_prompt_extra: ""            # Additional system prompt text

safety:
  auto_approve: []                   # Tool patterns to auto-approve
  require_confirm: []                # Tool patterns requiring confirmation
  blocked: []                        # Tool patterns blocked from execution

external_mcp:
  enabled: true                      # Enable databricks-mcp-server
  command: uvx
  args: ["databricks-mcp-server@latest"]

sessions:
  dir: ~/.lakecode/sessions          # Session storage
  retention_days: 30

runs:
  dir: ~/.lakecode/runs              # Evidence pack storage
  retention_days: 30

policy:
  global_path: ~/.lakecode/policy.yml
  compliance: false                  # Enable compliance mode

watch:
  default_interval_sec: 60
  max_subscriptions: 20
  subscriptions_path: ~/.lakecode/watch.yml

User Conventions

Add SQL and coding conventions that the agent will follow:

# Global conventions
echo "Always use UPPERCASE SQL keywords" > ~/.lakecode/conventions.md

# Project-level conventions
echo "Prefer CTEs over subqueries" > .lakecode/conventions.md

Architecture

src/
├── bin/             # CLI entry point
├── cli/
│   ├── commands/    # chat, run, mc, auth, config
│   └── ui/          # Ink (React) terminal components
├── config/          # Zod schema, config loader
├── context/         # Workspace profiler, skill router, conventions
├── mcp/             # MCP server, tool safety classification
├── prompts/         # System prompt construction
├── tools/           # Tool definitions and registry
├── uc/              # Unity Catalog governance (grants, diff, plans)
└── workflows/       # Workflow engine, 18 registered workflows

Knowledge Injection Pipeline

On each conversation turn, lakecode builds context:

  1. Workspace profiler — cached metadata (~200 tokens): catalogs, schemas, table counts, warehouses, functions
  2. Skill router — keyword-matches the user's message against 25 skills, loads top 1-2 per turn
  3. Skills library — 25 skills from the official Databricks AI Dev Kit
  4. User conventions — merged from global + project conventions files
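The skill-routing step can be sketched as a keyword-overlap ranking. This is an assumption about the mechanism, not the code in src/context/; the skill names and `routeSkills` helper are illustrative.

```typescript
// Illustrative router: score each skill by how many of its keywords appear
// in the user's message, then load the top matches for this turn.
interface Skill {
  name: string;
  keywords: string[];
}

function routeSkills(message: string, skills: Skill[], topN = 2): string[] {
  const words = new Set(message.toLowerCase().split(/\W+/));
  return skills
    .map((s) => ({ s, score: s.keywords.filter((k) => words.has(k)).length }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((x) => x.s.name);
}

const skills: Skill[] = [
  { name: "unity-catalog-grants", keywords: ["grant", "privilege", "access"] },
  { name: "job-debugging", keywords: ["job", "run", "failure"] },
  { name: "cost-analysis", keywords: ["cost", "billing", "spend"] },
];
```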

Development

# Clone
git clone https://github.com/lakeside-analytics/lakecode.git
cd lakecode

# Install dependencies
npm install

# Run in development mode
npm run dev

# Run tests (674 tests across 41 files)
npm test

# Build
npm run build

# Type check
npx tsc --noEmit

Test Suite

41 test files, 674 tests

Key test areas:
- System prompt regression (48 tests) — behavioral contracts + snapshot
- MCP tool safety classification (110 tests) — SQL, CLI, REST patterns
- Workflow step execution (29 tests) — mocked CLI, real execute() logic
- Config schema validation, policy engine, watch system, UC governance
- Markdown rendering edge cases, session management, input sanitization

License

MIT