lakecode v0.1.12
lakecode
Databricks-native AI CLI agent. Talk to your lakehouse — query data, debug jobs, manage permissions, deploy assets — all from your terminal.
Built on the Claude Agent SDK, lakecode wraps every Databricks operation in a safety-classified MCP tool layer with confirmation gates, policy enforcement, and full audit trails.
Quick Start
```shell
# Install
npm install -g lakecode

# Authenticate
lakecode auth login --host https://your-workspace.cloud.databricks.com

# Start chatting
lakecode chat
```

```
❯ show me the top 10 tables by size in the analytics schema
❯ why did job 12345 fail last night?
❯ /prove main.analytics.daily_revenue
❯ /cost top --days 7
```

Features
- Natural language SQL — ask questions, get results with fully-qualified Unity Catalog names
- 18 deterministic workflows — `/debug`, `/prove`, `/audit`, `/cost`, `/uc`, and more
- Safety-first — every tool call classified as READ_ONLY / WRITE_REMOTE / DESTRUCTIVE with confirmation gates
- Context-aware — workspace profiler injects catalog metadata, warehouse info, and function signatures into every turn
- Knowledge injection — 25 Databricks skills loaded on demand via keyword matching from the AI Dev Kit
- Session management — `--continue` / `--resume <id>` to pick up where you left off
- Mission Control — full-screen TUI dashboard for ops monitoring
- Policy engine — YAML-based rules for compliance enforcement
- Evidence packs — every workflow run produces a timestamped audit trail in `~/.lakecode/runs/`
Installation
```shell
npm install -g lakecode
```

Requirements: Node.js >= 18, Databricks CLI installed and on PATH.
Authentication
lakecode uses the Databricks CLI's authentication under the hood. Set up once:
```shell
# OAuth browser flow (recommended)
lakecode auth login --host https://your-workspace.cloud.databricks.com

# Or use an existing profile from ~/.databrickscfg
lakecode chat --profile STAGING

# Check auth status
lakecode auth status
```

Supports OAuth (U2M), PAT tokens, and Azure/GCP service principal auth — anything the Databricks CLI supports.
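Under the hood, the Databricks CLI stores profiles as INI-style sections in `~/.databrickscfg`. As a rough illustration of what profile resolution involves (lakecode delegates this to the Databricks CLI itself; the parser below is a simplified sketch, not lakecode code):

```typescript
// Minimal parser for the ~/.databrickscfg format: [PROFILE] sections
// followed by "key = value" lines. Illustrative only.
function parseDatabricksCfg(text: string): Record<string, Record<string, string>> {
  const profiles: Record<string, Record<string, string>> = {};
  let current = "";
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (!line || line.startsWith("#") || line.startsWith(";")) continue;
    const section = line.match(/^\[(.+)\]$/);
    if (section) {
      current = section[1];
      profiles[current] = {};
      continue;
    }
    const eq = line.indexOf("=");
    if (eq > 0 && current) {
      profiles[current][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
    }
  }
  return profiles;
}

const cfg = parseDatabricksCfg(
  "[STAGING]\nhost = https://staging.cloud.databricks.com\ntoken = dapi123\n",
);
console.log(cfg.STAGING.host); // https://staging.cloud.databricks.com
```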
CLI Commands
lakecode chat (default)
Interactive REPL session with the AI agent.
```shell
lakecode chat [options]
```

| Flag | Description |
|------|-------------|
| --profile <name> | Databricks config profile |
| --target <name> | Bundle target (dev/staging/prod) |
| --continue | Resume most recent session |
| --resume <id> | Resume a specific session by ID |
| -p, --prompt <text> | Send initial prompt on startup |
| --approve <level> | Auto-approve level: read, write, or destructive |
| --compliance | Enable compliance mode (policy deny overrides --approve) |
| --dry-run | Show commands without executing |
| --verbose | Show raw CLI commands and LLM traffic |
lakecode run <prompt>
Single-shot execution — run a prompt non-interactively and exit.
```shell
lakecode run "list all tables in main.analytics" --output json
lakecode run workflow debug_job --input params.json --output md
```

| Flag | Description |
|------|-------------|
| --output <format> | Output format: text, json, stream-json, or md (default: text) |
| --approve <level> | Auto-approve level (default: read) |
| --session-id <id> | Session ID for multi-turn continuity |
| --profile, --verbose | Same as chat |
lakecode mc
Mission Control — full-screen ops dashboard with job monitoring, alerts, and watch subscriptions.
```shell
lakecode mc --profile PROD
```

lakecode auth
Manage Databricks authentication.
```shell
lakecode auth status     # Show current auth
lakecode auth profiles   # List all profiles
lakecode auth login      # OAuth browser flow
lakecode auth logout     # Clear cached tokens
```

lakecode config
Manage lakecode configuration.
```shell
lakecode config init   # Interactive setup wizard
lakecode config show   # Print resolved config
```

Slash Commands
In chat mode, type / to see autocomplete suggestions.
Workspace & Navigation
| Command | Description |
|---------|-------------|
| /help | Show available commands |
| /clear | Clear conversation history |
| /context | Show current context window usage |
| /exit | Exit the session |
Databricks Operations
| Command | Description |
|---------|-------------|
| /debug job <id> | Multi-step job failure diagnosis with root cause analysis |
| /prove <table> | Data quality analysis — row counts, nulls, distributions, anomalies |
| /audit jobs | Comprehensive job audit with risk assessment across all jobs |
| /cost top | Top spend analysis by SKU, identity, and job |
| /cost spike | Cost anomaly detection — find unexpected spending |
Unity Catalog Governance
| Command | Description |
|---------|-------------|
| /uc explain-access <principal> <object> | Privilege graph + effective access explanation |
| /uc diff-grants <object> --to <desired.yml> | Diff current vs desired grants |
| /uc apply-grants --plan <planId> | Execute a reviewed grant plan |
| /uc export-grants <object> | Export current grants as canonical YAML |
Asset Bundle Management
| Command | Description |
|---------|-------------|
| /capture job <id> | Extract job config into a Databricks Asset Bundle |
| /capture pipeline <id> | Extract pipeline config into a bundle |
| /drift detect | Compare bundle definition vs live workspace state |
| /deploy | Deploy bundle with preflight checks and verification |
Monitoring
| Command | Description |
|---------|-------------|
| /watch | Start watching a job/query/table on an interval |
| /watch list | List active watch subscriptions |
| /watch stop | Stop a watch subscription |
| /runs prune | Clean up old evidence pack runs |
MCP Tools
lakecode exposes 7 built-in tools via the Model Context Protocol:
| Tool | Description | Safety |
|------|-------------|--------|
| list_catalogs | List Unity Catalog catalogs | READ_ONLY |
| list_schemas | List schemas in a catalog | READ_ONLY |
| list_tables | List tables in a schema | READ_ONLY |
| describe_table | Column types, properties, storage info | READ_ONLY |
| sql_execute | Run any SQL statement | Dynamic |
| batch_sql | Execute multiple SQL statements or files | Dynamic |
| databricks_cli | Run any Databricks CLI command or REST API call | Dynamic |
Dynamic tools are classified per-invocation based on the SQL statement or CLI command being run.
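The per-invocation classification can be pictured as a keyword match on the head of the statement. A minimal sketch of the idea (not lakecode's actual classifier — its real rules also cover CLI and REST patterns, and the function name here is illustrative):

```typescript
type SafetyLevel = "READ_ONLY" | "WRITE_REMOTE" | "DESTRUCTIVE";

// Hypothetical classifier: inspect the leading keyword of a SQL statement.
function classifySql(sql: string): SafetyLevel {
  const head = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? "";
  if (["SELECT", "SHOW", "DESCRIBE", "EXPLAIN", "WITH"].includes(head)) {
    return "READ_ONLY";
  }
  if (["DROP", "DELETE", "TRUNCATE"].includes(head)) {
    return "DESTRUCTIVE";
  }
  // CREATE, INSERT, GRANT, ALTER, UPDATE, ... fall through to a gated write.
  return "WRITE_REMOTE";
}

console.log(classifySql("SELECT * FROM main.analytics.orders")); // READ_ONLY
console.log(classifySql("DROP TABLE main.analytics.tmp"));       // DESTRUCTIVE
```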
External MCP Server
lakecode can optionally connect to the official databricks-mcp-server for additional tools (dashboards, pipelines, clusters, etc.):
```yaml
# ~/.lakecode/config.yml
external_mcp:
  enabled: true
  command: uvx
  args: ["databricks-mcp-server@latest"]
```

Safety Model
Every tool invocation is classified into one of three levels:
| Level | Examples | Behavior |
|-------|----------|----------|
| READ_ONLY | SELECT, SHOW, DESCRIBE, list, get | Auto-approved by default |
| WRITE_REMOTE | CREATE TABLE, INSERT, GRANT, jobs create | Requires confirmation |
| DESTRUCTIVE | DROP TABLE, DELETE, jobs delete, clusters delete | Requires explicit confirmation |
Confirmation Gates
```
Databricks ⚠ WRITE_REMOTE — CREATE TABLE main.staging.dim_users ...
Allow? (y/n/always)
```

Override with `--approve`:

- `--approve read` — auto-approve READ_ONLY (default)
- `--approve write` — auto-approve READ_ONLY + WRITE_REMOTE
- `--approve destructive` — auto-approve everything (use with caution)
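The gate reduces to an ordering over the three safety levels. A hedged sketch of the decision, assuming a simple rank comparison (names are illustrative, not lakecode internals):

```typescript
type SafetyLevel = "READ_ONLY" | "WRITE_REMOTE" | "DESTRUCTIVE";
type ApproveLevel = "read" | "write" | "destructive";

// Rank each level; --approve auto-approves everything at or below its rank.
const rank: Record<SafetyLevel, number> = {
  READ_ONLY: 0,
  WRITE_REMOTE: 1,
  DESTRUCTIVE: 2,
};
const threshold: Record<ApproveLevel, number> = {
  read: 0,
  write: 1,
  destructive: 2,
};

// true => run without prompting; false => show the Allow? (y/n/always) gate
function autoApproved(level: SafetyLevel, approve: ApproveLevel): boolean {
  return rank[level] <= threshold[approve];
}

console.log(autoApproved("WRITE_REMOTE", "read"));  // false: gate shown
console.log(autoApproved("WRITE_REMOTE", "write")); // true: runs directly
```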
Policy Engine
Define organization-wide rules in ~/.lakecode/policy.yml:
```yaml
rules:
  - name: no-production-drops
    match:
      tool: sql_execute
      statement: "DROP.*production\\."
    action: deny
    message: "Dropping production tables is not allowed"

  - name: require-where-on-delete
    match:
      tool: sql_execute
      statement: "^DELETE FROM(?!.*WHERE)"
    action: deny
    message: "DELETE without WHERE clause is not allowed"
```

Enable compliance mode to make policy denials non-overridable:

```shell
lakecode chat --compliance
```

Workflows
Workflows are deterministic multi-step pipelines that combine API calls, SQL queries, and LLM analysis. Each produces a timestamped evidence pack in ~/.lakecode/runs/.
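Conceptually, a workflow is an ordered list of named steps whose outputs are appended to an evidence record as they run. A minimal sketch under that assumption (the real engine also handles LLM steps, inputs, and retries; the shapes and names here are invented for illustration):

```typescript
type Step = { name: string; run: (ctx: Record<string, unknown>) => unknown };

// Run steps in order, recording each result with a timestamp — a stand-in
// for the evidence pack written under ~/.lakecode/runs/.
function runWorkflow(steps: Step[], ctx: Record<string, unknown>) {
  const evidence: { step: string; at: string; result: unknown }[] = [];
  for (const step of steps) {
    const result = step.run(ctx);
    evidence.push({ step: step.name, at: new Date().toISOString(), result });
  }
  return evidence;
}

const pack = runWorkflow(
  [
    { name: "fetch_config", run: (ctx) => ({ job_id: ctx.job_id }) },
    { name: "diagnose", run: () => "summary" },
  ],
  { job_id: "12345" },
);
console.log(pack.map((e) => e.step).join(" → ")); // fetch_config → diagnose
```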
| ID | Name | Steps |
|----|------|-------|
| debug_job | Debug Job | Fetch config → get runs → get output → get logs → LLM diagnosis |
| prove_table | Prove Table | Describe → sample → stats → nulls → duplicates → LLM assessment |
| audit_jobs | Audit Jobs | List jobs → get runs → check schedules → LLM risk analysis |
| cost_top | Cost Top | Query billing → group by SKU/identity/job → LLM insights |
| cost_spike | Cost Spike | Query billing history → detect anomalies → LLM explanation |
| genai_cost_agent | GenAI Cost Agent | Multi-turn tool-use loop over 10 GenAI cost functions |
| uc_explain_access | UC Explain Access | Build privilege graph → compute effective access → LLM summary |
| uc_diff_grants | UC Diff Grants | Fetch current → load desired → compute diff → generate plan |
| uc_apply_grants | UC Apply Grants | Load plan → snapshot before → execute → snapshot after |
| capture_job | Capture Job | Fetch config → generate bundle YAML → write files |
| capture_pipeline | Capture Pipeline | Fetch pipeline → generate bundle YAML → write files |
| drift_detect | Drift Detect | Read bundle → fetch live → diff → report |
| bundle_deploy | Bundle Deploy | Preflight checks → deploy → verify |
| job_status | Job Status | Fetch job config → get recent runs |
| job_logs | Run Logs | Fetch run output → LLM diagnosis |
| run_job | Run Job | Trigger run → poll for completion |
| deploy_file | Deploy File | Validate → import to workspace |
| query_history | Query History | Resolve run → fetch SQL history |
Running workflows programmatically
```shell
# Via slash command
/debug job 12345

# Via CLI
lakecode run workflow debug_job --input '{"job_id": "12345"}' --output json
```

Configuration
lakecode reads config from (in order of precedence):
- CLI flags
- `.lakecode/config.yml` (project-level)
- `~/.lakecode/config.yml` (global)
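The precedence order amounts to a merge where later sources override earlier ones. A sketch assuming a plain shallow merge (the real loader validates against a Zod schema; `resolveConfig` is an illustrative name):

```typescript
type Config = Record<string, string>;

// Merge global file < project file < CLI flags; later entries win.
function resolveConfig(globalCfg: Config, projectCfg: Config, flags: Config): Config {
  return { ...globalCfg, ...projectCfg, ...flags };
}

const resolved = resolveConfig(
  { profile: "DEFAULT", target: "dev" }, // ~/.lakecode/config.yml
  { target: "staging" },                 // .lakecode/config.yml
  { profile: "PROD" },                   // --profile PROD
);
console.log(resolved); // { profile: 'PROD', target: 'staging' }
```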
Full config reference
```yaml
# ~/.lakecode/config.yml
databricks:
  profile: DEFAULT                # Databricks CLI profile
  target: dev                     # Bundle target
  warehouse_id: abc123            # Default SQL warehouse
  default_catalog: main           # Default catalog
  default_schema: default         # Default schema

agent:
  max_turns: 50                   # Max agent loop iterations
  max_tokens_per_response: 32768  # Max tokens per LLM response
  temperature: 0                  # LLM temperature
  context_window_budget: 100000   # Budget for auto-compaction
  system_prompt_extra: ""         # Additional system prompt text

safety:
  auto_approve: []                # Tool patterns to auto-approve
  require_confirm: []             # Tool patterns requiring confirmation
  blocked: []                     # Tool patterns blocked from execution

external_mcp:
  enabled: true                   # Enable databricks-mcp-server
  command: uvx
  args: ["databricks-mcp-server@latest"]

sessions:
  dir: ~/.lakecode/sessions       # Session storage
  retention_days: 30

runs:
  dir: ~/.lakecode/runs           # Evidence pack storage
  retention_days: 30

policy:
  global_path: ~/.lakecode/policy.yml
  compliance: false               # Enable compliance mode

watch:
  default_interval_sec: 60
  max_subscriptions: 20
  subscriptions_path: ~/.lakecode/watch.yml
```

User Conventions
Add SQL and coding conventions that the agent will follow:
```shell
# Global conventions
echo "Always use UPPERCASE SQL keywords" > ~/.lakecode/conventions.md

# Project-level conventions
echo "Prefer CTEs over subqueries" > .lakecode/conventions.md
```

Architecture
```
src/
├── bin/        # CLI entry point
├── cli/
│   ├── commands/   # chat, run, mc, auth, config
│   └── ui/         # Ink (React) terminal components
├── config/     # Zod schema, config loader
├── context/    # Workspace profiler, skill router, conventions
├── mcp/        # MCP server, tool safety classification
├── prompts/    # System prompt construction
├── tools/      # Tool definitions and registry
├── uc/         # Unity Catalog governance (grants, diff, plans)
└── workflows/  # Workflow engine, 18 registered workflows
```

Knowledge Injection Pipeline
On each conversation turn, lakecode builds context:
- Workspace profiler — cached metadata (~200 tokens): catalogs, schemas, table counts, warehouses, functions
- Skill router — keyword-matches the user's message against 25 skills, loads top 1-2 per turn
- Skills library — 25 skills from the official Databricks AI Dev Kit
- User conventions — merged from global + project conventions files
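The skill router step can be pictured as keyword scoring over the user's message, keeping only the top matches. A hedged sketch of that idea (skill names and keywords here are invented; lakecode's real matching may differ):

```typescript
type Skill = { name: string; keywords: string[] };

// Score each skill by how many of its keywords appear in the message,
// then load the top-k scorers (lakecode loads 1-2 per turn).
function routeSkills(message: string, skills: Skill[], k = 2): string[] {
  const text = message.toLowerCase();
  return skills
    .map((s) => ({ s, score: s.keywords.filter((kw) => text.includes(kw)).length }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.s.name);
}

const skills: Skill[] = [
  { name: "jobs-debugging", keywords: ["job", "run", "fail"] },
  { name: "unity-catalog", keywords: ["grant", "catalog", "schema"] },
  { name: "cost-analysis", keywords: ["cost", "spend", "billing"] },
];
console.log(routeSkills("why did job 12345 fail last night?", skills));
// [ 'jobs-debugging' ]
```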
Development
```shell
# Clone
git clone https://github.com/lakeside-analytics/lakecode.git
cd lakecode

# Install dependencies
npm install

# Run in development mode
npm run dev

# Run tests (674 tests across 41 files)
npm test

# Build
npm run build

# Type check
npx tsc --noEmit
```

Test Suite
41 test files, 674 tests
Key test areas:
- System prompt regression (48 tests) — behavioral contracts + snapshot
- MCP tool safety classification (110 tests) — SQL, CLI, REST patterns
- Workflow step execution (29 tests) — mocked CLI, real execute() logic
- Config schema validation, policy engine, watch system, UC governance
- Markdown rendering edge cases, session management, input sanitization