@kin_npm/data-scientis-claude

v1.0.0

Published

4 months ago

Claude Code toolkit for data science and analytics teams — installs .claude/ into your project.

chri.kin

claude claude-code data-science toolkit kin

Data Scientist — Claude Code Toolkit

A reusable set of Claude Code agents and rules for data science and analytics teams. Copy the toolkit into a new project and add a CLAUDE.md so that Claude works with your project's specific context from day one.

Tech Stack: Python, R, SQL · AWS / GCP / Azure · pandas, scikit-learn, dbt, FastAPI

What's in this toolkit

.claude/
├── CONTEXT.md                           ← project-level instructions loaded by Claude automatically
├── kin-coding-agent-instructions.md     ← hard requirements applied to every code generation task
├── agents/
│   ├── code-reviewer.md                 ← structured code review for data science and analytics
│   ├── debugger.md                      ← root-cause debugging for pipelines, notebooks, and APIs
│   ├── doc-writer.md                    ← docstrings, notebook docs, SQL headers, and READMEs
│   └── security-checker.md              ← pre-commit scan for secrets, PII, and compliance issues
└── rules/
    ├── python.md                        ← Python style, naming, DataFrames, SQL, error handling
    └── security.md                      ← 7 security rules + pre-deploy checklist + incident response

How to adopt this toolkit in a new project

1. Copy the `.claude/` directory into your project root

cp -r /path/to/data_scientist/.claude /path/to/your-project/
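
After copying, it can help to confirm that nothing was left behind. The sketch below checks for the files listed in the toolkit tree above; the function name `missing_toolkit_files` is hypothetical and not part of the toolkit.

```python
from pathlib import Path

# Files the toolkit tree lists under .claude/
EXPECTED_FILES = [
    "CONTEXT.md",
    "kin-coding-agent-instructions.md",
    "agents/code-reviewer.md",
    "agents/debugger.md",
    "agents/doc-writer.md",
    "agents/security-checker.md",
    "rules/python.md",
    "rules/security.md",
]


def missing_toolkit_files(project_root: str) -> list[str]:
    """Return the toolkit files that are absent from <project_root>/.claude/."""
    claude_dir = Path(project_root) / ".claude"
    return [name for name in EXPECTED_FILES if not (claude_dir / name).is_file()]
```

Calling `missing_toolkit_files(".")` from the project root returns an empty list when the copy is complete.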

2. Add a `CLAUDE.md` at your project root

Create a CLAUDE.md at your project root with the context specific to the engagement: compliance scope, PII fields, data sources, cloud provider, known data quality issues, and so on. Claude loads this file automatically. The toolkit's .claude/CONTEXT.md only describes the toolkit structure, so it applies to every project as-is.

3. Add rules for any stack not covered

The rules files are ready to use as-is. If a project uses a specific framework (e.g., dbt, Airflow, Spark) that isn't covered, add a new file to .claude/rules/. Claude picks it up automatically — no other configuration needed.

4. Start using the agents

See the full guide below.

How to use the agents

Agents are markdown files that tell Claude how to approach a specific task — what to check, in what order, and how to format the response. You invoke them by naming them in your prompt. No special syntax required.

`code-reviewer.md` — Code Review

When to use it: before merging code, after finishing a transformation function or pipeline, or when you want a second opinion on your implementation.

What to pass: the file path, function name, or module to review.

# Review a pipeline module
claude "Use the code-reviewer agent to review src/pipelines/loan_scoring.py"

# Review a SQL query
claude "Use the code-reviewer agent to review queries/monthly_revenue.sql"

# Review with specific focus
claude "Use the code-reviewer agent to review src/features/credit_features.py, focusing on null handling"

What to expect: an overall summary of code quality, followed by findings organized by severity (critical / important / suggestion) and at least one positive observation. Each finding includes the affected code, why it is a problem, and the suggested fix.

`debugger.md` — Debugging

When to use it: when you have a traceback, when a pipeline produces unexpected output, when a model gives wrong predictions, or when a notebook fails mid-run.

What to pass: the full traceback and/or the path of the file where the error occurs.

# Pass the traceback directly
claude "I have this error:
KeyError: 'loan_id'
  at src/pipelines/transform.py:42 in build_features
Use the debugger agent to find the root cause."

# Point to the file
claude "Use the debugger agent to inspect src/models/predict.py — predictions look wrong on nulls"

# Debug a data issue
claude "Use the debugger agent for the monthly_revenue pipeline — output totals don't match source"

What to expect: root cause in one sentence, evidence from the traceback or data, fix in code, explanation of why the fix works, and a prevention suggestion.

`doc-writer.md` — Documentation

When to use it: after finishing a function or pipeline, before sharing a notebook, or when you inherit undocumented code.

What to pass: the file path, function name, notebook path, or SQL file to document.

# Write docstrings for a module
claude "Use the doc-writer agent to write docstrings for all functions in src/features/credit_features.py"

# Document a notebook
claude "Use the doc-writer agent to document notebooks/eda_loan_portfolio.ipynb"

# Document a SQL query
claude "Use the doc-writer agent to add a header comment to queries/monthly_revenue.sql"

# Generate a README for a module
claude "Use the doc-writer agent to generate a README for the src/pipelines/ directory"

What to expect: Google-style docstrings with Args, Returns, and Raises sections; notebook narrative structure with context and conclusion cells; SQL header comments with purpose, source tables, grain, and business rules.
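
For reference, a Google-style docstring with the three sections mentioned above looks roughly like this. The function `debt_to_income` is a made-up example, not code from the toolkit, and the agent's actual output may differ.

```python
def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Compute the debt-to-income ratio for one applicant.

    Args:
        monthly_debt: Total recurring monthly debt payments.
        monthly_income: Gross monthly income. Must be greater than zero.

    Returns:
        The ratio of monthly debt to monthly income.

    Raises:
        ValueError: If monthly_income is not greater than zero.
    """
    if monthly_income <= 0:
        raise ValueError("monthly_income must be greater than zero")
    return monthly_debt / monthly_income
```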

`security-checker.md` — Security Audit

When to use it: before every commit to shared branches, when adding a new data source, or when a feature touches PII fields or external credentials.

What to pass: the file path or feature to audit.

# Security scan before commit
claude "Use the security-checker agent to audit src/ingestion/client_loader.py"

# Check for exposed secrets
claude "Use the security-checker agent to check if any credentials are hardcoded in src/"

# Pre-deployment audit
claude "Use the security-checker agent to run through the pre-deploy checklist for the scoring API"

What to expect: findings organized by severity (critical → high → medium → low), with specific locations, impact descriptions, and secure alternatives.

Rules that are always active

The files in .claude/rules/ are loaded as standing instructions — Claude follows them in every interaction without you needing to ask.

python.md — Python style, naming conventions, DataFrame best practices, SQL, error handling
security.md — 7 security rules covering secrets, PII, version control, input validation, dependencies, cloud, and compliance — plus a pre-deploy checklist

These rules apply from the first message. You don't need to reference them explicitly.

Extending the toolkit

If your project needs conventions not covered here (dbt, Airflow, Spark, R pipelines, etc.), add new files to .claude/rules/. Claude will pick them up automatically.

The development team owns these files — if a rule doesn't match your project's reality, update it. The toolkit should serve the project, not the other way around.

Common Workflows

Starting a new feature or analysis

claude "I need to build a credit scoring feature using loan application data. Help me plan this."
# Claude will create a plan following SRP, DRY, and project conventions

Code review before PR

claude "Use the code-reviewer agent to review src/pipelines/feature_engineering.py before I create a PR"

Debugging a pipeline failure

claude "Getting: ValueError: cannot operate on empty DataFrame at src/pipelines/score.py:88. Use the debugger agent."

Documenting a notebook before sharing with the client

claude "Use the doc-writer agent to structure and document notebooks/quarterly_forecast.ipynb"

Pre-deployment security check

claude "Use the security-checker agent to run the pre-deploy checklist for the scoring API"

Project Structure (reference)

your-project/
├── .claude/                     ← Claude Code toolkit (copy from here)
│   ├── CONTEXT.md               ← toolkit structure; project-specific context goes in the root CLAUDE.md
│   ├── kin-coding-agent-instructions.md
│   ├── agents/
│   │   ├── code-reviewer.md
│   │   ├── debugger.md
│   │   ├── doc-writer.md
│   │   └── security-checker.md
│   └── rules/
│       ├── python.md
│       └── security.md
├── src/
│   ├── pipelines/               ← ETL and ML pipelines
│   ├── features/                ← feature engineering
│   ├── models/                  ← model training and inference
│   ├── api/                     ← API endpoints (FastAPI / Flask)
│   └── utils/                   ← shared utilities
├── tests/
│   ├── unit/
│   └── integration/
├── notebooks/                   ← EDA and analysis (strip output before committing)
├── queries/                     ← SQL queries and views
├── docs/                        ← technical documentation
├── logs/                        ← local execution logs (git-ignored)
├── data/
│   ├── raw/                     ← git-ignored
│   └── processed/               ← git-ignored
├── .env.example                 ← environment variable template (no real values)
├── requirements.txt             ← pinned dependencies
└── pyproject.toml               ← project config (ruff, mypy, pytest, etc.)
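
The tree notes that notebook output should be stripped before committing. One way to do that with only the standard library is sketched below, assuming the notebooks use the nbformat 4 JSON layout (a top-level `cells` list). The helper name `strip_notebook_outputs` is hypothetical; dedicated tools such as nbstripout cover the same need.

```python
import json
from pathlib import Path


def strip_notebook_outputs(path: str) -> int:
    """Clear outputs and execution counts from every code cell in a notebook.

    Returns the number of code cells that were processed.
    """
    notebook_path = Path(path)
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))

    processed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
            processed += 1

    # Rewrite the file in place; markdown cells and metadata are left untouched.
    notebook_path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return processed
```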

Tech Stack Details

Data & Analytics

pandas / polars — data manipulation and transformation
scikit-learn / XGBoost — machine learning models
SQLAlchemy / psycopg2 — database access with parameterized queries
dbt — SQL transformation framework (if applicable)
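
To illustrate what "parameterized queries" means in practice, here is a minimal sketch. It uses the standard-library `sqlite3` driver as a stand-in so it runs anywhere; the `loans` table and the helper `loans_for_customer` are invented for the example. The same idea applies with psycopg2 (`%s` placeholders) or SQLAlchemy (`text()` with `:name` parameters).

```python
import sqlite3


def loans_for_customer(conn: sqlite3.Connection, customer_id: str) -> list[tuple]:
    """Fetch loans for one customer using a bound parameter, not string formatting."""
    query = "SELECT loan_id, amount FROM loans WHERE customer_id = ? ORDER BY loan_id"
    return conn.execute(query, (customer_id,)).fetchall()


# Hypothetical in-memory table, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id INTEGER, customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, "c-001", 1200.0), (2, "c-002", 800.0), (3, "c-001", 450.0)],
)

print(loans_for_customer(conn, "c-001"))  # [(1, 1200.0), (3, 450.0)]
# An injection attempt is treated as a plain value and matches nothing.
print(loans_for_customer(conn, "c-001' OR '1'='1"))  # []
```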

APIs & Services

FastAPI / Flask — REST API endpoints for model serving
Pydantic — input validation and schema enforcement
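
The kind of checks that input validation performs can be sketched with the standard library alone. This is a dataclass-based stand-in, not Pydantic itself, and the `ScoringRequest` payload and its fields are hypothetical. In a real service a Pydantic model would typically replace the manual checks.

```python
from dataclasses import dataclass


def _is_positive_number(value: object, *, integer_only: bool = False) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    allowed = (int,) if integer_only else (int, float)
    return isinstance(value, allowed) and value > 0


@dataclass(frozen=True)
class ScoringRequest:
    """Hypothetical request payload for a scoring endpoint."""

    loan_id: str
    amount: float
    term_months: int

    def __post_init__(self) -> None:
        if not isinstance(self.loan_id, str) or not self.loan_id.strip():
            raise ValueError("loan_id must be a non-empty string")
        if not _is_positive_number(self.amount):
            raise ValueError("amount must be a positive number")
        if not _is_positive_number(self.term_months, integer_only=True):
            raise ValueError("term_months must be a positive integer")


def parse_scoring_request(payload: dict) -> ScoringRequest:
    """Reject unknown or missing fields, then validate the values."""
    expected = {"loan_id", "amount", "term_months"}
    if set(payload) != expected:
        raise ValueError(f"expected fields {sorted(expected)}")
    return ScoringRequest(**payload)
```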

Quality & Observability

pytest — unit and integration testing
ruff — linting and formatting (replaces flake8 + black + isort)
mypy — static type checking
pip-audit — dependency vulnerability scanning
structured logging — JSON logs with Correlation ID in every entry
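
A minimal sketch of JSON logging with a correlation ID, using only the standard `logging` module. The field names in the JSON entry and the helper `get_logger` are assumptions for illustration; the toolkit's rules may prescribe a different schema.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set by the LoggerAdapter below; None if the record came from elsewhere.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)


def get_logger(name: str, correlation_id: str) -> logging.LoggerAdapter:
    """Return a logger that stamps every entry with the given correlation ID."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logging.LoggerAdapter(logger, {"correlation_id": correlation_id})


# One correlation ID per pipeline run or API request.
log = get_logger("pipelines.score", correlation_id=str(uuid.uuid4()))
log.info("scoring run started")
```

The adapter passes the correlation ID as `extra` on every call, so individual log statements do not need to repeat it.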

Cloud & Infrastructure

AWS / GCP / Azure — depends on project; IAM follows least privilege
Secrets Manager — credentials never hardcoded
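
One common way to keep credentials out of source code is to read them from the environment at startup, with the values injected from a local .env file or from the cloud provider's secrets service at deploy time. The sketch below assumes that setup; `require_secret` and `MissingSecretError` are hypothetical names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name: str) -> str:
    """Return the value of an environment variable, failing loudly if unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        # Report only the variable name; never echo secret values into logs.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Example (hypothetical variable name):
# db_password = require_secret("DB_PASSWORD")
```

Failing at startup makes a missing credential visible immediately, rather than as an authentication error deep inside a pipeline run.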

Quick Reference

| Task | Command |
|---|---|
| Code review | claude "Use code-reviewer agent to review [file]" |
| Debug error | claude "Use debugger agent: [traceback]" |
| Write docs | claude "Use doc-writer agent for [function/notebook/query]" |
| Security scan | claude "Use security-checker agent for [file/feature]" |
| Plan feature | claude "Help me plan implementing [feature]" |
| Explain code | claude "Explain how [file/pipeline] works" |

Questions or improvements

Have a question, hit an issue, or found a better pattern? Drop a message in the #claude-code-help Slack channel.

If you find a pattern or convention that's missing, add a new rule file to .claude/rules/ or a new agent to .claude/agents/ and share it in the channel. Existing rules and agents must not be modified or removed — additions only.
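
If you script the addition of rule files, the "additions only" policy can be enforced mechanically. The helper below is a sketch under that assumption; `add_rule_file` is a hypothetical name and the rule content is up to you.

```python
from pathlib import Path


def add_rule_file(project_root: str, filename: str, body: str) -> Path:
    """Create .claude/rules/<filename>, refusing to overwrite an existing rule."""
    if not filename.endswith(".md") or "/" in filename or "\\" in filename:
        raise ValueError("filename must be a bare .md file name, e.g. 'airflow.md'")

    rules_dir = Path(project_root) / ".claude" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)
    target = rules_dir / filename

    # Mode "x" raises FileExistsError if the file already exists,
    # so an existing rule cannot be replaced by accident.
    with target.open("x", encoding="utf-8") as handle:
        handle.write(body)
    return target
```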

License

Internal use only — adapt for your projects as needed.