@kin_npm/data-scientis-claude
v1.0.0
Claude Code toolkit for data science and analytics teams — installs .claude/ into your project.
Data Scientist — Claude Code Toolkit
A reusable set of Claude Code agents and rules for data science and analytics teams.
Copy this toolkit into any new project and configure the CLAUDE.md to get Claude
working with your project's specific context from day one.
Tech Stack: Python, R, SQL · AWS / GCP / Azure · pandas, scikit-learn, dbt, FastAPI
What's in this toolkit
.claude/
├── CONTEXT.md ← project-level instructions loaded by Claude automatically
├── kin-coding-agent-instructions.md ← hard requirements applied to every code generation task
├── agents/
│ ├── code-reviewer.md ← structured code review for data science and analytics
│ ├── debugger.md ← root-cause debugging for pipelines, notebooks, and APIs
│ ├── doc-writer.md ← docstrings, notebook docs, SQL headers, and READMEs
│ └── security-checker.md ← pre-commit scan for secrets, PII, and compliance issues
└── rules/
    ├── python.md ← Python style, naming, DataFrames, SQL, error handling
    └── security.md ← 7 security rules + pre-deploy checklist + incident response
How to adopt this toolkit in a new project
1. Copy the .claude/ directory into your project root
cp -r /path/to/data_scientist/.claude /path/to/your-project/
2. Add a CLAUDE.md at your project root
Create a CLAUDE.md at the root of the client project with the context specific to that engagement:
compliance scope, PII fields, data sources, cloud provider, known data quality issues, etc.
Claude loads this file automatically — the .claude/CONTEXT.md from the toolkit describes
the toolkit structure and applies to all projects as-is.
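After steps 1 and 2, it can help to confirm that nothing was lost in the copy. The following is a minimal sketch, not part of the toolkit: it checks for the files listed in the "What's in this toolkit" tree plus the CLAUDE.md from step 2. The function name `missing_toolkit_files` is hypothetical.

```python
from pathlib import Path

# Files from the "What's in this toolkit" tree, relative to the project root.
EXPECTED_FILES = [
    ".claude/CONTEXT.md",
    ".claude/kin-coding-agent-instructions.md",
    ".claude/agents/code-reviewer.md",
    ".claude/agents/debugger.md",
    ".claude/agents/doc-writer.md",
    ".claude/agents/security-checker.md",
    ".claude/rules/python.md",
    ".claude/rules/security.md",
    "CLAUDE.md",  # created by hand in step 2
]


def missing_toolkit_files(project_root: str) -> list[str]:
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_toolkit_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("Toolkit layout looks complete.")
```

Run it from the project root; an empty result means every listed file is in place.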
3. Add rules for any stack not covered
The rules files are ready to use as-is. If a project uses a specific framework
(e.g., dbt, Airflow, Spark) that isn't covered, add a new file to .claude/rules/.
Claude picks it up automatically — no other configuration needed.
4. Start using the agents
See the full guide below.
How to use the agents
Agents are markdown files that tell Claude how to approach a specific task — what to check, in what order, and how to format the response. You invoke them by naming them in your prompt. No special syntax required.
code-reviewer.md — Code Review
When to use it: before merging code, after finishing a transformation function or pipeline, or when you want a second opinion on your implementation.
What to pass: the file path, function name, or module to review.
# Review a pipeline module
claude "Use the code-reviewer agent to review src/pipelines/loan_scoring.py"
# Review a SQL query
claude "Use the code-reviewer agent to review queries/monthly_revenue.sql"
# Review with specific focus
claude "Use the code-reviewer agent to review src/features/credit_features.py, focusing on null handling"
What to expect: an overall summary of code quality, followed by findings organized by severity (critical / important / suggestion) and at least one positive observation. Each finding includes the affected code, why it is a problem, and the suggested fix.
debugger.md — Debugging
When to use it: when you have a traceback, when a pipeline produces unexpected output, when a model gives wrong predictions, or when a notebook fails mid-run.
What to pass: the full traceback and/or the path of the file where the error occurs.
# Pass the traceback directly
claude "I have this error:
KeyError: 'loan_id'
at src/pipelines/transform.py:42 in build_features
Use the debugger agent to find the root cause."
# Point to the file
claude "Use the debugger agent to inspect src/models/predict.py — predictions look wrong on nulls"
# Debug a data issue
claude "Use the debugger agent for the monthly_revenue pipeline — output totals don't match source"
What to expect: root cause in one sentence, evidence from the traceback or data, fix in code, explanation of why the fix works, and a prevention suggestion.
doc-writer.md — Documentation
When to use it: after finishing a function or pipeline, before sharing a notebook, or when you inherit undocumented code.
What to pass: the file path, function name, notebook path, or SQL file to document.
# Write docstrings for a module
claude "Use the doc-writer agent to write docstrings for all functions in src/features/credit_features.py"
# Document a notebook
claude "Use the doc-writer agent to document notebooks/eda_loan_portfolio.ipynb"
# Document a SQL query
claude "Use the doc-writer agent to add a header comment to queries/monthly_revenue.sql"
# Generate a README for a module
claude "Use the doc-writer agent to generate a README for the src/pipelines/ directory"
What to expect: Google-style docstrings with Args, Returns, and Raises sections; notebook narrative structure with context and conclusion cells; SQL header comments with purpose, source tables, grain, and business rules.
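For readers unfamiliar with the format, here is a short illustration of a Google-style docstring with Args, Returns, and Raises sections. The function `debt_to_income` is a hypothetical example, not code from the toolkit, and the agent's actual output may differ in wording and detail.

```python
def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Compute the debt-to-income ratio for a loan applicant.

    Args:
        monthly_debt: Total recurring monthly debt payments, in the same
            currency as monthly_income. Must not be negative.
        monthly_income: Gross monthly income. Must be greater than zero.

    Returns:
        The ratio monthly_debt / monthly_income.

    Raises:
        ValueError: If monthly_income is not greater than zero, or if
            monthly_debt is negative.
    """
    if monthly_income <= 0:
        raise ValueError("monthly_income must be greater than zero")
    if monthly_debt < 0:
        raise ValueError("monthly_debt must not be negative")
    return monthly_debt / monthly_income
```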
security-checker.md — Security Audit
When to use it: before every commit to shared branches, when adding a new data source, or when a feature touches PII fields or external credentials.
What to pass: the file path or feature to audit.
# Security scan before commit
claude "Use the security-checker agent to audit src/ingestion/client_loader.py"
# Check for exposed secrets
claude "Use the security-checker agent to check if any credentials are hardcoded in src/"
# Pre-deployment audit
claude "Use the security-checker agent to run through the pre-deploy checklist for the scoring API"
What to expect: findings organized by severity (critical → high → medium → low), with specific locations, impact descriptions, and secure alternatives.
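To make "findings organized by severity, with specific locations" concrete, here is a rough sketch of a hardcoded-secret scan. It is an illustration only and does not describe how the security-checker agent works internally. The two patterns, the severity assigned to each, and the names `Finding` and `scan_text` are assumptions chosen for the example; a real scan needs a much broader rule set.

```python
import re
from dataclasses import dataclass

# Report order, most urgent first.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]

# Illustrative patterns only (assumed severities).
PATTERNS = [
    ("critical", "AWS access key ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    (
        "high",
        "hardcoded password",
        re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    ),
]


@dataclass
class Finding:
    severity: str
    rule: str
    line_number: int


def scan_text(source: str) -> list[Finding]:
    """Return findings for source, sorted by severity and then line number."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for severity, rule, pattern in PATTERNS:
            if pattern.search(line):
                findings.append(Finding(severity, rule, line_number))
    findings.sort(key=lambda f: (SEVERITY_ORDER.index(f.severity), f.line_number))
    return findings
```

Each finding carries a line number so the report can point at the exact location, mirroring the output format described above.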
Rules that are always active
The files in .claude/rules/ are loaded as standing instructions — Claude follows them
in every interaction without you needing to ask.
- python.md: Python style, naming conventions, DataFrame best practices, SQL, error handling
- security.md: 7 security rules covering secrets, PII, version control, input validation, dependencies, cloud, and compliance, plus a pre-deploy checklist
These rules apply from the first message. You don't need to reference them explicitly.
Extending the toolkit
If your project needs conventions not covered here (dbt, Airflow, Spark, R pipelines, etc.),
add new files to .claude/rules/. Claude will pick them up automatically.
The development team owns these files — if a rule doesn't match your project's reality, update it. The toolkit should serve the project, not the other way around.
Common Workflows
Starting a new feature or analysis
claude "I need to build a credit scoring feature using loan application data. Help me plan this."
# Claude will create a plan following SRP, DRY, and project conventions
Code review before PR
claude "Use the code-reviewer agent to review src/pipelines/feature_engineering.py before I create a PR"
Debugging a pipeline failure
claude "Getting: ValueError: cannot operate on empty DataFrame at src/pipelines/score.py:88. Use the debugger agent."
Documenting a notebook before sharing with the client
claude "Use the doc-writer agent to structure and document notebooks/quarterly_forecast.ipynb"
Pre-deployment security check
claude "Use the security-checker agent to run the pre-deploy checklist for the scoring API"
Project Structure (reference)
your-project/
├── .claude/ ← Claude Code toolkit (copy from here)
│ ├── CONTEXT.md ← fill in project-specific context at the top
│ ├── kin-coding-agent-instructions.md
│ ├── agents/
│ │ ├── code-reviewer.md
│ │ ├── debugger.md
│ │ ├── doc-writer.md
│ │ └── security-checker.md
│ └── rules/
│   ├── python.md
│   └── security.md
├── src/
│ ├── pipelines/ ← ETL and ML pipelines
│ ├── features/ ← feature engineering
│ ├── models/ ← model training and inference
│ ├── api/ ← API endpoints (FastAPI / Flask)
│ └── utils/ ← shared utilities
├── tests/
│ ├── unit/
│ └── integration/
├── notebooks/ ← EDA and analysis (strip output before committing)
├── queries/ ← SQL queries and views
├── docs/ ← technical documentation
├── logs/ ← local execution logs (git-ignored)
├── data/
│ ├── raw/ ← git-ignored
│ └── processed/ ← git-ignored
├── .env.example ← environment variable template (no real values)
├── requirements.txt ← pinned dependencies
└── pyproject.toml ← project config (ruff, mypy, pytest, etc.)
Tech Stack Details
Data & Analytics
- pandas / polars — data manipulation and transformation
- scikit-learn / XGBoost — machine learning models
- SQLAlchemy / psycopg2 — database access with parameterized queries
- dbt — SQL transformation framework (if applicable)
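The "parameterized queries" item above means passing values to the driver separately from the SQL text instead of formatting them into the string. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs without a database server; the `loans` table and the `loans_above` function are hypothetical. With psycopg2 the idea is the same, but the placeholder is `%s` rather than `?`.

```python
import sqlite3


def loans_above(conn: sqlite3.Connection, min_amount: float) -> list[tuple]:
    """Return (loan_id, amount) rows with amount >= min_amount."""
    # The value travels separately from the SQL text, so user input
    # cannot change the structure of the query.
    cursor = conn.execute(
        "SELECT loan_id, amount FROM loans WHERE amount >= ? ORDER BY loan_id",
        (min_amount,),
    )
    return cursor.fetchall()


# In-memory database with a hypothetical loans table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO loans (loan_id, amount) VALUES (?, ?)",
    [(1, 500.0), (2, 1500.0), (3, 2500.0)],
)
rows = loans_above(conn, 1000)
```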
APIs & Services
- FastAPI / Flask — REST API endpoints for model serving
- Pydantic — input validation and schema enforcement
Quality & Observability
- pytest — unit and integration testing
- ruff — linting and formatting (replaces flake8 + black + isort)
- mypy — static type checking
- pip-audit — dependency vulnerability scanning
- structured logging — JSON logs with Correlation ID in every entry
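One possible way to get JSON logs with a correlation ID in every entry, using only the standard library, is sketched below. The field names, the `JsonFormatter` class, and the `get_logger` helper are assumptions made for this example; the toolkit's rules may prescribe a different schema or library.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Attached by the LoggerAdapter below; "unknown" for plain loggers.
            "correlation_id": getattr(record, "correlation_id", "unknown"),
        }
        return json.dumps(payload)


def get_logger(name: str, correlation_id: str) -> logging.LoggerAdapter:
    """Return a logger that stamps correlation_id on every entry."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logging.LoggerAdapter(logger, {"correlation_id": correlation_id})


# One ID per pipeline run or request, reused for all of its log entries.
log = get_logger("pipelines.score", correlation_id=str(uuid.uuid4()))
log.info("scoring started")
```

Generating the ID once at the start of a run and passing the adapter down keeps every entry from that run searchable by a single value.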
Cloud & Infrastructure
- AWS / GCP / Azure — depends on project; IAM follows least privilege
- Secrets Manager — credentials never hardcoded
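As an illustration of "credentials never hardcoded", the sketch below reads database credentials from environment variables, which a secrets manager or a local `.env` file (see `.env.example`) would populate. The variable names `DB_USER`, `DB_PASSWORD`, `DB_HOST`, and `DB_NAME` and both function names are hypothetical.

```python
import os
from urllib.parse import quote_plus


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def database_url() -> str:
    """Build a PostgreSQL URL from environment variables (hypothetical names)."""
    user = require_env("DB_USER")
    # Percent-encode the password so characters like "@" do not break the URL.
    password = quote_plus(require_env("DB_PASSWORD"))
    host = require_env("DB_HOST")
    name = require_env("DB_NAME")
    return f"postgresql://{user}:{password}@{host}/{name}"
```

Failing at startup when a variable is missing is a deliberate choice here: it surfaces misconfiguration immediately instead of at the first query.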
Quick Reference
| Task | Command |
|---|---|
| Code review | claude "Use code-reviewer agent to review [file]" |
| Debug error | claude "Use debugger agent: [traceback]" |
| Write docs | claude "Use doc-writer agent for [function/notebook/query]" |
| Security scan | claude "Use security-checker agent for [file/feature]" |
| Plan feature | claude "Help me plan implementing [feature]" |
| Explain code | claude "Explain how [file/pipeline] works" |
Questions or improvements
Have a question, hit an issue, or found a better pattern? Drop a message in the #claude-code-help Slack channel.
If you find a pattern or convention that's missing, add a new rule file to .claude/rules/
or a new agent to .claude/agents/ and share it in the channel.
Existing rules and agents must not be modified or removed — additions only.
License
Internal use only — adapt for your projects as needed.
