@hamak/smart-data-dico
v1.12.2
Published
Collaborative data dictionary management system — model, document, and govern your data landscape
Maintainers
Readme
Smart Data Dictionary
A collaborative data dictionary management system for modeling, documenting, and governing your organization's data landscape. Built on a microkernel plugin architecture for modularity, with file-based persistence (YAML/Git) designed for both technical and non-technical users.
Core Concepts
Packages
Packages organize entities into logical groups reflecting your system architecture (microservices, modules, bounded contexts). They are hierarchical -- packages can contain sub-packages, forming a tree that mirrors your application structure.
- Navigable via nested URLs:
/packages/eShop/ordering/entities/Order - Each package has a dashboard with entity counts, relationship counts, and metadata
- Package names follow kebab-case convention
Entities & Attributes
Entities are the primary modeling unit -- they represent data structures (tables, documents, events, value objects). Each entity has:
- Attributes with types (string, number, integer, boolean, datetime, enum, object, array), constraints (min/max length, pattern, precision), and metadata
- Nested attributes for object and array types (e.g.
shippingAddress.postalCode) - Status lifecycle: draft -> submitted -> approved (or returned)
- UUID-based identity for stable cross-references
Relationships
Relationships connect entities across packages. They are stored at the package level and define:
- Source and target entity (by UUID) with cardinality (one/many)
- Navigation property names for path traversal
- Metadata on the relationship itself
Stereotypes
Stereotypes define metadata schemas that apply to specific element types (entity, attribute, or package). When assigned, they auto-populate required and optional metadata fields, ensuring consistency.
Predefined stereotypes:
| Stereotype | Applies To | Required Fields | |-----------|-----------|-----------------| | Aggregate Root | Entity | bounded-context | | Reference Data | Entity | cacheable (flag) | | Domain Event | Entity | event-source | | Value Object | Entity | immutable (flag) | | PII | Attribute | pii-category | | Indexed | Attribute | - | | Deprecated | Attribute | deprecated-since |
Metadata types: string, number, boolean, date, flag (semantic boolean rendered as toggle), rule (text + severity: info/warning/error).
Perspectives
A Perspective captures a business view over a subset of the data model scoped to a business process (e.g. "Ordering", "Billing"). It is the core differentiating concept of the application.
How it works:
- Select root entities -- the starting points of the business process
- The system performs BFS traversal through relationships to discover all reachable entities
- Each entity is identified by its relationship path from root (e.g.
Order/orderItems/product) - The same entity reached via different paths is treated as separate usages
Path grammar -- uniform / navigation for all levels:
Order -- root entity
Order/totalAmount -- attribute on root
Order/orderItems -- relationship -> OrderItem
Order/orderItems/quantity -- attribute on OrderItem
Order/orderItems/product/sku -- deep chain -> attribute
Order/shippingAddress/postalCode -- attribute via relationship
Order/billingAddress/postalCode -- same attribute, different pathPerspectiveNode annotations on any path:
traverse: false-- frontier node (include but stop BFS here)exclude: true-- prune this path entirelymetadata-- path-scoped annotations (not overrides -- new metadata specific to this perspective usage)
Review Workflow
Entities go through a status lifecycle for team governance:
draft --> submitted --> approved
\-> returned (with comments) --> submitted- Comments can target specific attributes (e.g. "Missing validation rule on discountRate")
- Status transitions are role-gated (editors submit, admins approve/return)
- Comments are stored as sidecar YAML files per entity
Quality Metrics
The quality dashboard tracks documentation completeness:
- Description coverage -- % of entities/attributes with descriptions
- Metadata compliance -- % of entities with correct stereotype metadata
- Relationship coverage -- % of entities with at least one relationship
- Per-entity scoring -- weighted composite (30% description, 30% attribute descriptions, 20% relationships, 20% stereotype compliance)
- Drill-down from package level to individual entity gaps
User Journeys
Journey 1: Model a New Domain (Data Architect)
Alice documents the data model for an e-commerce platform.
- Create packages --
eShop > Ordering,Catalog,Billingvia the package dashboard - Define entities --
Order,OrderLine,Customerwith typed attributes and constraints - Define relationships --
OrdertoOrderLine(one-to-many) at the package level - Assign stereotypes -- Mark
Orderasaggregate-root, auto-populatingbounded-contextmetadata - Add metadata -- Flag
Customer.emailas PII, add business rules - Visualize -- Open the Cytoscape.js interactive graph, verify the model, save the layout
- Save & publish -- Save (commit to workspace), publish (push to shared)
Journey 2: Define a Perspective (Business Analyst)
Bob captures the billing-specific view of the data model.
- Create perspective -- "Billing" with
InvoiceandPaymentas root entities - Review resolved paths -- BFS walks relationships, discovers
Order,Customer,LineItem - Set frontiers -- Mark
Productas frontier (traverse: false) -- include but don't follow further - Exclude paths -- Exclude
Order/audit(not relevant to billing) - Add annotations -- On
Order/billingAddress/postalCode, add validation rule "Must match payment country" - Visualize -- Graph overlay highlights perspective members, dims others
Journey 3: Review & Approve (Data Steward)
Carol reviews changes submitted by the team.
- See pending entities -- Order is in
submittedstatus - Add comments -- "Missing description on id field" targeting specific attribute
- Return for revision -- Status changes to
returned, author gets notification - Approve -- After fixes, approve. Status moves to
approved - Check quality -- Quality dashboard shows 90% for order-service, up from 80%
Journey 4: Explore & Discover (Developer)
Dave needs to understand the data model for implementation.
- Browse -- Navigate
/packages/order-serviceto see the dashboard with stats - Search -- Search "order" returns 27 results across entities, attributes, metadata, relationships, packages
- Filter -- Faceted search: filter by type=relationship, service=order-service, stereotype=aggregate-root
- Impact analysis -- Check "Impact" tab on Order: 2 relationships, 1 perspective reference
- Export -- Export order-service as JSON Schema for code generation, or Markdown for documentation
Journey 5: Version & Collaborate (Team Lead)
Eve manages versioning aligned with application releases.
- Create workspace -- Creates
workspace/billing-v2(git branch with friendly name) - Monitor status -- Navbar shows "MAIN 8" (branch + unsaved count), dropdown shows ahead/behind
- Save & publish -- Save page shows changed files, commit with message, push to remote
- Switch workspaces -- Workspaces page lists branches, switch with one click
- Merge -- Merge page selects source workspace, previews diff, merges
Architecture
Monorepo
smart-data-dico/
backend/ Express + TypeScript (controllers > services > models)
frontend/ React 18 + Vite + TypeScript + Tailwind CSS + DaisyUI
data-dictionaries/ YAML entity files, relationships, perspectives, stereotypes
Dockerfile Multi-stage production build
docker-compose.yml One-command deploymentMicrokernel Plugin Architecture
The frontend uses @hamak/app-framework with 11 plugins:
| Plugin | Responsibility |
|--------|---------------|
| store | Redux state management (10 domain slices) |
| shell | Layout, theming (synced with DaisyUI) |
| auth | JWT authentication with Auth0 |
| data-dictionary | Entity, package, relationship CRUD |
| visualization | Interactive Cytoscape.js graph |
| search | Full-text and faceted search |
| version-control | Save/publish, history |
| perspective | Perspective management and resolution |
| remote-fs | File operations via @hamak/filesystem-server-impl |
| remote-git | Git operations via @hamak/ui-remote-git-fs-impl |
| notification | Toast notifications |
Data Model
Package
+-- Entity (uuid, name, description, stereotype, status, attributes[], metadata[])
| +-- Attribute (uuid, name, type, required, constraints, metadata[])
+-- Relationship (uuid, source, target, cardinality, metadata[])
+-- Sub-packages
Stereotype (id, name, appliesTo, metadataDefinitions[])
Perspective (uuid, name, rootEntities[], nodes[], maxDepth)
+-- PerspectiveNode (path, traverse?, exclude?, metadata[])
+-- ResolvedNode (entityUuid, path, hopDistance, isRoot, isFrontier)Persistence
data-dictionaries/
microservices/{package}/
{uuid}_{name}.yaml -- entity files
{uuid}.comments.yaml -- review comments (sidecar)
relationships.yaml -- package-level relationships
metadata.yaml -- package metadata
perspectives/{uuid}.yaml -- perspective definitions
stereotypes.yaml -- stereotype definitions
diagrams/{id}.json -- saved diagram layoutsAll files are git-tracked for versioning, branching, and collaboration.
API Surface
| Area | Endpoints |
|------|-----------|
| Entities | CRUD on /api/services/:service/entities/:entity |
| Packages | CRUD on /api/packages, hierarchy, path navigation |
| Relationships | CRUD on /api/packages/:pkg/relationships |
| Stereotypes | CRUD on /api/stereotypes |
| Perspectives | CRUD + resolve + graph on /api/perspectives |
| Search | GET /api/search?q=...&type=...&service=...&stereotype=... |
| Impact | GET /api/entities/:uuid/impact |
| Review | Submit/approve/return + comments on entities |
| Import | POST /api/import/json-schema, POST /api/import/sql-ddl |
| Export | GET /api/export/json-schema/:service, GET /api/export/markdown/:service |
| Quality | GET /api/quality/report |
| Git | Full git operations via /api/git/dictionaries/* (framework routes) |
| Version | Commit, history, revert via /api/commit, /api/history |
Getting Started
Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Node.js | 18+ (LTS) | Hard requirement; checked via engines field |
| npm | 9+ | Ships with Node 18+ |
| Git | 2.x+ | Required — dictionaries are versioned in a git repository |
| Docker | 20+ | Optional, only for the containerized deployment path |
Optional database peer dependencies (only needed if you use Physical Sync to introspect a live database — see peerDependencies in package.json): pg, mysql2, mssql, or oracledb. Install just the one(s) you need.
Option A — Install from npm (end users)
The fastest way to run the app against a fresh, empty data directory:
# One-off run with auto-bootstrap of ./data-dictionaries
npx @hamak/smart-data-dico
# Or install globally
npm install -g @hamak/smart-data-dico
smart-data-dicoCLI flags:
| Flag | Default | Purpose |
|------|---------|---------|
| --port <n> | 3001 | Server port |
| --data-dir <path> | ./data-dictionaries | Project folder (the one containing dico.config.json) |
| --no-open | — | Don't auto-open the browser |
| -h, --help | — | Show help |
On first run, the CLI creates <data-dir>/dico.config.json and <data-dir>/.dico/ if they don't exist. Point it at an existing project folder to keep working on the data you already have.
Option B — Run from source (contributors)
# 1. Clone
git clone https://github.com/amah/smart-data-dico.git
cd smart-data-dico
# 2. Install dependencies — root, backend, frontend
npm install
cd backend && npm install && cd ..
cd frontend && npm install && cd ..
# 3. Start the dev servers in two terminals
cd backend && npm run dev # port 3001
cd frontend && npm run dev # port 3000- Frontend: http://localhost:3000
- Backend API: http://localhost:3001
- API docs: http://localhost:3001/api-docs
In dev, the backend serves the bundled sample project at samples/eshop/ (three packages: order-service, product-service, user-service). Override with DATA_DIR=/path/to/your/project npm run dev to point at your own data.
Option C — Restricted networks (no npm registry for this package, or no git)
When the npm registry copy of the package is unavailable, install it from git or
from a prebuilt release tarball. These do not require the package to be on a
registry; the lower options additionally avoid needing the git binary.
C1 — Prebuilt tarball over curl (no git, no build, fully offline-capable).
Each release attaches a self-contained tarball (esbuild-bundled backend, no
runtime node_modules; Node ≥ 18). Download and install it directly:
curl -fL -o smart-data-dico.tgz \
https://github.com/amah/smart-data-dico/releases/download/v1.11.0/hamak-smart-data-dico-1.11.0.tgz
npm install -g ./smart-data-dico.tgz
smart-data-dico --data-dir /path/to/your/projectFor a fully air-gapped machine, download the .tgz on a connected host, copy it
across, and run the npm install -g ./…tgz step there — it needs no network.
C2 — Build from git with the helper script. Clones a pinned ref, installs deps (root/frontend/backend), builds both bundles, packs a tarball, and installs the CLI globally:
scripts/install-from-git.sh # build v1.11.0, install globally
scripts/install-from-git.sh --pack-only # just produce the tarball (prints its path)Add --archive (auto-enabled when git is absent) to fetch the source via
curl/wget instead of cloning. Useful env/flags: SDD_REF (tag/branch/commit),
SDD_REPO_URL (internal mirror or SSH), -d (work dir). Run scripts/install-from-git.sh --help
for the full list.
Note: C2 still pulls build dependencies from whichever npm registry / proxy
npmis configured to use. If even that is blocked, use C1 — build the tarball once on a connected machine (--pack-only) and install the artifact offline.
Option C — Docker
docker-compose up
# App available at http://localhost:3001The image ships with no bundled sample data — docker-compose.yml mounts ./data-dictionaries from the host into the container. Either place your own project there, or copy samples/eshop/ into ./data-dictionaries/ before starting.
Dev Credentials (mock auth, dev mode only)
| User | Password | Role | |------|----------|------| | admin | admin123 | ADMIN (full access) | | editor | editor123 | EDITOR (create/edit) | | viewer | viewer123 | VIEWER (read-only) |
A Bearer mock-token-for-testing header also works for API testing.
Configuration Profiles
Set via PROFILE environment variable:
| Profile | Auth | Git | Use Case | |---------|------|-----|----------| | local | None | Local only | Single user, desktop | | team | Basic/JWT | Remote | Small team sharing a repo | | server | Auth0/SSO | Remote | Organization-wide deployment |
AI Assistance
The dictionary ships with three AI-aware surfaces, all optional. They share one set of credentials stored at ~/.dico-app/dico-app.json (mode 0600 — never checked into git):
| Surface | What it is | Where to find it |
|---|---|---|
| In-app AI chat panel | Sidebar chat that grounds against the live dictionary — list packages, describe entities, generate docs, propose edits | Toolbar button AI Assistant in the running web app |
| Slash commands | Built-in chat shortcuts (/list, /quality, /describe, /create, /relate, /export, /diagram, /help) + your own saved prompts | Type / in the chat composer |
| MCP server (dico-mcp) | A Model Context Protocol stdio server that exposes the same operations to external clients — Claude Desktop, Cursor, Roo Code, Claude Code | npx @hamak/smart-data-dico-mcp or node bin/dico-mcp.js from source |
Configure the in-app chat (one-time)
- Start the app (Option A, B, or C above).
- Open Settings → AI (or click the AI Assistant button → ⚙ icon).
- Pick a provider (
anthropic,openai, oropenai-compatible), paste an API key, and choose a model (e.g.claude-opus-4-7,gpt-4o, or any model youropenai-compatibleendpoint exposes). - Save. The config writes to
~/.dico-app/dico-app.jsonwith mode0600; no env vars are required.
Alternatively, if you set ANTHROPIC_API_KEY or OPENAI_API_KEY before launching the server, the provider is auto-detected for the first session — but the Settings dialog is the source of truth; saving overrides env-var defaults.
Conversations and saved prompts live under ~/.dico-app/storage/ as JSON files — portable, diffable, deletable.
Slash commands
| Command | What it does |
|---|---|
| /help | List all available slash commands |
| /list | List every package and its entity count |
| /describe | Describe the current entity in detail (uses page context) |
| /quality | Quality review of the current package (severity-grouped findings) |
| /create | Skeleton: create a new entity with suggested attributes |
| /relate | Skeleton: create a relationship between two entities |
| /export | Generate Markdown documentation for the current package |
| /diagram | Navigate to the organization diagram |
The pageContext placeholder is filled automatically from the current route (entity / package / etc.). User-saved prompts (via Prompts tab in the chat panel) appear in the same slash-command picker.
MCP server (dico-mcp)
The MCP server reuses the same backend services as the web app — it's a different transport for the same operations, not a duplicate code path. You can run the web UI and the MCP server side-by-side against the same project folder.
Tools exposed: listPackages, listEntities, getEntityDetails, createEntity, createRelationship, listStereotypes, listRoutes.
Launch (from npm):
# stdio server — talks JSON-RPC over stdin/stdout
npx @hamak/smart-data-dico --data-dir /path/to/your/project # web UI
# (the MCP entrypoint is bin/dico-mcp.js inside the package)
node "$(npm root -g)/@hamak/smart-data-dico/bin/dico-mcp.js" --data-dir /path/to/your/projectRegister with Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"smart-data-dico": {
"command": "npx",
"args": ["-y", "@hamak/smart-data-dico", "dico-mcp", "--data-dir", "/absolute/path/to/your/project"]
}
}
}Or, if installed from source, point at the bin script directly:
{
"command": "node",
"args": ["/absolute/path/to/smart-data-dico/bin/dico-mcp.js", "--data-dir", "/absolute/path/to/your/project"]
}Register with Cursor — .cursor/mcp.json (project-level) or ~/.cursor/mcp.json (global): identical shape.
Register with Roo Code — .roo/mcp.json: identical shape.
The MCP process speaks JSON-RPC on stdio and stays attached to its parent; all logging goes to stderr so it doesn't corrupt the stream. Git auto-commit honours the same GIT_AUTO_COMMIT env var as the web backend.
Connect external MCP servers to the in-app chat
The flip side of the above: the in-app chat can also consume external MCP servers (Filesystem, GitHub, etc.). Go to Settings → MCP for a curated registry — pick a server, paste any required tokens, save. The chat will then surface those server's tools alongside the built-in ones.
Technologies
| Layer | Stack | |-------|-------| | Backend | Node.js, Express, TypeScript (ESM), YAML, Git | | Frontend | React 18, Vite, TypeScript, Tailwind CSS, DaisyUI, Redux | | Framework | @hamak/app-framework (microkernel, DI, plugins) | | Visualization | Cytoscape.js (dagre + fcose layouts) | | Auth | JWT + Auth0 (mock mode for dev) | | Deployment | Docker (multi-stage build) |
Documentation
| Doc | What it covers |
|-----|----------------|
| Format reference | Authoritative spec for the on-disk format — project layout, dico.config.json, entities, validation, constraints, relationships, stereotypes, rules, cases, actions, state machines |
| User guide | Task-oriented walkthrough of the app |
| API reference | REST endpoints (live Swagger UI at /api-docs when the backend is running) |
| Deployment | Desktop vs. server modes, file layout, configuration |
| Migration plan | @hamak/app-framework migration notes |
| ADRs | Architecture decision records |
Claude Code skill
The docs/ folder doubles as a Claude Code skill
(docs/SKILL.md + the format reference) for authoring and validating dictionary files in
any project. Installing it means copying docs/ into a skills directory as smart-data-dico.
Claude then loads it automatically whenever you work in a folder containing dico.config.json.
From a local checkout of this repo:
npm run install:skill # → ~/.claude/skills/smart-data-dico
# or into a specific project:
scripts/install-skill.sh /path/to/project/.claude/skillsFrom another repo / machine (no checkout): pull just docs/ into your skills directory
with degit — no auth needed:
# global → ~/.claude/skills/smart-data-dico
npx degit amah/smart-data-dico/docs ~/.claude/skills/smart-data-dicoOr, without Node, via the release tarball:
tmp=$(mktemp -d) && curl -fsSL https://codeload.github.com/amah/smart-data-dico/tar.gz/refs/heads/main | tar -xz -C "$tmp" \
&& rm -rf ~/.claude/skills/smart-data-dico \
&& cp -R "$tmp"/*/docs ~/.claude/skills/smart-data-dico && rm -rf "$tmp"Swap the destination for <project>/.claude/skills/smart-data-dico to scope it to one project.
Re-run either command to update.
Behind a proxy: degit reads the lowercase https_proxy env var (not the CLI), while
curl takes an explicit --proxy flag — the curl form is usually the more reliable in
locked-down networks:
# degit
https_proxy=http://proxy.example.com:8080 npx degit amah/smart-data-dico/docs ~/.claude/skills/smart-data-dico
# curl
tmp=$(mktemp -d) && curl -fsSL --proxy http://proxy.example.com:8080 https://codeload.github.com/amah/smart-data-dico/tar.gz/refs/heads/main | tar -xz -C "$tmp" \
&& rm -rf ~/.claude/skills/smart-data-dico \
&& cp -R "$tmp"/*/docs ~/.claude/skills/smart-data-dico && rm -rf "$tmp"If the proxy does TLS interception, point Node/curl at your CA bundle (NODE_EXTRA_CA_CERTS=…
for degit, --cacert … for curl).
