myworkjournal

v0.2.0

Published

13 hours ago

A searchable work journal powered by local RAG


alphagiel

rag knowledge-base work-journal local-ai semantic-search cli

myworkjournal

A searchable work journal powered by local RAG (Retrieval Augmented Generation).

Point it at your tickets, notes, or exports and ask natural language questions over your own work history — instantly surface what you built, why you built it, and what broke along the way.

mywj ask "What auth issues did we have in Q1?"
mywj ask "When did we first run into the memory leak problem?"
mywj ask "What was the fix for NN-2725?"

As you keep adding tickets over time, the value compounds. A year from now you'll have a fully queryable record of everything you've worked on.

Under the hood, it ingests .txt, .md, and .rtf files into a local SQLite vector store, retrieves the most relevant chunks via cosine similarity, and streams the answer from your LLM of choice (Claude, OpenAI, or Ollama).

Requirements

Node.js 18+
An Anthropic API key — get one at console.anthropic.com
No other API keys — embeddings run locally via @xenova/transformers

One-time setup

cd myworkjournal

npm install       # install dependencies
npm run build     # compile TypeScript → dist/

Add your Anthropic API key to .env:

ANTHROPIC_API_KEY=sk-ant-...

Usage

Ingest files

mywj ingest ./path/to/your/notes

Recursively scans for .txt, .md, and .rtf files
Downloads the embedding model on first run (~25 MB, cached after that)
Re-running is safe — unchanged files are skipped automatically
The knowledge base is stored at .myworkjournal/index.db in the current directory

Restrict to specific extensions:

mywj ingest ./notes --extensions .txt,.md
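As a rough illustration of the scan step, the sketch below walks a directory recursively and keeps only files with the requested extensions. The function name `scanFiles` and its signature are hypothetical and are not taken from the package's source; the default extension list mirrors the three formats named above.

```typescript
import { readdirSync } from "node:fs";
import { extname, join } from "node:path";

// Hypothetical helper (not the package's actual scanner): recursively collect
// files under `root` whose extension appears in `extensions`.
export function scanFiles(
  root: string,
  extensions: string[] = [".txt", ".md", ".rtf"],
): string[] {
  // Compare extensions case-insensitively so ".MD" and ".md" both match.
  const wanted = new Set(extensions.map((e) => e.toLowerCase()));
  const found: string[] = [];

  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const full = join(root, entry.name);
    if (entry.isDirectory()) {
      // Descend into subdirectories.
      found.push(...scanFiles(full, extensions));
    } else if (entry.isFile() && wanted.has(extname(entry.name).toLowerCase())) {
      found.push(full);
    }
  }
  // Sort for a stable, reproducible ingestion order.
  return found.sort();
}
```

Under these assumptions, `scanFiles("./notes", [".txt", ".md"])` would correspond to the `--extensions .txt,.md` invocation shown above.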

Ask a question

mywj ask "What was the root cause of NN-2725?"

Finds the most relevant chunks via cosine similarity and streams the answer from Claude in real time.
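The retrieval step can be sketched roughly as follows: score every stored chunk against the query embedding with cosine similarity, then keep the best `k`. The type and function names (`StoredChunk`, `topK`) and the default `k = 5` are assumptions made for illustration, not the package's actual API.

```typescript
// Illustrative shapes only; the real package's types may differ.
export interface StoredChunk {
  id: number;
  content: string;
  embedding: Float32Array;
}

export interface ScoredChunk extends StoredChunk {
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
export function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force top-k: score every chunk, sort descending, keep the first k.
export function topK(
  query: Float32Array,
  chunks: StoredChunk[],
  k = 5,
): ScoredChunk[] {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this costs one pass over all chunks per question, which is why it suits small to medium knowledge bases.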

Check what's indexed

mywj stats

Development mode (no build step needed)

npm run dev -- ingest ./path/to/notes
npm run dev -- ask "Your question here"
npm run dev -- stats

Typical workflow

# 1. Point it at your notes / exports / docs
mywj ingest "C:\path\to\jira-exports"

# 2. Ask anything
mywj ask "Which tickets are related to the auth service?"
mywj ask "What fix was deployed in version 2.4.1?"
mywj ask "Summarise all memory leak issues"

# 3. Add more files any time — re-ingest is incremental
mywj ingest "C:\path\to\more-notes"

Project structure

src/
├── cli/index.ts               commander entry point
├── types/index.ts             shared interfaces
├── ingestion/
│   ├── scanner.ts             recursive dir walk with extension filtering
│   ├── normalizer.ts          whitespace normalisation + sha256 content hash
│   ├── chunker.ts             sliding window chunker (~900 tokens, 100 overlap)
│   ├── ingestionEngine.ts     orchestrates scan → parse → normalise → chunk → embed → store
│   └── parsers/
│       ├── index.ts           ParserRegistry (strategy pattern)
│       ├── txt.ts
│       ├── markdown.ts
│       └── rtf.ts             inline RTF stripper (no extra dependency)
├── embeddings/
│   └── local.ts               local embeddings via @xenova/transformers (all-MiniLM-L6-v2)
├── vectorstore/
│   └── sqlite.ts              better-sqlite3 + cosine similarity in JS
└── query/
    └── engine.ts              top-k retrieval + streaming synthesis via Claude (claude-opus-4-6)
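To make the sliding-window idea behind chunker.ts concrete, here is a minimal sketch. It approximates tokens by whitespace-separated words, which is an assumption; the package may count tokens differently. The function name `chunkText` is hypothetical, and the defaults mirror the ~900 / 100 figures listed above.

```typescript
// Hypothetical sliding-window chunker. "Tokens" are approximated as
// whitespace-separated words, which is an assumption for illustration.
export function chunkText(text: string, size = 900, overlap = 100): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");

  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `step` tokens after the last

  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size).join(" "));
    // Stop once the current window already reaches the end of the text,
    // so no trailing chunk consists purely of overlap.
    if (start + size >= tokens.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.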

Database

The knowledge base is stored locally at:

<current-working-directory>/.myworkjournal/index.db

Schema:

| Table     | Columns                                          |
|-----------|--------------------------------------------------|
| documents | id, file_path, content_hash, ingested_at         |
| chunks    | id, document_id, content, embedding, chunk_index |

Chunks are deleted automatically when their parent document is removed (ON DELETE CASCADE).
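One way the schema above could be expressed as DDL is sketched below. Only the table and column names and the cascade rule come from this README; the column types and constraints (INTEGER, TEXT, BLOB, NOT NULL, UNIQUE) and the constant names are assumptions.

```typescript
// Assumed DDL: the README lists only column names, so every type and
// constraint here is a guess made for illustration.
export const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS documents (
  id           INTEGER PRIMARY KEY,
  file_path    TEXT NOT NULL UNIQUE,
  content_hash TEXT NOT NULL,
  ingested_at  TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS chunks (
  id          INTEGER PRIMARY KEY,
  document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
  content     TEXT NOT NULL,
  embedding   BLOB NOT NULL,
  chunk_index INTEGER NOT NULL
);
`;

// Stock SQLite builds leave foreign key enforcement off by default, so the
// cascade only fires when it is enabled on the connection. Setting the pragma
// explicitly is harmless if the build already enables it.
export const ENABLE_FOREIGN_KEYS_SQL = "PRAGMA foreign_keys = ON;";
```

With better-sqlite3, both strings could be passed to `db.exec(...)` when the database is first opened.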

Design decisions

Hash-based dedup — files are re-processed only when their content changes.
Pluggable abstractions — Parser, EmbeddingProvider, and VectorStore are interfaces. Phase 2 additions (PDF/DOCX parsers, LanceDB) are drop-in implementations with no changes to ingestion or query logic.
No external vector DB — cosine similarity runs in JS over SQLite rows. Suitable for thousands of chunks; swap to LanceDB when scale demands it.
Per-project database — the .myworkjournal/ folder lives next to your knowledge files, not in a global location.
Fully local embeddings — no API key or internet connection needed for ingestion after the model is cached.
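The hash-based dedup decision can be illustrated with a short sketch: normalise whitespace, hash the result with SHA-256, and re-process a file only when the hash differs from the stored one. The function names, the exact normalisation rules, and the `Map` standing in for the documents table are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Assumed normalisation: unify line endings, collapse runs of spaces and
// tabs, squeeze 3+ newlines to a blank line, and trim the ends.
export function normalize(text: string): string {
  return text
    .replace(/\r\n?/g, "\n")
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// SHA-256 of the normalised content, hex-encoded (64 characters).
export function contentHash(text: string): string {
  return createHash("sha256").update(normalize(text), "utf8").digest("hex");
}

// `known` stands in for the documents table: file path -> stored content hash.
// A file needs ingesting when it is new or its hash has changed.
export function needsIngest(
  filePath: string,
  text: string,
  known: Map<string, string>,
): boolean {
  return known.get(filePath) !== contentHash(text);
}
```

Hashing the normalised text rather than the raw bytes means purely cosmetic whitespace changes would not trigger a re-embed under this scheme.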

Roadmap

| Phase       | Features |
|-------------|----------|
| 1 (current) | .txt, .md, .rtf; local embeddings; SQLite; hybrid FTS + semantic search; Claude streaming synthesis |
| 2           | PDF parser; quick capture (mywj note "..."); git log ingestion (mywj ingest-git) |
| 3           | Watch folder (auto-ingest on file change); URL ingestion (mywj ingest-url <url>) |
