myworkjournal
v0.2.0
A searchable work journal powered by local RAG (Retrieval Augmented Generation).
Point it at your tickets, notes, or exports and ask natural language questions over your own work history — instantly surface what you built, why you built it, and what broke along the way.
ask "What auth issues did we have in Q1?"
ask "When did we first run into the memory leak problem?"
ask "What was the fix for NN-2725?"
As you keep adding tickets over time, the value compounds. A year from now you'll have a fully queryable record of everything you've worked on.
Technically, it ingests .txt, .md, and .rtf files into a local SQLite vector store, retrieves the most relevant chunks via cosine similarity, and streams the answer from your LLM of choice (Claude, OpenAI, or Ollama).
Requirements
- Node.js 18+
- An Anthropic API key — get one at console.anthropic.com
- No other API keys: embeddings run locally via @xenova/transformers
One-time setup
cd myworkjournal
npm install # install dependencies
npm run build # compile TypeScript → dist/
Add your Anthropic API key to .env:
ANTHROPIC_API_KEY=sk-ant-...
Usage
Ingest files
mywj ingest ./path/to/your/notes
- Recursively scans for .txt, .md, and .rtf files
- Downloads the embedding model on first run (~25 MB, cached after that)
- Re-running is safe: unchanged files are skipped automatically
- The knowledge base is stored at .myworkjournal/index.db in the current directory
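The skip-unchanged behaviour can be pictured as a content-hash comparison. The following is a minimal sketch, assuming the hash is a SHA-256 over whitespace-normalised text (as the project structure suggests); the names `normalize`, `contentHash`, `needsIngest`, and `IndexedDocument` are hypothetical and not taken from the package source.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape, loosely mirroring the file_path and
// content_hash columns of the documents table.
interface IndexedDocument {
  filePath: string;
  contentHash: string;
}

// Collapse line endings and runs of spaces/tabs so that formatting-only
// edits do not change the hash. The real normaliser may do more or less.
function normalize(text: string): string {
  return text.replace(/\r\n/g, "\n").replace(/[ \t]+/g, " ").trim();
}

// SHA-256 of the normalised text, as a 64-character hex string.
function contentHash(text: string): string {
  return createHash("sha256").update(normalize(text), "utf8").digest("hex");
}

// True when the file is new or its normalised content has changed.
function needsIngest(
  filePath: string,
  text: string,
  index: Map<string, IndexedDocument>,
): boolean {
  const existing = index.get(filePath);
  return existing === undefined || existing.contentHash !== contentHash(text);
}
```

With this shape, a second `mywj ingest` over the same folder would only re-embed files whose hash differs from the stored one.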
Restrict to specific extensions:
mywj ingest ./notes --extensions .txt,.md
Ask a question
mywj ask "What was the root cause of NN-2725?"
Finds the most relevant chunks via cosine similarity and streams the answer from Claude in real time.
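The retrieval step can be illustrated with a brute-force scan. This is a sketch under assumptions, not the package's actual query engine: `StoredChunk`, `ScoredChunk`, `cosineSimilarity`, and `topK` are hypothetical names, and embeddings are assumed to be held in memory as `Float32Array` values.

```typescript
// Hypothetical in-memory view of a row from the chunks table.
interface StoredChunk {
  id: number;
  content: string;
  embedding: Float32Array;
}

interface ScoredChunk {
  chunk: StoredChunk;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query and keep the k best matches.
function topK(query: Float32Array, chunks: StoredChunk[], k: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this costs one pass over all chunks per question, which is reasonable at the scale of a personal journal.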
Check what's indexed
mywj stats
Development mode (no build step needed)
npm run dev -- ingest ./path/to/notes
npm run dev -- ask "Your question here"
npm run dev -- stats
Typical workflow
# 1. Point it at your notes / exports / docs
mywj ingest "C:\path\to\jira-exports"
# 2. Ask anything
mywj ask "Which tickets are related to the auth service?"
mywj ask "What fix was deployed in version 2.4.1?"
mywj ask "Summarise all memory leak issues"
# 3. Add more files any time — re-ingest is incremental
mywj ingest "C:\path\to\more-notes"
Project structure
src/
├── cli/index.ts              commander entry point
├── types/index.ts            shared interfaces
├── ingestion/
│   ├── scanner.ts            recursive dir walk with extension filtering
│   ├── normalizer.ts         whitespace normalisation + sha256 content hash
│   ├── chunker.ts            sliding window chunker (~900 tokens, 100 overlap)
│   ├── ingestionEngine.ts    orchestrates scan → parse → normalise → chunk → embed → store
│   └── parsers/
│       ├── index.ts          ParserRegistry (strategy pattern)
│       ├── txt.ts
│       ├── markdown.ts
│       └── rtf.ts            inline RTF stripper (no extra dependency)
├── embeddings/
│   └── local.ts              local embeddings via @xenova/transformers (all-MiniLM-L6-v2)
├── vectorstore/
│   └── sqlite.ts             better-sqlite3 + cosine similarity in JS
└── query/
    └── engine.ts             top-k retrieval + Claude claude-opus-4-6 synthesis (streaming)
Database
The knowledge base is stored locally at:
<current-working-directory>/.myworkjournal/index.db
Schema:
| Table | Columns |
|-------------|----------------------------------------------------------------|
| documents | id, file_path, content_hash, ingested_at |
| chunks | id, document_id, content, embedding, chunk_index |
Chunks are deleted automatically when their parent document is removed (ON DELETE CASCADE).
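Each row in chunks carries a chunk_index, which reflects the order produced by the sliding-window chunker listed under the project structure (~900 tokens, 100 overlap). A rough sketch of that windowing follows. It assumes whitespace-separated words as a stand-in for tokens, which the real chunker may not do; `Chunk` and `chunkText` are hypothetical names.

```typescript
// Hypothetical chunk shape; chunkIndex corresponds to the chunk_index column.
interface Chunk {
  chunkIndex: number;
  content: string;
}

// Split text into overlapping windows of `size` tokens, where consecutive
// windows share `overlap` tokens. Tokens are approximated by whitespace-
// separated words (an assumption made for this sketch).
function chunkText(text: string, size = 900, overlap = 100): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Require size > 0 and 0 <= overlap < size");
  }
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < tokens.length; start += step) {
    const windowTokens = tokens.slice(start, start + size);
    chunks.push({ chunkIndex: chunks.length, content: windowTokens.join(" ") });
    // Stop once a window has reached the end of the text.
    if (start + size >= tokens.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.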
Design decisions
- Hash-based dedup: files are re-processed only when their content changes.
- Pluggable abstractions: Parser, EmbeddingProvider, and VectorStore are interfaces. Phase 2 additions (PDF/DOCX parsers, LanceDB) are drop-in implementations with no changes to ingestion or query logic.
- No external vector DB: cosine similarity runs in JS over SQLite rows. Suitable for thousands of chunks; swap to LanceDB when scale demands it.
- Per-project database: the .myworkjournal/ folder lives next to your knowledge files, not in a global location.
- Fully local embeddings: no API key or internet connection needed for ingestion after the model is cached.
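To illustrate the pluggable-parser idea, here is one way a Parser interface and a ParserRegistry could fit together. The member names (`extensions`, `parse`, `register`, `forFile`) are assumptions made for this sketch; the actual signatures in src/types/index.ts and src/ingestion/parsers/index.ts may differ.

```typescript
// Hypothetical Parser shape: each parser declares the extensions it handles
// and turns raw file contents into plain text.
interface Parser {
  extensions: string[];
  parse(raw: string): string;
}

// Strategy-pattern registry: looks up a parser by file extension.
class ParserRegistry {
  private readonly byExtension = new Map<string, Parser>();

  register(parser: Parser): void {
    for (const ext of parser.extensions) {
      this.byExtension.set(ext.toLowerCase(), parser);
    }
  }

  // Returns the parser for the file's extension, or undefined if none is registered.
  forFile(filePath: string): Parser | undefined {
    const dot = filePath.lastIndexOf(".");
    if (dot === -1) return undefined;
    return this.byExtension.get(filePath.slice(dot).toLowerCase());
  }
}

// Two illustrative parsers. The markdown one only strips heading markers.
const txtParser: Parser = { extensions: [".txt"], parse: (raw) => raw };
const markdownParser: Parser = {
  extensions: [".md"],
  parse: (raw) => raw.replace(/^#+\s*/gm, ""),
};
```

Under this design, adding a PDF parser would mean registering one more object with the registry, with the ingestion engine left untouched. EmbeddingProvider and VectorStore could follow the same pattern.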
Roadmap
| Phase | Features |
|-------|----------|
| 1 (current) | .txt, .md, .rtf — local embeddings — SQLite — hybrid FTS + semantic search — Claude streaming synthesis |
| 2 | PDF parser — quick capture (mywj note "...") — git log ingestion (mywj ingest-git) |
| 3 | Watch folder (auto-ingest on file change) — URL ingestion (mywj ingest-url <url>) |
