@macive/lesson-plan-parser

v1.0.0

Published 2 months ago

Robust, flexible DOCX lesson plan parser that converts structured lesson plans into clean, database-ready JSON.

Vulnerabilities: 0 High, 0 Medium, 0 Low

cluewax

Keywords: docx, parser, lesson-plan, json, curriculum

Lesson Plan Parser

A robust, flexible TypeScript Node.js application that parses DOCX lesson plan documents into clean, structured JSON suitable for database seeding.

Features

DOCX-to-JSON conversion: uses mammoth for reliable text extraction
HTML intermediate parsing: preserves table structures (Core Competencies, Values, PCIs)
Flexible field extraction: handles formatting variations across different curriculum files
Auto-fix engine: repairs common DOCX extraction errors (concatenated lines, missing newlines, malformed headers)
Multi-lesson splitting: splits documents into individual lesson blocks
Fully typed: TypeScript interfaces for all parsed entities
CLI and programmatic API: use from the command line or import into your own code
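
To illustrate the auto-fix and multi-lesson splitting ideas, here is a minimal sketch. It is not the package's actual code: the function names, the header pattern, and the choice to drop any text before the first header are all assumptions made for illustration.

```typescript
// Hypothetical sketch; names and patterns are assumptions, not the package's code.

// Matches headers such as "WEEK 1: LESSON 1" or "WEEK 6: LESSON 1 - 2".
const LESSON_HEADER_SOURCE = "WEEK\\s+\\d+\\s*:\\s*LESSON\\s+\\d+(?:\\s*-\\s*\\d+)?";

// Auto-fix step: if extraction glued a lesson header onto the end of the
// previous line, move the header onto its own line.
function repairConcatenatedHeaders(text: string): string {
  const pattern = new RegExp(LESSON_HEADER_SOURCE, "gi");
  return text.replace(pattern, (match: string, offset: number) => {
    const atLineStart = offset === 0 || text.charAt(offset - 1) === "\n";
    return atLineStart ? match : "\n" + match;
  });
}

// Splitting step: each block runs from one lesson header to the next.
// Text before the first header is dropped; blocks shorter than minLength
// characters are filtered out.
function splitIntoLessonBlocks(text: string, minLength: number = 0): string[] {
  const repaired = repairConcatenatedHeaders(text);
  const pattern = new RegExp(LESSON_HEADER_SOURCE, "gi");
  const starts: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(repaired)) !== null) {
    starts.push(match.index);
  }
  const blocks: string[] = [];
  for (let i = 0; i < starts.length; i++) {
    const end = i + 1 < starts.length ? starts[i + 1] : repaired.length;
    blocks.push(repaired.slice(starts[i], end).trim());
  }
  return blocks.filter((block) => block.length >= minLength);
}
```

Repairing before splitting means a header fused into the preceding paragraph still starts its own block.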

Installation

npm install

Usage

CLI

# Parse a single file
npx ts-node src/cli.ts path/to/lesson-plans.docx -o ./output

# Parse multiple files
npx ts-node src/cli.ts plan1.docx plan2.docx plan3.docx -o ./json-output

# With options
npx ts-node src/cli.ts plans.docx -o ./output --raw --pretty

CLI Options:

-o, --output <dir> — Output directory (default: ./output)
-r, --raw — Include raw extracted text in JSON
-p, --pretty — Pretty-print JSON output
-m, --min-length <n> — Minimum lesson block length filter

Programmatic API

import { parseDocxFile, parseDocxFiles } from "./index";

// Single file
const result = await parseDocxFile("path/to/plans.docx");
console.log(result.document?.lessons);

// Multiple files
const results = await parseDocxFiles([
  { filePath: "plan1.docx", outputPath: "out1.json" },
  { filePath: "plan2.docx", outputPath: "out2.json" },
]);

JSON Output Structure

{
  "metadata": {
    "sourceFile": "string",
    "term": 1,
    "level": "GRADE 8",
    "learningArea": "AGRICULTURE AND NUTRITION",
    "grade": "Grade 8"
  },
  "lessons": [
    {
      "weekLessonLabel": "WEEK 1: LESSON 1",
      "weekNumber": 1,
      "lessonNumber": "1",
      "isCombinedLesson": false,
      "strand": "Hygiene Practices",
      "subStrand": "Cleaning practices",
      "learningOutcomes": {
        "preamble": "By the end of the lesson, the learner should be able to:",
        "items": ["Identify appropriate procedures...", "Explain the routine...", "Appreciate a clean..."]
      },
      "keyInquiryQuestions": ["How can we...?"],
      "coreCompetencies": ["Learning to learn", "Digital literacy"],
      "values": ["Integrity", "Responsibility"],
      "pcis": ["Health promotion", "Safety"],
      "learningResources": ["Agriculture and Nutrition grade 8..."],
      "organizationOfLearning": {
        "introduction": { "duration": "5 minutes", "content": ["Review..."] },
        "lessonDevelopment": {
          "duration": "30 minutes",
          "steps": [
            { "stepNumber": 1, "title": "Daily Cleaning Practices", "content": ["Discuss..."] }
          ]
        },
        "conclusion": { "duration": "5 minutes", "content": ["Summarize..."] }
      },
      "extendedActivities": ["Conducting a kitchen cleaning..."],
      "teacherSelfEvaluation": ""
    }
  ]
}
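
The structure above could be described with TypeScript interfaces along the following lines. This is a sketch inferred from the sample JSON only: the interface names are hypothetical and may differ from those in src/types/, and totalPlannedMinutes is an illustrative helper, not part of the package's API.

```typescript
// Hypothetical interfaces inferred from the sample JSON output.
interface TimedSection {
  duration: string; // e.g. "5 minutes"
  content: string[];
}

interface LessonStep {
  stepNumber: number;
  title: string;
  content: string[];
}

interface OrganizationOfLearning {
  introduction: TimedSection;
  lessonDevelopment: { duration: string; steps: LessonStep[] };
  conclusion: TimedSection;
}

interface Lesson {
  weekLessonLabel: string;
  weekNumber: number;
  lessonNumber: string; // a string in the sample output
  isCombinedLesson: boolean;
  strand: string;
  subStrand: string;
  learningOutcomes: { preamble: string; items: string[] };
  keyInquiryQuestions: string[];
  coreCompetencies: string[];
  values: string[];
  pcis: string[];
  learningResources: string[];
  organizationOfLearning: OrganizationOfLearning;
  extendedActivities: string[];
  teacherSelfEvaluation: string;
}

interface LessonPlanDocument {
  metadata: {
    sourceFile: string;
    term: number;
    level: string;
    learningArea: string;
    grade: string;
  };
  lessons: Lesson[];
}

// Illustrative helper: reads the leading number from a duration such as
// "30 minutes"; returns 0 when no number followed by "min" is found.
function parseMinutes(duration: string): number {
  const match = /(\d+)\s*min/i.exec(duration);
  return match === null ? 0 : Number(match[1]);
}

// Sums the introduction, lesson development, and conclusion durations.
function totalPlannedMinutes(org: OrganizationOfLearning): number {
  return (
    parseMinutes(org.introduction.duration) +
    parseMinutes(org.lessonDevelopment.duration) +
    parseMinutes(org.conclusion.duration)
  );
}
```

For the sample lesson above, the three durations (5, 30, and 5 minutes) sum to 40.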

Supported Document Variations

The parser is designed to tolerate different formatting styles:

Header tables with varying column orders and empty cells
Spelling variations: "Organization" vs "Organisation", "Sub Strand" vs "Sub-Strand"
Key Inquiry Questions with optional (s) suffix
Numbered lists with various prefixes (1., 1.Define, 1**.**)
Combined lessons like WEEK 6: LESSON 1 - 2
Missing or empty teacher self-evaluation sections
Tables vs vertical lists for competencies, values, and PCIs
Malformed first lessons with concatenated fields
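
A few of these variations can be handled with small normalization helpers. The sketch below shows one possible approach. The function names, the regular expressions, and the "1-2" format for combined lesson numbers are assumptions for illustration, not the parser's actual implementation.

```typescript
// Hypothetical helpers; not the package's actual code.

interface LessonLabel {
  weekNumber: number;
  lessonNumber: string; // "1", or "1-2" for a combined lesson (assumed format)
  isCombinedLesson: boolean;
}

// Parses labels such as "WEEK 1: LESSON 1" and "WEEK 6: LESSON 1 - 2".
function parseLessonLabel(label: string): LessonLabel | null {
  const m = /WEEK\s+(\d+)\s*:\s*LESSON\s+(\d+)(?:\s*-\s*(\d+))?/i.exec(label);
  if (m === null) {
    return null;
  }
  const isCombinedLesson = m[3] !== undefined;
  return {
    weekNumber: Number(m[1]),
    lessonNumber: isCombinedLesson ? `${m[2]}-${m[3]}` : `${m[2]}`,
    isCombinedLesson,
  };
}

// Maps spelling variants of a section heading onto one canonical form,
// e.g. "Organisation" and "Organization", "Sub-Strand" and "Sub Strand".
function normalizeHeading(heading: string): string {
  return heading
    .trim()
    .toLowerCase()
    .replace(/:$/, "")
    .replace(/organisation/g, "organization")
    .replace(/sub[\s-]*strand/g, "sub strand")
    .replace(/questions?(?:\(s\))?/g, "questions");
}

// Strips numbered-list prefixes such as "1.", "1.Define" and "1**.**".
function stripListPrefix(item: string): string {
  return item.replace(/^\s*\d+\s*(?:\*\*)?\.(?:\*\*)?\s*/, "");
}
```

Normalizing headings to a single lowercase form before matching lets one lookup cover every spelling variant.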

Project Structure

src/
  types/          # TypeScript interfaces
  extractors/     # DOCX text extraction
  parsers/        # Core parsing engine
  utils/          # Text normalization utilities
  index.ts        # Public API
  cli.ts          # CLI entry point

Building

npm run build

Compiled JavaScript will be in the dist/ directory.

License

MIT
