analyze-qna

CLI to analyze InstructLab qna.yaml files and report token counts, structure quality, and formatting lint. This tool is intended to help authors ensure their datasets pass InstructLab validations before running ilab taxonomy diff.

Features

  • Token counting: uses OpenAI's tiktoken library.
  • Rules enforced (see the sketch after this list):
    • Context ~500 tokens (default: warn if outside 300–500)
    • Each Q/A pair ~250 tokens (default: warn if outside 200–300)
    • Context + all pairs ≤ 750 tokens (error if over)
    • 1–3 Q/A pairs per context (3 is optimal; extras are ignored with a warning)
    • 5–15 sections required (warning otherwise)
    • Optional checks: Q and A text present in the context; context appears in the source doc (when provided)
  • LLM-powered analysis (NEW):
    • Grounding checks: Validates that answers are actually based on the provided context
    • Q&A suggestions: Automatically generates additional Q&A pairs when fewer than 3 exist
    • Source verification: Fetches documents from git repos and validates context accuracy
    • Supports Ollama and OpenAI-compatible endpoints (including vLLM)
  • Configurable thresholds: via --config or CLI flags (see below)
  • Stronger source matching: normalized substring + line-based fraction matching
  • Directory crawl: --taxonomy-root crawls a tree and analyzes files named qna.yaml.
  • Readable report: Pretty table output via tabulate with per-pair breakout.
  • Agent mode: --ai emits structured JSON for programmatic use.
  • YAML lint: --yaml-lint checks trailing whitespace, missing final newline, tabs/mixed indentation, CRLF endings, and duplicate keys.
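
For illustration, here is a minimal sketch of how the token rules above could be checked with tiktoken. The function names, structure, and choice of encoding are assumptions made for this example, not the tool's actual internals; only the thresholds mirror the defaults listed above.

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")  # assumption: the encoding the tool uses is not documented here

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def check_example(context, pairs, context_range=(300, 500), pair_range=(200, 300), section_max=750):
    """Return warnings/errors for one context and its Q/A pairs (pairs = list of (q, a) strings)."""
    notes = []
    ctx_tokens = count_tokens(context)
    if not context_range[0] <= ctx_tokens <= context_range[1]:
        notes.append(f"context is {ctx_tokens} tokens, outside {context_range}")
    total = ctx_tokens
    for i, (q, a) in enumerate(pairs[:3], start=1):  # pairs beyond 3 are ignored
        pair_tokens = count_tokens(q) + count_tokens(a)
        total += pair_tokens
        if not pair_range[0] <= pair_tokens <= pair_range[1]:
            notes.append(f"pair {i} is {pair_tokens} tokens, outside {pair_range}")
    if len(pairs) > 3:
        notes.append(f"{len(pairs) - 3} extra pair(s) ignored")
    if total > section_max:
        notes.append(f"ERROR: context plus pairs is {total} tokens (limit {section_max})")
    return notes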

Installation

  • Local development:

    • python -m venv .venv && source .venv/bin/activate
    • pip install -r requirements.txt
  • After publishing (or after linking the package locally with npm link), use npx:

    • npx analyze-qna --help

Usage

  • Direct Python

    • python src/analyze_qna.py --file path/to/qna.yaml
    • python src/analyze_qna.py --file path/to/qna.yaml --ai
    • python src/analyze_qna.py --file path/to/qna.yaml --ai --source-doc path/to/source.txt
    • python src/analyze_qna.py --taxonomy-root path/to/taxonomy
    • python src/analyze_qna.py --taxonomy-root path/to/taxonomy --ai
    • python src/analyze_qna.py --file path/to/qna.yaml --yaml-lint
    • python src/analyze_qna.py --taxonomy-root path/to/taxonomy --yaml-lint
    • python src/analyze_qna.py --data-dir path/to/dir (deprecated)
  • Via npx (after publishing or via npm link)

    • npx analyze-qna --file path/to/qna.yaml
    • npx analyze-qna --taxonomy-root path/to/taxonomy
    • npx analyze-qna --file path/to/qna.yaml --yaml-lint
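
For programmatic use of the agent mode (--ai) shown above, the JSON report can be consumed from a script. A minimal sketch, assuming npx is on the PATH; the report's exact structure is not documented here, so the example only parses and prints it.

import json
import subprocess

def analyze(path: str) -> dict:
    """Run the CLI in agent mode and parse the JSON report it prints to stdout."""
    result = subprocess.run(
        ["npx", "analyze-qna", "--file", path, "--ai"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

report = analyze("path/to/qna.yaml")  # replace with a real file path
print(json.dumps(report, indent=2))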

Configuration

LLM Configuration (New!)

analyze-qna now supports LLM-powered features, including grounding checks and Q&A suggestions:

  • Interactive setup: npx analyze-qna config init

    • Creates config at ~/.config/analyze-qna/config.yaml
    • Supports Ollama and OpenAI-compatible endpoints
    • Configures features like grounding checks and Q&A suggestions
    • Max tokens limit: up to 32,000 for larger models
  • Validate configuration: npx analyze-qna config validate [config-file]

    • Tests LLM connectivity and model availability
    • Verifies API keys and endpoints
    • Shows enabled features and thresholds
  • Use LLM features:

    • After running config init, LLM features are automatically enabled (no flags needed)
    • Use alternate config: npx analyze-qna --file qna.yaml --llm-config /path/to/config.yaml
    • Set default via environment: export ANALYZE_QNA_CONFIG=/path/to/config.yaml
    • Temporarily disable LLM: npx analyze-qna --file qna.yaml --no-llm

When LLM features are enabled, the tool will:

  • Show "[LLM]" indicators in output where LLM analysis is used
  • Add grounding validation results to the "A in Ctx" column
  • Display Q&A suggestions after each example that needs them
  • Automatically fetch and validate source documents from git repositories
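
As a rough illustration of how a grounding check against an OpenAI-compatible endpoint could look, the sketch below posts a YES/NO prompt to a chat completions API. The endpoint URL, model name, and prompt wording are assumptions made for the example; the tool's actual prompts and request handling may differ.

import requests

def is_answer_grounded(context: str, question: str, answer: str,
                       base_url: str = "http://localhost:11434/v1",  # assumption: Ollama's OpenAI-compatible API
                       model: str = "llama3",                        # assumption: any chat-capable model works
                       api_key: str = "none") -> bool:
    """Ask the model whether the answer is supported by the context; expect a YES/NO reply."""
    prompt = (
        "Context:\n" + context +
        "\n\nQuestion: " + question +
        "\nAnswer: " + answer +
        "\n\nIs the answer fully supported by the context? Reply YES or NO."
    )
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model,
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"].strip().upper()
    return reply.startswith("YES")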

Analysis Thresholds

You can provide thresholds in a JSON config file or override individual values via CLI flags. The line-match thresholds are illustrated in the sketch after the flag list below.

  • JSON file (example config.json):
{
  "context_min": 320,
  "context_max": 520,
  "pair_min": 180,
  "pair_max": 320,
  "section_max": 800,
  "examples_min": 5,
  "examples_max": 15,
  "line_match_min_length": 30,
  "line_match_fraction_min": 0.85
}
  • CLI flags:
    • --config config.json
    • --context-range 320,520
    • --pair-range 180,320
    • --examples-range 5,15
    • --section-max 800
    • --line-match-min-length 30
    • --line-match-fraction-min 0.85
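
To show what line_match_min_length and line_match_fraction_min control, here is a rough sketch of normalized, line-based fraction matching. The normalization and matching logic are assumptions about how such a check could work, not the tool's exact algorithm.

import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not break matching."""
    return re.sub(r"\s+", " ", text.strip().lower())

def context_matches_source(context: str, source_doc: str,
                           min_length: int = 30, fraction_min: float = 0.85) -> bool:
    """Return True if enough of the context's substantive lines appear in the source doc."""
    norm_source = normalize(source_doc)
    lines = [normalize(line) for line in context.splitlines()]
    candidates = [line for line in lines if len(line) >= min_length]  # ignore short lines
    if not candidates:
        return normalize(context) in norm_source  # fall back to a plain substring check
    matched = sum(1 for line in candidates if line in norm_source)
    return matched / len(candidates) >= fraction_min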

Examples

  • Analyze single file:
    • analyze-qna --file ./datasets/foo/qna.yaml
  • Agent-friendly JSON output:
    • analyze-qna --file ./datasets/foo/qna.yaml --ai
  • Verify context against original source document:
    • analyze-qna --file ./datasets/foo/qna.yaml --ai --source-doc ./datasets/foo/source.txt
  • Override thresholds on the fly:
    • analyze-qna --taxonomy-root ./datasets --context-range 350,550 --pair-range 180,320 --section-max 800

Output

Human-readable table per file with per-pair breakout (Q/A tokens, totals, and whether they appear in context). A Notes section lists warnings (extra pairs ignored, out-of-range pairs, missing Q/A in context, context not matching source document).

When --yaml-lint is enabled, a YAML Lint section lists any formatting issues (trailing whitespace, missing final newline, CRLF, tabs/mixed indentation, duplicate keys).
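
A simplified sketch of the kind of formatting checks --yaml-lint performs (duplicate-key detection is omitted here, and the tool's real checks are more thorough):

def lint_yaml_text(raw: bytes) -> list:
    """Report basic formatting issues in a qna.yaml file read as raw bytes."""
    issues = []
    if b"\r\n" in raw:
        issues.append("CRLF line endings found")
    text = raw.decode("utf-8")
    if text and not text.endswith("\n"):
        issues.append("missing final newline")
    for i, line in enumerate(text.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if line != line.rstrip():
            issues.append(f"line {i}: trailing whitespace")
        if "\t" in indent:
            issues.append(f"line {i}: tab in indentation")
    return issues

with open("path/to/qna.yaml", "rb") as f:
    print(lint_yaml_text(f.read()))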

Schema validation (InstructLab v3)

  • Validates knowledge QnA files against a bundled InstructLab v3 JSON Schema when the file path contains /knowledge/ (e.g., when analyzing a taxonomy tree).
  • Human mode prints a "Schema Validation" section with the failing path and a short hint from the schema when available.
  • AI mode adds a schema block in the JSON output with validated_against and detailed errors.

Bundled schemas (offline):

  • src/instructlab/schema/v3/knowledge.json
  • Upstream references: InstructLab schemas v3 and taxonomy layout
    • https://github.com/instructlab/schema/tree/main/src/instructlab/schema/v3
    • https://github.com/instructlab/taxonomy

Notes:

  • Requires jsonschema (already in requirements.txt); the Node wrapper installs it automatically.
  • If schema.validated_against is null, validation was skipped (non-knowledge path or schema not found).
  • Currently validates knowledge QnA. Other dataset types (e.g., compositional, foundational) receive lint checks; schema validation for those types can be added later.
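
For reference, a minimal sketch of validating a knowledge qna.yaml against the bundled schema with jsonschema, assuming the schema declares JSON Schema draft 2020-12 (adjust the validator class otherwise); the paths and error formatting are illustrative.

import json
import yaml
from jsonschema import Draft202012Validator

def validate_knowledge_qna(qna_path: str,
                           schema_path: str = "src/instructlab/schema/v3/knowledge.json"):
    """Return (path, message) tuples for each schema error; an empty list means the file validates."""
    with open(schema_path) as f:
        schema = json.load(f)
    with open(qna_path) as f:
        instance = yaml.safe_load(f)
    validator = Draft202012Validator(schema)
    return [("/".join(str(p) for p in err.absolute_path), err.message)
            for err in validator.iter_errors(instance)]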

Contributing

Contributions are welcome! Please read the guidelines in CONTRIBUTING.md and open an issue or pull request on GitHub.

  • Repo: https://github.com/rdwj/analyze-qna
  • Issues: https://github.com/rdwj/analyze-qna/issues
  • Maintainer: Wes Jackson ([email protected])

Development

  • Create and activate a venv, then install requirements:

    • python -m venv .venv && source .venv/bin/activate
    • pip install -r requirements.txt
  • Run locally via Node wrapper:

    • node bin/analyze-qna.js --file path/to/qna.yaml
  • Linting/type hints: optional stubs included in requirements.txt (types-PyYAML, types-tabulate).

Publishing

  1. Ensure executable bit on the Node bin
    • chmod +x bin/analyze-qna.js
    • git add --chmod=+x bin/analyze-qna.js
  2. Bump version and dry run
    • npm version patch
    • FILE=$(npm pack --silent) && echo $FILE && tar -tf $FILE | cat
  3. Login & publish
    • npm login
    • npm publish
  4. Post-publish test
    • npx analyze-qna --help

License

MIT. See LICENSE for details.

Acknowledgement

This utility is designed for use with InstructLab qna.yaml datasets and aims to mirror important validations to reduce failures during ilab taxonomy diff. InstructLab is an open-source project; please consult its documentation for canonical requirements and behavior.