@neural-tools/fine-tune

Fine-tuning utilities for Neural Tools

Utilities for preparing, validating, and managing fine-tuning datasets for LLMs. Currently supports OpenAI's fine-tuning format.

Installation

npm install @neural-tools/fine-tune

With OpenAI

npm install @neural-tools/fine-tune openai

Features

  • Dataset Preparation - Convert various formats to fine-tuning format
  • Validation - Ensure datasets meet LLM requirements
  • Cost Estimation - Calculate fine-tuning costs before running
  • Quality Analysis - Analyze dataset quality and balance
  • Format Conversion - Convert between different training formats
  • Token Counting - Accurate token counting for cost estimation

Quick Start

import { FineTuneDataset } from '@neural-tools/fine-tune';
import fs from 'fs/promises';

// Create dataset
const dataset = new FineTuneDataset();

// Add training examples
dataset.addExample({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
    { role: 'assistant', content: 'The capital of France is Paris.' }
  ]
});

dataset.addExample({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2 + 2?' },
    { role: 'assistant', content: '2 + 2 equals 4.' }
  ]
});

// Validate dataset
const validation = await dataset.validate();
if (!validation.isValid) {
  console.error('Validation errors:', validation.errors);
}

// Get cost estimate
const estimate = await dataset.estimateCost('gpt-3.5-turbo');
console.log(`Estimated cost: $${estimate.totalCost.toFixed(2)}`);

// Export for OpenAI
const jsonl = dataset.toJSONL();
await fs.writeFile('training-data.jsonl', jsonl);

API Reference

FineTuneDataset

Main class for managing fine-tuning datasets.

Constructor

new FineTuneDataset(options?: DatasetOptions)

interface DatasetOptions {
  format?: 'openai' | 'anthropic';  // Default: 'openai'
  validateOnAdd?: boolean;           // Default: true
}
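
For example, to target OpenAI's format but defer validation until the whole dataset is assembled (both options come from DatasetOptions above):

const dataset = new FineTuneDataset({
  format: 'openai',      // target format (the default)
  validateOnAdd: false   // skip per-example checks; call validate() once at the end
});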

Methods

addExample(example)

Add a training example to the dataset.

dataset.addExample({
  messages: [
    { role: 'system', content: 'System prompt' },
    { role: 'user', content: 'User message' },
    { role: 'assistant', content: 'Assistant response' }
  ]
});

addExamples(examples)

Add multiple examples at once.

dataset.addExamples([
  { messages: [...] },
  { messages: [...] },
  { messages: [...] }
]);

validate()

Validate the dataset.

const result = await dataset.validate();

interface ValidationResult {
  isValid: boolean;
  errors: string[];
  warnings: string[];
  stats: {
    totalExamples: number;
    avgTokensPerExample: number;
    minTokens: number;
    maxTokens: number;
  };
}

estimateCost(model)

Estimate fine-tuning cost.

const estimate = await dataset.estimateCost('gpt-3.5-turbo');

interface CostEstimate {
  model: string;
  totalTokens: number;
  trainingCost: number;
  totalCost: number;
  estimatedTime: number;  // Minutes
}

analyze()

Get dataset quality metrics.

const analysis = await dataset.analyze();

interface DatasetAnalysis {
  exampleCount: number;
  avgTokensPerMessage: number;
  tokenDistribution: {
    min: number;
    max: number;
    mean: number;
    median: number;
  };
  roleBalance: {
    system: number;
    user: number;
    assistant: number;
  };
  qualityScore: number;  // 0-100
}

toJSONL()

Export dataset as JSONL string.

const jsonl = dataset.toJSONL();

toJSON()

Export dataset as JSON array.

const json = dataset.toJSON();

fromJSONL(jsonl)

Load dataset from JSONL string.

dataset.fromJSONL(jsonlString);

fromJSON(json)

Load dataset from JSON array.

dataset.fromJSON(jsonArray);
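
These export and load methods pair up for persisting a dataset between sessions; a minimal round trip using Node's fs/promises might look like:

import fs from 'fs/promises';

// Save the current dataset to disk...
await fs.writeFile('dataset.jsonl', dataset.toJSONL());

// ...and load it back into a fresh instance later
const restored = new FineTuneDataset();
restored.fromJSONL(await fs.readFile('dataset.jsonl', 'utf8'));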

Usage Examples

Basic Chat Fine-Tuning

import { FineTuneDataset } from '@neural-tools/fine-tune';

const dataset = new FineTuneDataset();

// Add customer support examples
const supportExamples = [
  {
    messages: [
      { role: 'system', content: 'You are a customer support agent.' },
      { role: 'user', content: 'How do I reset my password?' },
      { role: 'assistant', content: 'To reset your password: 1. Go to login page, 2. Click "Forgot Password", 3. Check your email for reset link.' }
    ]
  },
  {
    messages: [
      { role: 'system', content: 'You are a customer support agent.' },
      { role: 'user', content: 'My order hasn\'t arrived yet.' },
      { role: 'assistant', content: 'I can help track your order. Can you provide your order number?' }
    ]
  }
  // Add 50-100 more examples for good results
];

dataset.addExamples(supportExamples);

// Validate
const validation = await dataset.validate();
if (validation.isValid) {
  // Save for upload
  const jsonl = dataset.toJSONL();
  await fs.writeFile('support-training.jsonl', jsonl);
}

Code Generation Fine-Tuning

const dataset = new FineTuneDataset();

dataset.addExamples([
  {
    messages: [
      { role: 'system', content: 'You are a Python code generator.' },
      { role: 'user', content: 'Write a function to reverse a string' },
      { role: 'assistant', content: 'def reverse_string(s):\n    return s[::-1]' }
    ]
  },
  {
    messages: [
      { role: 'system', content: 'You are a Python code generator.' },
      { role: 'user', content: 'Create a function to check if number is prime' },
      { role: 'assistant', content: 'def is_prime(n):\n    if n < 2:\n        return False\n    for i in range(2, int(n**0.5) + 1):\n        if n % i == 0:\n            return False\n    return True' }
    ]
  }
]);

Cost Estimation

const dataset = new FineTuneDataset();
// ... add examples ...

// Estimate cost before running
const estimate = await dataset.estimateCost('gpt-3.5-turbo');

console.log(`Training ${dataset.size()} examples`);
console.log(`Total tokens: ${estimate.totalTokens}`);
console.log(`Estimated cost: $${estimate.totalCost.toFixed(2)}`);
console.log(`Estimated time: ${estimate.estimatedTime} minutes`);

// Only proceed if cost is acceptable
if (estimate.totalCost < 50) {
  await uploadAndTrain(dataset);
}

Dataset Quality Analysis

const analysis = await dataset.analyze();

console.log('Dataset Quality Report:');
console.log(`Examples: ${analysis.exampleCount}`);
console.log(`Avg tokens per message: ${analysis.avgTokensPerMessage}`);
console.log(`Quality score: ${analysis.qualityScore}/100`);

if (analysis.qualityScore < 70) {
  console.warn('Dataset quality is low. Add more diverse examples.');
}

if (analysis.exampleCount < 50) {
  console.warn('Dataset is small. Recommend at least 50-100 examples.');
}
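
The tokenDistribution and roleBalance fields from DatasetAnalysis can round out the report; a short sketch using the interface documented above:

console.log(`Token range: ${analysis.tokenDistribution.min}-${analysis.tokenDistribution.max}, median ${analysis.tokenDistribution.median}`);
console.log('Role balance:', analysis.roleBalance);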

Format Conversion

// Load from CSV
import { csvToFineTune } from '@neural-tools/fine-tune';

const csv = `
question,answer
"What is AI?","Artificial Intelligence is..."
"What is ML?","Machine Learning is..."
`.trim();  // strip the leading/trailing newlines before parsing

const dataset = csvToFineTune(csv, {
  systemPrompt: 'You are a helpful AI tutor.',
  questionColumn: 'question',
  answerColumn: 'answer'
});

// Export to JSONL
const jsonl = dataset.toJSONL();

Validation and Error Handling

const dataset = new FineTuneDataset();
dataset.addExample({
  messages: [
    { role: 'user', content: 'Hello' },
    { role: 'assistant', content: 'Hi there!' }
  ]
});

const validation = await dataset.validate();

if (!validation.isValid) {
  console.error('Errors:');
  validation.errors.forEach(error => console.error(`  - ${error}`));
}

if (validation.warnings.length > 0) {
  console.warn('Warnings:');
  validation.warnings.forEach(warning => console.warn(`  - ${warning}`));
}

console.log('Stats:', validation.stats);

Fine-Tuning with OpenAI

import OpenAI from 'openai';
import { FineTuneDataset } from '@neural-tools/fine-tune';
import fs from 'fs/promises';
import { createReadStream } from 'fs';

const openai = new OpenAI();
const dataset = new FineTuneDataset();

// 1. Prepare dataset
dataset.addExamples([/* your examples */]);

// 2. Validate
const validation = await dataset.validate();
if (!validation.isValid) {
  throw new Error('Invalid dataset');
}

// 3. Save to file
const jsonl = dataset.toJSONL();
await fs.writeFile('training.jsonl', jsonl);

// 4. Upload file (the OpenAI SDK expects a stream or File, not a raw Buffer)
const file = await openai.files.create({
  file: createReadStream('training.jsonl'),
  purpose: 'fine-tune'
});

// 5. Create fine-tuning job
const fineTune = await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: 'gpt-3.5-turbo'
});

console.log(`Fine-tune job created: ${fineTune.id}`);
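
The job runs asynchronously, so you will usually want to poll it until it reaches a terminal status. A minimal loop using the SDK's jobs.retrieve:

// 6. Poll until the job finishes ('succeeded', 'failed', or 'cancelled')
let job = await openai.fineTuning.jobs.retrieve(fineTune.id);
while (!['succeeded', 'failed', 'cancelled'].includes(job.status)) {
  await new Promise(resolve => setTimeout(resolve, 60_000)); // check once a minute
  job = await openai.fineTuning.jobs.retrieve(fineTune.id);
}

console.log(`Job finished: ${job.status}`);
if (job.status === 'succeeded') {
  console.log(`Fine-tuned model: ${job.fine_tuned_model}`);
}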

Best Practices

1. Dataset Size

  • Minimum: 10 examples (for testing)
  • Recommended: 50-100 examples
  • Optimal: 500+ examples

2. Example Quality

  • Clear, consistent formatting
  • Diverse scenarios
  • Accurate, high-quality responses
  • Balanced across use cases

3. Token Count

  • Keep examples under 4096 tokens
  • Aim for consistent lengths
  • Monitor token distribution (see the check below)
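
The stats block returned by validate() makes these checks easy to automate; a minimal sketch, assuming the ValidationResult shape documented above (the 3x spread threshold is an illustrative heuristic, not a library rule):

const { stats } = await dataset.validate();

if (stats.maxTokens > 4096) {
  console.warn(`Longest example is ${stats.maxTokens} tokens; trim it under 4096.`);
}
if (stats.maxTokens > stats.avgTokensPerExample * 3) {
  console.warn('Token lengths are uneven; aim for more consistent examples.');
}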

4. System Prompts

// Good: Specific, consistent
{ role: 'system', content: 'You are a Python expert who writes clean, documented code.' }

// Bad: Generic, vague
{ role: 'system', content: 'You are helpful.' }

Pricing (as of 2024)

OpenAI fine-tuning costs:

  • GPT-3.5 Turbo: ~$0.008 per 1K tokens
  • GPT-4: ~$0.030 per 1K tokens

Example:

  • 100 examples × 200 tokens = 20K tokens
  • Cost: 20 × $0.008 = $0.16 (GPT-3.5)
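
Rather than doing this arithmetic by hand, estimateCost (documented above) returns the same quantities; passing 'gpt-4' as the model string is assumed to work the same way as 'gpt-3.5-turbo':

const gpt35 = await dataset.estimateCost('gpt-3.5-turbo');
const gpt4 = await dataset.estimateCost('gpt-4');

console.log(`GPT-3.5 Turbo: ${gpt35.totalTokens} tokens -> $${gpt35.totalCost.toFixed(2)}`);
console.log(`GPT-4: ${gpt4.totalTokens} tokens -> $${gpt4.totalCost.toFixed(2)}`);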

Dependencies

Peer Dependencies

  • openai - Optional, for OpenAI integration

Contributing

Contributions are welcome! See the main repository for guidelines.

License

MIT - See LICENSE.md for details.

Links