har-to-llm

A command-line tool and library for converting HAR (HTTP Archive) files to LLM-friendly formats.

Installation

npm install -g har-to-llm

Or use with npx:

npx har-to-llm ./file.har

Usage

Basic Usage

# Convert HAR file to markdown format (default)
har-to-llm ./file.har

# Convert to JSON format
har-to-llm ./file.har --format json

# Save output to file
har-to-llm ./file.har --output output.md

Filtering Options

# Filter by HTTP methods
har-to-llm ./file.har --methods GET,POST

# Filter by status codes
har-to-llm ./file.har --status 200,201,404

# Filter by domains
har-to-llm ./file.har --domains api.example.com,api.github.com

# Exclude domains
har-to-llm ./file.har --exclude-domains google-analytics.com,facebook.com

# Filter by request duration
har-to-llm ./file.har --min-duration 100 --max-duration 5000

Deduplication Options

# Default behavior: automatically remove semantically similar requests
har-to-llm ./file.har

# Disable deduplication to keep all requests
har-to-llm ./file.har --no-deduplicate

# Verbose output shows deduplication statistics
har-to-llm ./file.har --verbose

Note: by default, the tool applies semantic deduplication, which is optimized for LLM training. It removes requests that follow the same pattern:

  • Same HTTP method
  • Same URL pattern (ignoring specific IDs: /users/1 and /users/2 both match /users/{id})
  • Same query parameter structure (names match, values can differ)
  • Same request body structure (JSON keys match, values can differ)
  • Same header structure (header names match, values can differ)

Examples of semantic deduplication:

  • GET /users/1, GET /users/2, GET /users/3 → keeps only GET /users/{id}
  • POST /users with different user data → keeps only one example
  • PUT /users/1 with different update data → keeps only one example

This ensures LLM training data contains unique API patterns without redundancy.
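
To make the pattern concrete, here is a minimal TypeScript sketch of how such a semantic key could be derived from HAR entries. The heuristics (treating numeric path segments as IDs, sorting name lists, comparing top-level JSON keys) are illustrative assumptions, not har-to-llm's actual implementation.

// Minimal sketch of semantic deduplication over HAR entries.
// The key-building heuristics are illustrative assumptions,
// not har-to-llm's internal code.
type HarHeader = { name: string; value: string };
type HarEntry = {
  request: {
    method: string;
    url: string;
    headers: HarHeader[];
    postData?: { text?: string };
  };
};

// /users/1 and /users/2 both normalize to /users/{id}
function normalizePath(pathname: string): string {
  return pathname
    .split('/')
    .map((seg) => (/^\d+$/.test(seg) ? '{id}' : seg))
    .join('/');
}

// Structure of a JSON body: sorted top-level keys, values ignored
function bodyShape(text?: string): string {
  if (!text) return '';
  try {
    return Object.keys(JSON.parse(text)).sort().join(',');
  } catch {
    return 'raw';
  }
}

function semanticKey(entry: HarEntry): string {
  const url = new URL(entry.request.url);
  const queryNames = [...url.searchParams.keys()].sort().join(',');
  const headerNames = entry.request.headers
    .map((h) => h.name.toLowerCase())
    .sort()
    .join(',');
  return [
    entry.request.method,
    url.hostname + normalizePath(url.pathname),
    queryNames,
    bodyShape(entry.request.postData?.text),
    headerNames,
  ].join('|');
}

// Keep the first entry seen for each semantic key
function deduplicate(entries: HarEntry[]): HarEntry[] {
  const seen = new Set<string>();
  return entries.filter((entry) => {
    const key = semanticKey(entry);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}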

Header Filtering

The tool automatically filters out headers that are not useful for API implementation:

Excluded headers include:

  • Browser-specific: User-Agent, Accept, Accept-Language, Accept-Encoding, Cache-Control, Origin, Referer
  • Network: Connection, Keep-Alive, Transfer-Encoding, Content-Length, Date, Server
  • Security: X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, Strict-Transport-Security
  • Caching: ETag, Last-Modified, If-Modified-Since, If-None-Match
  • Analytics: X-Forwarded-For, X-Real-IP, X-Requested-With
  • CDN/Proxy: CF-Ray, X-Cache, X-Amz-Cf-Id

Kept headers include:

  • Authentication: Authorization, X-API-Key
  • Content: Content-Type
  • Custom API headers: X-Custom-Header, X-Rate-Limit-*, X-Request-ID
  • Response headers: Location, Set-Cookie
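
A sketch of how this filtering might look, using a case-insensitive blocklist built from the excluded headers above. The set mirrors the lists in this README; the package's actual internals may differ.

// Sketch of header filtering via a blocklist; the set below mirrors
// the README's lists, not har-to-llm's exact implementation.
const EXCLUDED_HEADERS = new Set([
  'user-agent', 'accept', 'accept-language', 'accept-encoding',
  'cache-control', 'origin', 'referer',
  'connection', 'keep-alive', 'transfer-encoding', 'content-length',
  'date', 'server',
  'x-frame-options', 'x-content-type-options', 'x-xss-protection',
  'strict-transport-security',
  'etag', 'last-modified', 'if-modified-since', 'if-none-match',
  'x-forwarded-for', 'x-real-ip', 'x-requested-with',
  'cf-ray', 'x-cache', 'x-amz-cf-id',
]);

type Header = { name: string; value: string };

// Lowercase the name before checking: HTTP header names are case-insensitive
function filterHeaders(headers: Header[]): Header[] {
  return headers.filter((h) => !EXCLUDED_HEADERS.has(h.name.toLowerCase()));
}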

Output Formats

  • markdown (default): Human-readable markdown format
  • json: Structured JSON data
  • text: Simple text summary
  • curl: cURL commands for replaying requests
  • conversation: Conversation format for LLM training
  • structured: Detailed structured data with summary
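
If you want the same format selection programmatically, the three formatters documented in the Programmatic Usage section below can be dispatched by name. This is a hypothetical wrapper, not part of the package's API; the remaining formats are only documented here as CLI options.

import { Formatters } from 'har-to-llm';

// Hypothetical dispatch over the three formatters shown in the
// Programmatic Usage section; 'text', 'conversation' and 'structured'
// appear in this README only as CLI flags.
type KnownFormat = 'markdown' | 'json' | 'curl';

function render(conversations: any[], format: KnownFormat): string {
  switch (format) {
    case 'markdown':
      return Formatters.toMarkdown(conversations);
    case 'json':
      return Formatters.toJSON(conversations);
    case 'curl':
      return Formatters.toCurlCommands(conversations);
  }
}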

Examples

# Get only successful API calls in JSON format
har-to-llm ./file.har --format json --status 200,201,204 --methods GET,POST,PUT,DELETE

# Generate cURL commands for debugging
har-to-llm ./file.har --format curl --output commands.sh

# Create conversation log for LLM training
har-to-llm ./file.har --format conversation --output training-data.md

# Show summary only
har-to-llm ./file.har --summary

# Verbose output with filtering
har-to-llm ./file.har --verbose --domains api.example.com --min-duration 500

# Keep all requests including semantically similar ones
har-to-llm ./file.har --no-deduplicate --verbose

Programmatic Usage

import { HARConverter, Formatters } from 'har-to-llm';
import * as fs from 'fs';

// Read HAR file
const harContent = fs.readFileSync('./file.har', 'utf8');
const harData = JSON.parse(harContent);

// Convert entries to conversations (header filtering is applied automatically)
const conversations = harData.log.entries.map(entry => 
  HARConverter.convertEntry(entry)
);

// Filter entries with semantic deduplication (default for LLM training)
const filteredEntries = HARConverter.filterEntries(harData.log.entries, {
  methods: ['GET', 'POST'],
  statusCodes: [200, 201],
  domains: ['api.example.com'],
  deduplicate: true // semantic deduplication
});

// Filter entries without deduplication
const allEntries = HARConverter.filterEntries(harData.log.entries, {
  methods: ['GET', 'POST'],
  deduplicate: false
});

// Manual semantic deduplication for LLM training
const semanticallyUnique = HARConverter.deduplicateEntries(harData.log.entries);

// Manual exact deduplication
const exactlyUnique = HARConverter.removeExactDuplicates(harData.log.entries);

// Generate different formats
const markdown = Formatters.toMarkdown(conversations);
const json = Formatters.toJSON(conversations);
const curl = Formatters.toCurlCommands(conversations);

// Get summary
const summary = HARConverter.generateSummary(harData.log.entries);

Output Format Examples

Markdown Format

# HTTP Conversations

## Request 1

**Timestamp:** 2023-12-01T10:30:00.000Z
**Duration:** 150ms

### Request
**Method:** GET
**URL:** https://api.example.com/users/1

**Headers:**
- authorization: Bearer token123
- content-type: application/json

### Response
**Status:** 200 OK

**Headers:**
- content-type: application/json

**Body:**
```json
{
  "id": 1,
  "name": "John Doe"
}
```

JSON Format

[
  {
    "request": {
      "method": "GET",
      "url": "https://api.example.com/users/1",
      "headers": {
        "authorization": "Bearer token123",
        "content-type": "application/json"
      },
      "queryParams": {},
      "body": null,
      "contentType": null
    },
    "response": {
      "status": 200,
      "statusText": "OK",
      "headers": {
        "content-type": "application/json"
      },
      "body": "{\"id\":1,\"name\":\"John Doe\"}",
      "contentType": "application/json"
    },
    "timestamp": "2023-12-01T10:30:00.000Z",
    "duration": 150
  }
]

cURL Format

# GET https://api.example.com/users/1
curl -X GET -H "authorization: Bearer token123" -H "content-type: application/json" "https://api.example.com/users/1"
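
The command above maps directly onto the conversation shape shown in the JSON format example. The following is a standalone sketch of that mapping, assuming that shape; it is not the package's internal formatter.

// Build a replayable cURL command from the conversation shape shown
// in the JSON format example above; a sketch, not har-to-llm's code.
type Conversation = {
  request: {
    method: string;
    url: string;
    headers: Record<string, string>;
    body: string | null;
  };
};

function toCurl({ request }: Conversation): string {
  const headerFlags = Object.entries(request.headers)
    .map(([name, value]) => `-H "${name}: ${value}"`)
    .join(' ');
  const bodyFlag = request.body ? ` -d '${request.body}'` : '';
  return `curl -X ${request.method} ${headerFlags}${bodyFlag} "${request.url}"`;
}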

Features

  • ✅ Convert HAR files to multiple LLM-friendly formats
  • ✅ Filter requests by method, status code, domain, and duration
  • ✅ Semantic deduplication optimized for LLM training
  • ✅ Automatic filtering of non-essential headers
  • ✅ Generate cURL commands for request replay
  • ✅ Create conversation logs for LLM training
  • ✅ Provide detailed summaries and statistics
  • ✅ Support for both CLI and programmatic usage
  • ✅ TypeScript support with full type definitions

Requirements

  • Node.js 16.0.0 or higher

License

MIT

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Changelog

1.0.0

  • Initial release
  • Support for multiple output formats
  • Filtering capabilities
  • CLI and programmatic APIs
  • Semantic deduplication for LLM training
  • Automatic header filtering