
hazo_files

License: MIT

A powerful, modular file management package for Node.js and React applications with support for local filesystem and Google Drive storage. Built with TypeScript for type safety and developer experience.

Features

  • Multiple Storage Providers: Local filesystem and Google Drive support out of the box
  • Modular Architecture: Easily add custom storage providers
  • Unified API: Single consistent interface across all storage providers
  • React UI Components: Drop-in FileBrowser component with folder tree, file list, and preview
  • Naming Rules System: Visual configurator and utilities for generating consistent file/folder names
  • Naming Convention Management: Full CRUD with UI components for managing naming conventions in database
  • Extraction Data Management: Track and manage LLM-extracted metadata with merge strategies
  • LLM Integration: Built-in support for hazo_llm_api document/image extraction
  • Upload + Extract Workflow: Combined service for uploading files with automatic LLM extraction and naming
  • File Reference Tracking: Multi-entity file references with orphan detection, soft delete, and lifecycle management
  • File Change Detection: xxHash-based content hashing for efficient change detection
  • Content Tagging: Optional LLM-based content classification at upload time or on-demand via content_tag field
  • Schema Migrations: Built-in V2/V3 migration utilities for adding reference tracking and content tagging to existing databases
  • TypeScript: Full type safety and IntelliSense support
  • OAuth Integration: Built-in Google Drive OAuth authentication
  • Progress Tracking: Upload/download progress callbacks
  • File Validation: Extension filtering and file size limits
  • Error Handling: Comprehensive error types and handling

Installation

npm install hazo_files

For React UI components, ensure you have React 18+ installed:

npm install react react-dom

For the NamingRuleConfigurator component (drag-and-drop interface), also install:

npm install @dnd-kit/core @dnd-kit/sortable @dnd-kit/utilities

For database tracking and LLM extraction features (optional):

npm install hazo_connect      # Database tracking
npm install hazo_llm_api      # LLM document extraction
npm install server-only       # Server-side safety (recommended)
# Note: xxhash-wasm is included automatically as a dependency

Tailwind CSS v4 Setup (Required for UI Components)

If you're using Tailwind CSS v4 with the UI components, you must add a @source directive to your CSS file to ensure Tailwind scans the package's files for utility classes.

Add this to your globals.css or main CSS file AFTER the tailwindcss import:

@import "tailwindcss";

/* Required: Enable Tailwind to scan hazo_files package for utility classes */
@source "../node_modules/hazo_files/dist/ui";

Without this directive, Tailwind v4's JIT compiler will not generate CSS for the utility classes used in hazo_files components (like hover:bg-gray-100, text-sm, rounded-md, etc.), resulting in broken styling.

Note: This is only required for Tailwind v4. Earlier versions of Tailwind automatically scan node_modules and do not need this configuration.

Quick Start

Basic Usage (Server-side)

import { createInitializedFileManager } from 'hazo_files';

// Create and initialize file manager
const fileManager = await createInitializedFileManager({
  config: {
    provider: 'local',
    local: {
      basePath: './files',
      maxFileSize: 10 * 1024 * 1024, // 10MB
      allowedExtensions: ['jpg', 'png', 'pdf', 'txt']
    }
  }
});

// Create a directory
await fileManager.createDirectory('/documents');

// Upload a file
await fileManager.uploadFile(
  './local-file.pdf',
  '/documents/file.pdf',
  {
    onProgress: (progress, bytes, total) => {
      console.log(`Upload progress: ${progress}%`);
    }
  }
);

// List directory contents
const result = await fileManager.listDirectory('/documents');
if (result.success) {
  console.log(result.data);
}

// Download a file
await fileManager.downloadFile('/documents/file.pdf', './downloaded.pdf');

Using Configuration File

Create hazo_files_config.ini in your project root:

[general]
provider = local

[local]
base_path = ./files
max_file_size = 10485760
allowed_extensions = jpg,png,pdf,txt

Then initialize without config object:

import { createInitializedFileManager } from 'hazo_files';

const fileManager = await createInitializedFileManager();

React UI Component

import { FileBrowser } from 'hazo_files/ui';
import type { FileBrowserAPI } from 'hazo_files/ui';

// Create an API adapter that calls your server endpoints
const api: FileBrowserAPI = {
  async listDirectory(path: string) {
    const res = await fetch(`/api/files?action=list&path=${encodeURIComponent(path)}`);
    return res.json();
  },
  async getFolderTree(path = '/', depth = 3) {
    const res = await fetch(`/api/files?action=tree&path=${encodeURIComponent(path)}&depth=${depth}`);
    return res.json();
  },
  async uploadFile(file: File, remotePath: string) {
    const formData = new FormData();
    formData.append('file', file);
    formData.append('path', remotePath);
    const res = await fetch('/api/files/upload', { method: 'POST', body: formData });
    return res.json();
  },
  // ... implement other methods
};

function MyFileBrowser() {
  return (
    <FileBrowser
      api={api}
      initialPath="/"
      showPreview={true}
      showTree={true}
      viewMode="grid"
    />
  );
}

Advanced Usage

Google Drive Integration

1. Set up Google Cloud Console

  1. Go to Google Cloud Console
  2. Create a new project or select an existing one
  3. Enable the Google Drive API
  4. Create OAuth 2.0 credentials
  5. Add authorized redirect URIs (e.g., http://localhost:3000/api/auth/callback/google)

2. Configure Environment Variables

Create .env.local:

HAZO_GOOGLE_DRIVE_CLIENT_ID=your-client-id.apps.googleusercontent.com
HAZO_GOOGLE_DRIVE_CLIENT_SECRET=your-client-secret
HAZO_GOOGLE_DRIVE_REDIRECT_URI=http://localhost:3000/api/auth/callback/google

3. Configure hazo_files

[general]
provider = google_drive

[google_drive]
client_id =
client_secret =
redirect_uri = http://localhost:3000/api/auth/callback/google
refresh_token =

Environment variables will automatically override empty values.
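The override precedence can be sketched in a few lines; this is an illustration of the behavior described above (with a hypothetical helper name), not the package's actual config loader:

```typescript
// Illustrative precedence: a missing or empty INI value falls back to the
// environment variable. resolveConfigValue is a hypothetical helper name.
function resolveConfigValue(
  iniValue: string | undefined,
  envValue: string | undefined
): string | undefined {
  return iniValue && iniValue.trim() !== '' ? iniValue : envValue;
}

console.log(resolveConfigValue('', 'env-client-id'));        // "env-client-id"
console.log(resolveConfigValue('ini-client-id', 'env-id'));  // "ini-client-id"
```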

4. Implement OAuth Flow

import { createFileManager, GoogleDriveModule } from 'hazo_files';

// Initialize with Google Drive
const fileManager = createFileManager({
  config: {
    provider: 'google_drive',
    google_drive: {
      clientId: process.env.HAZO_GOOGLE_DRIVE_CLIENT_ID!,
      clientSecret: process.env.HAZO_GOOGLE_DRIVE_CLIENT_SECRET!,
      redirectUri: process.env.HAZO_GOOGLE_DRIVE_REDIRECT_URI!,
    }
  }
});

await fileManager.initialize();

// Get the Google Drive module to access auth methods
const module = fileManager.getModule() as GoogleDriveModule;
const auth = module.getAuth();

// Generate auth URL
const authUrl = auth.getAuthUrl();
console.log('Visit:', authUrl);

// After the user authorizes, exchange the code returned to your
// redirect URI for tokens
const tokens = await auth.exchangeCodeForTokens(authCode);

// Authenticate the module
await module.authenticate(tokens);

// Now you can use the file manager
await fileManager.createDirectory('/MyFolder');

Next.js API Route Example

// app/api/files/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createInitializedFileManager } from 'hazo_files';

async function getFileManager() {
  return createInitializedFileManager({
    config: {
      provider: 'local',
      local: {
        basePath: process.env.LOCAL_STORAGE_BASE_PATH || './files',
      }
    }
  });
}

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const action = searchParams.get('action');
  const path = searchParams.get('path') || '/';

  const fm = await getFileManager();

  switch (action) {
    case 'list':
      return NextResponse.json(await fm.listDirectory(path));
    case 'tree': {
      const depth = parseInt(searchParams.get('depth') || '3', 10);
      return NextResponse.json(await fm.getFolderTree(path, depth));
    }
    default:
      return NextResponse.json({ success: false, error: 'Invalid action' });
  }
}

export async function POST(request: NextRequest) {
  const body = await request.json();
  const { action, ...params } = body;

  const fm = await getFileManager();

  switch (action) {
    case 'createDirectory':
      return NextResponse.json(await fm.createDirectory(params.path));
    case 'deleteFile':
      return NextResponse.json(await fm.deleteFile(params.path));
    case 'renameFile':
      return NextResponse.json(await fm.renameFile(params.path, params.newName));
    default:
      return NextResponse.json({ success: false, error: 'Invalid action' });
  }
}

File Upload API Route

// app/api/files/upload/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createInitializedFileManager } from 'hazo_files';

// Same helper as in the listing route above
async function getFileManager() {
  return createInitializedFileManager({
    config: {
      provider: 'local',
      local: { basePath: process.env.LOCAL_STORAGE_BASE_PATH || './files' },
    }
  });
}

export async function POST(request: NextRequest) {
  const formData = await request.formData();
  const file = formData.get('file') as File;
  const path = formData.get('path') as string;

  const fm = await getFileManager();

  // Convert File to Buffer
  const arrayBuffer = await file.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);

  const result = await fm.uploadFile(buffer, path);
  return NextResponse.json(result);
}

Progress Tracking

// Upload with progress tracking
await fileManager.uploadFile(
  './large-file.zip',
  '/uploads/large-file.zip',
  {
    onProgress: (progress, bytesTransferred, totalBytes) => {
      console.log(`Progress: ${progress.toFixed(2)}%`);
      console.log(`${bytesTransferred} / ${totalBytes} bytes`);
    }
  }
);

// Download with progress tracking
await fileManager.downloadFile(
  '/uploads/large-file.zip',
  './downloaded-file.zip',
  {
    onProgress: (progress, bytesTransferred, totalBytes) => {
      console.log(`Download: ${progress.toFixed(2)}%`);
    }
  }
);

File Operations

// Create directory structure
await fileManager.createDirectory('/projects/2024/docs');

// Upload file
const uploadResult = await fileManager.uploadFile(
  buffer,
  '/projects/2024/docs/report.pdf'
);

// Move file
await fileManager.moveItem(
  '/projects/2024/docs/report.pdf',
  '/archive/2024/report.pdf'
);

// Rename file
await fileManager.renameFile(
  '/archive/2024/report.pdf',
  'annual-report.pdf'
);

// Copy file (convenience method)
await fileManager.copyFile(
  '/archive/2024/annual-report.pdf',
  '/backup/annual-report.pdf'
);

// Delete file
await fileManager.deleteFile('/backup/annual-report.pdf');

// Remove directory (recursive)
await fileManager.removeDirectory('/archive/2024', true);

// Check if file exists
const exists = await fileManager.exists('/projects/2024/docs');

// Get file/folder information
const itemResult = await fileManager.getItem('/projects/2024/docs/report.pdf');
if (itemResult.success && itemResult.data) {
  console.log('File:', itemResult.data.name);
  console.log('Size:', itemResult.data.size);
  console.log('Modified:', itemResult.data.modifiedAt);
}

// List directory with options
const listResult = await fileManager.listDirectory('/projects', {
  recursive: true,
  includeHidden: false,
  filter: (item) => !item.isDirectory && item.name.endsWith('.pdf')
});

Working with Text Files

// Write text file
await fileManager.writeFile('/notes/readme.txt', 'Hello, World!');

// Read text file
const readResult = await fileManager.readFile('/notes/readme.txt');
if (readResult.success) {
  console.log(readResult.data); // "Hello, World!"
}

Folder Tree

// Get folder tree (3 levels deep by default)
const treeResult = await fileManager.getFolderTree('/projects', 3);
if (treeResult.success && treeResult.data) {
  console.log(JSON.stringify(treeResult.data, null, 2));
}

// Output:
// [
//   {
//     "id": "abc123",
//     "name": "2024",
//     "path": "/projects/2024",
//     "children": [
//       {
//         "id": "def456",
//         "name": "docs",
//         "path": "/projects/2024/docs",
//         "children": []
//       }
//     ]
//   }
// ]

Configuration

Configuration File (hazo_files_config.ini)

[general]
provider = local

[local]
base_path = ./files
allowed_extensions = jpg,png,pdf,txt,doc,docx
max_file_size = 10485760

[google_drive]
client_id = your-client-id.apps.googleusercontent.com
client_secret = your-client-secret
redirect_uri = http://localhost:3000/api/auth/callback/google
refresh_token =
access_token =
root_folder_id =

[naming]
; Supported date format tokens for naming rules
date_formats = YYYY,YY,MM,M,DD,D,MMM,MMMM,YYYY-MM-DD,YYYY-MMM-DD,DD-MM-YYYY,MM-DD-YYYY

Environment Variables

The following environment variables can override configuration file values:

  • HAZO_GOOGLE_DRIVE_CLIENT_ID
  • HAZO_GOOGLE_DRIVE_CLIENT_SECRET
  • HAZO_GOOGLE_DRIVE_REDIRECT_URI
  • HAZO_GOOGLE_DRIVE_REFRESH_TOKEN
  • HAZO_GOOGLE_DRIVE_ACCESS_TOKEN
  • HAZO_GOOGLE_DRIVE_ROOT_FOLDER_ID

Configuration via Code

import { createInitializedFileManager } from 'hazo_files';

const fileManager = await createInitializedFileManager({
  config: {
    provider: 'local',
    local: {
      basePath: './storage',
      allowedExtensions: ['jpg', 'png', 'gif', 'pdf'],
      maxFileSize: 5 * 1024 * 1024 // 5MB
    }
  }
});

UI Components

FileBrowser Component

The FileBrowser is a complete, drop-in file management UI with:

  • Folder tree navigation
  • File list (grid or list view)
  • Breadcrumb navigation
  • File preview (images, text, PDFs)
  • Context menus and actions
  • Upload, download, rename, delete operations
  • Drag-and-drop file moving between folders

import { FileBrowser } from 'hazo_files/ui';

<FileBrowser
  api={api}
  initialPath="/"
  showPreview={true}
  showTree={true}
  viewMode="grid"
  treeWidth={250}
  previewHeight={300}
  onError={(error) => console.error(error)}
  onNavigate={(path) => console.log('Navigated to:', path)}
  onSelect={(item) => console.log('Selected:', item)}
/>

Drag-and-Drop File Moving

The FileBrowser includes built-in drag-and-drop functionality for moving files and folders:

Features:

  • Drag files/folders from the file list
  • Drop onto folders in the sidebar tree or main file list
  • Visual feedback with opacity and colored borders during drag
  • Prevents invalid operations (dropping on self, into current parent, folder into descendant)
  • Shows dragged item preview during drag operation

How to use:

  1. Click and hold on any file or folder in the file list
  2. Drag it over a folder in either the tree sidebar or file list
  3. Valid drop targets show a green ring/background
  4. Release to move the item to the new location

Technical requirements:

  • Requires @dnd-kit/core peer dependency (already included for NamingRuleConfigurator)
  • API must implement moveItem(sourcePath, destinationPath) method
  • Automatically validates drop targets to prevent invalid moves
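The drop-target rules above can be sketched as a simple path check; this is an illustration of the documented behavior, not the component's actual code:

```typescript
// Sketch of drop-target validation: reject dropping an item on itself,
// into its current parent (a no-op), or a folder into its own descendant.
function isValidDrop(sourcePath: string, targetFolder: string): boolean {
  const parent = sourcePath.slice(0, sourcePath.lastIndexOf('/')) || '/';
  if (targetFolder === sourcePath) return false;               // onto itself
  if (targetFolder === parent) return false;                   // current parent
  if (targetFolder.startsWith(sourcePath + '/')) return false; // own descendant
  return true;
}

console.log(isValidDrop('/a/b', '/a'));     // false (current parent)
console.log(isValidDrop('/a/b', '/a/b/c')); // false (descendant)
console.log(isValidDrop('/a/b', '/x'));     // true
```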

Visual feedback:

  • Dragging: Item becomes semi-transparent (opacity-50)
  • Valid drop target: Green ring (ring-2 ring-green-500) and background (bg-green-50)
  • Drag preview: Shows file/folder icon and name following cursor

ID patterns used:

  • File items: file-item-{path} (draggable)
  • Folder tree drops: folder-drop-tree-{path} (droppable)
  • Folder list drops: folder-drop-list-{path} (droppable)

Individual Components

You can also use individual components:

import {
  PathBreadcrumb,
  FolderTree,
  FileList,
  FilePreview,
  FileActions,
  FileInfoPanel
} from 'hazo_files/ui';

// Use individually with your own layout

FileInfoPanel Component

The FileInfoPanel displays file metadata in a structured format and can be used standalone in sidebars, custom dialogs, or inline panels:

import { FileInfoPanel } from 'hazo_files/ui';

// In a sidebar
function Sidebar({ selectedFile, metadata, isLoading }) {
  return (
    <div className="sidebar p-4">
      <h3 className="font-bold mb-4">File Info</h3>
      <FileInfoPanel
        item={selectedFile}
        metadata={metadata}
        isLoading={isLoading}
      />
    </div>
  );
}

// Without custom metadata section
<FileInfoPanel
  item={file}
  showCustomMetadata={false}
  className="bg-gray-50 rounded-lg p-4"
/>

// In a custom dialog
function MyCustomDialog({ file }) {
  return (
    <dialog>
      <FileInfoPanel item={file} showCustomMetadata={false} />
    </dialog>
  );
}

Props:

  • item: FileSystemItem | null - The file or folder to display info for
  • metadata?: FileMetadata | null - Additional metadata from database
  • isLoading?: boolean - Show loading state for custom metadata
  • showCustomMetadata?: boolean - Whether to show the JSON metadata section (default: true)
  • className?: string - Additional CSS classes for custom styling

Hooks

import { useFileBrowser, useFileOperations } from 'hazo_files/ui';

function MyCustomFileBrowser() {
  const {
    currentPath,
    files,
    tree,
    selectedItem,
    isLoading,
    navigate,
    refresh,
    selectItem
  } = useFileBrowser(api, '/');

  const {
    createFolder,
    uploadFiles,
    deleteItem,
    renameItem
  } = useFileOperations(api, currentPath);

  // Build your custom UI
}

Naming Rule Configurator

Build consistent file/folder naming patterns with a visual drag-and-drop interface:

import { NamingRuleConfigurator } from 'hazo_files/ui';
import type { NamingVariable } from 'hazo_files/ui';

function NamingConfig() {
  // Define user-specific variables
  const userVariables: NamingVariable[] = [
    {
      variable_name: 'project_name',
      description: 'Name of the project',
      example_value: 'WebApp',
      category: 'user'
    },
    {
      variable_name: 'client_id',
      description: 'Client identifier',
      example_value: 'ACME',
      category: 'user'
    },
  ];

  const handleSchemaChange = (schema) => {
    console.log('New schema:', schema);
    // Save to database or state
  };

  const handleExport = (schema) => {
    // Export as JSON file
    const blob = new Blob([JSON.stringify(schema, null, 2)], { type: 'application/json' });
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'naming-rule.json';
    a.click();
  };

  return (
    <NamingRuleConfigurator
      variables={userVariables}
      onChange={handleSchemaChange}
      onExport={handleExport}
      sampleFileName="proposal.pdf"
    />
  );
}

The configurator provides:

  • Category Tabs: User, Date, File, Counter variables
  • Drag & Drop: Build patterns by dragging variables into file/folder patterns
  • Segment Reordering: Drag segments within patterns to reorder them
  • Live Preview: See generated names in real-time with example values
  • Undo/Redo: Full history with keyboard shortcuts (Ctrl+Z, Ctrl+Y)
  • Import/Export: Save and load naming rules as JSON
  • Scrollable Layout: Works in fixed-height containers with scrollable content area

System variables included:

  • Date: YYYY, YY, MM, DD, YYYY-MM-DD, MMM, MMMM, etc.
  • File: original_name, extension, ext
  • Counter: counter (auto-incrementing with padding)

Naming Convention Management Components

Full UI for managing naming conventions stored in the database:

import {
  NamingConventionManager,
  NamingConventionList,
  NamingConventionEditor,
} from 'hazo_files/ui';

// Full management UI (list + editor combined)
<NamingConventionManager
  api={namingAPI}
  onSelect={(convention) => applyConvention(convention)}
/>

// Or use components separately
<NamingConventionList
  api={namingAPI}
  selectedId={selectedId}
  onSelect={setSelectedId}
  onEdit={(id) => openEditor(id)}
  onDelete={(id) => confirmDelete(id)}
/>

<NamingConventionEditor
  api={namingAPI}
  conventionId={editingId}
  userVariables={customVariables}
  onSave={(convention) => handleSave(convention)}
  onCancel={() => closeEditor()}
/>

Naming Rules API

Generate file and folder names programmatically from naming schemas:

import {
  hazo_files_generate_file_name,
  hazo_files_generate_folder_name,
  createVariableSegment,
  createLiteralSegment,
  type NamingRuleSchema
} from 'hazo_files';

// Create a naming schema
const schema: NamingRuleSchema = {
  version: 1,
  filePattern: [
    createVariableSegment('client_id'),
    createLiteralSegment('_'),
    createVariableSegment('project_name'),
    createLiteralSegment('_'),
    createVariableSegment('YYYY-MM-DD'),
    createLiteralSegment('_'),
    createVariableSegment('counter'),
  ],
  folderPattern: [
    createVariableSegment('YYYY'),
    createLiteralSegment('/'),
    createVariableSegment('client_id'),
    createLiteralSegment('/'),
    createVariableSegment('project_name'),
  ],
};

// Define variable values
const variables = {
  client_id: 'ACME',
  project_name: 'Website',
};

// Generate file name
const fileResult = hazo_files_generate_file_name(
  schema,
  variables,
  'original-document.pdf',
  {
    counterValue: 42,
    preserveExtension: true,  // Keep original .pdf extension
    date: new Date('2024-12-09'),
  }
);

if (fileResult.success) {
  console.log(fileResult.name);
  // Output: "ACME_Website_2024-12-09_042.pdf"
}

// Generate folder path
const folderResult = hazo_files_generate_folder_name(schema, variables);

if (folderResult.success) {
  console.log(folderResult.name);
  // Output: "2024/ACME/Website"
}

// Use with FileManager
const uploadPath = `/${folderResult.name}/${fileResult.name}`;
await fileManager.uploadFile(buffer, uploadPath);

Available System Variables

Date Variables (use current date unless overridden):

  • YYYY - Full year (2024)
  • YY - Two-digit year (24)
  • MM - Month with zero padding (01-12)
  • M - Month without padding (1-12)
  • DD - Day with zero padding (01-31)
  • D - Day without padding (1-31)
  • MMM - Short month name (Jan, Feb, etc.)
  • MMMM - Full month name (January, February, etc.)
  • YYYY-MM-DD - ISO date format (2024-01-15)
  • YYYY-MMM-DD - Date with month name (2024-Jan-15)
  • DD-MM-YYYY - European format (15-01-2024)
  • MM-DD-YYYY - US format (01-15-2024)

File Metadata Variables (from original filename):

  • original_name - Filename without extension
  • extension - File extension with dot (.pdf)
  • ext - Extension without dot (pdf)

Counter Variable:

  • counter - Auto-incrementing number with zero padding (001, 042, 123)
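The token semantics listed above can be sketched as a lookup; this illustrates the documented behavior of a few tokens and is not the package's implementation:

```typescript
// Illustrative resolution of date and counter tokens (not the package's code)
const MONTHS = ['January','February','March','April','May','June',
                'July','August','September','October','November','December'];
const pad = (n: number, width = 2) => String(n).padStart(width, '0');

function resolveToken(token: string, d: Date, counter = 1): string {
  switch (token) {
    case 'YYYY': return String(d.getFullYear());
    case 'YY':   return String(d.getFullYear()).slice(-2);
    case 'MM':   return pad(d.getMonth() + 1);
    case 'DD':   return pad(d.getDate());
    case 'MMM':  return MONTHS[d.getMonth()].slice(0, 3);
    case 'MMMM': return MONTHS[d.getMonth()];
    case 'YYYY-MM-DD':
      return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
    case 'counter': return pad(counter, 3); // zero-padded: 001, 042, 123
    default: throw new Error(`Unknown token: ${token}`);
  }
}

const d = new Date(2024, 0, 15); // 15 Jan 2024
console.log(resolveToken('YYYY-MM-DD', d));  // "2024-01-15"
console.log(resolveToken('MMM', d));         // "Jan"
console.log(resolveToken('counter', d, 42)); // "042"
```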

Parsing Pattern Strings

You can also parse pattern strings directly:

import { parsePatternString, patternToString } from 'hazo_files';

// Parse string to segments
const segments = parsePatternString('{client_id}_{YYYY-MM-DD}_{counter}');
console.log(segments);
// [
//   { id: '...', type: 'variable', value: 'client_id' },
//   { id: '...', type: 'literal', value: '_' },
//   { id: '...', type: 'variable', value: 'YYYY-MM-DD' },
//   { id: '...', type: 'literal', value: '_' },
//   { id: '...', type: 'variable', value: 'counter' },
// ]

// Convert back to string
const patternStr = patternToString(segments);
// "{client_id}_{YYYY-MM-DD}_{counter}"
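The `{variable}` syntax shown above can be parsed with a simple regex scan; this is a sketch of the idea, not the package's `parsePatternString` implementation (segment `id`s are omitted for brevity):

```typescript
// Illustrative parser: text inside {braces} becomes a variable segment,
// everything between matches becomes a literal segment.
type Segment = { type: 'variable' | 'literal'; value: string };

function parsePattern(pattern: string): Segment[] {
  const segments: Segment[] = [];
  const re = /\{([^}]+)\}/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(pattern)) !== null) {
    if (m.index > last) {
      segments.push({ type: 'literal', value: pattern.slice(last, m.index) });
    }
    segments.push({ type: 'variable', value: m[1] });
    last = re.lastIndex;
  }
  if (last < pattern.length) {
    segments.push({ type: 'literal', value: pattern.slice(last) });
  }
  return segments;
}

console.log(parsePattern('{client_id}_{YYYY-MM-DD}_{counter}'));
// 5 segments: variable, literal "_", variable, literal "_", variable
```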

Extraction Data Management

Manage LLM-extracted data stored within the file_data JSON field. The system maintains both raw extraction history and merged results.

Data Structure

interface FileDataStructure {
  merged_data: Record<string, unknown>;  // Combined data from all extractions
  raw_data: ExtractionData[];            // Individual extraction entries
}

interface ExtractionData {
  id: string;           // Unique extraction ID
  extracted_at: string; // ISO timestamp
  source?: string;      // Optional source identifier (e.g., model name)
  data: Record<string, unknown>;  // The extracted data
}

Using with FileMetadataService

import { FileMetadataService, createFileMetadataService } from 'hazo_files';

// Create service with your CRUD provider
const metadataService = createFileMetadataService(crudService);

// Add an extraction
const extraction = await metadataService.addExtraction(
  '/documents/report.pdf',
  'local',
  { title: 'Annual Report', author: 'John Doe', pages: 42 },
  { source: 'gpt-4', mergeStrategy: 'shallow' }
);
console.log('Added extraction:', extraction?.id);

// Get merged data (combined from all extractions)
const merged = await metadataService.getMergedData('/documents/report.pdf', 'local');
console.log('Merged data:', merged);

// Get all extractions
const extractions = await metadataService.getExtractions('/documents/report.pdf', 'local');
console.log('All extractions:', extractions);

// Get a specific extraction
const specific = await metadataService.getExtractionById(
  '/documents/report.pdf',
  'local',
  extraction?.id
);

// Remove an extraction (recalculates merged_data by default)
await metadataService.removeExtractionById(
  '/documents/report.pdf',
  'local',
  extraction?.id,
  { recalculateMerged: true, mergeStrategy: 'deep' }
);

// Clear all extractions
await metadataService.clearExtractions('/documents/report.pdf', 'local');

Using Utility Functions Directly

For working with parsed data structures without database operations:

import {
  parseFileData,
  addExtractionToFileData,
  removeExtractionById,
  getMergedData,
  getExtractions,
  deepMerge,
  createEmptyFileDataStructure,
} from 'hazo_files';

// Parse existing JSON (auto-migrates old format)
const fileData = parseFileData(existingJsonString);

// Add an extraction (returns new structure, immutable)
const result = addExtractionToFileData(
  fileData,
  { category: 'finance', summary: 'Q4 results' },
  { source: 'claude-3', mergeStrategy: 'deep' }
);

if (result.success) {
  const newFileData = result.data;
  console.log('New merged data:', newFileData.merged_data);
  console.log('Extraction count:', newFileData.raw_data.length);
}

// Remove an extraction by ID
const removeResult = removeExtractionById(fileData, 'ext_12345', {
  recalculateMerged: true,
  mergeStrategy: 'shallow'
});

// Get copies of data
const mergedCopy = getMergedData(fileData);
const extractionsCopy = getExtractions(fileData);

Merge Strategies

  • Shallow (default): Spreads top-level properties, later values overwrite earlier

    // { a: 1, b: 2 } + { b: 3, c: 4 } = { a: 1, b: 3, c: 4 }
  • Deep: Recursively merges nested objects, concatenates arrays

    // { a: { x: 1 }, arr: [1] } + { a: { y: 2 }, arr: [2] } = { a: { x: 1, y: 2 }, arr: [1, 2] }
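The two strategies can be sketched in plain TypeScript; this illustrates the behavior described above, not the library's implementation:

```typescript
type Obj = Record<string, unknown>;

// Shallow: spread top-level properties; later values overwrite earlier ones
function shallowMerge(a: Obj, b: Obj): Obj {
  return { ...a, ...b };
}

// Deep: recursively merge nested objects, concatenate arrays
function deepMergeSketch(a: Obj, b: Obj): Obj {
  const out: Obj = { ...a };
  for (const [key, val] of Object.entries(b)) {
    const prev = out[key];
    if (Array.isArray(prev) && Array.isArray(val)) {
      out[key] = [...prev, ...val];
    } else if (
      prev && val &&
      typeof prev === 'object' && typeof val === 'object' &&
      !Array.isArray(prev) && !Array.isArray(val)
    ) {
      out[key] = deepMergeSketch(prev as Obj, val as Obj);
    } else {
      out[key] = val;
    }
  }
  return out;
}

console.log(shallowMerge({ a: 1, b: 2 }, { b: 3, c: 4 }));
// { a: 1, b: 3, c: 4 }
console.log(deepMergeSketch({ a: { x: 1 }, arr: [1] }, { a: { y: 2 }, arr: [2] }));
// { a: { x: 1, y: 2 }, arr: [1, 2] }
```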

Migration from Old Format

The parseFileData function automatically migrates old plain-object format to the new structure:

// Old format: { title: 'Report', author: 'John' }
// Becomes: { merged_data: { title: 'Report', author: 'John' }, raw_data: [] }
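The migration step can be sketched as follows; this illustrates the transformation described above (types redeclared locally for a self-contained example), not the package's `parseFileData` internals:

```typescript
// Minimal sketch of the old-format auto-migration
type FileDataStructure = {
  merged_data: Record<string, unknown>;
  raw_data: unknown[];
};

function migrateOldFormat(parsed: Record<string, unknown>): FileDataStructure {
  // Already in the new shape? Return as-is.
  if ('merged_data' in parsed && 'raw_data' in parsed) {
    return parsed as unknown as FileDataStructure;
  }
  // Old plain-object format: wrap it as merged_data with no raw history.
  return { merged_data: parsed, raw_data: [] };
}

console.log(migrateOldFormat({ title: 'Report', author: 'John' }));
// { merged_data: { title: 'Report', author: 'John' }, raw_data: [] }
```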

Naming Convention Management

Store and manage naming conventions in your database with full CRUD operations.

NamingConventionService

import { NamingConventionService, HAZO_FILES_NAMING_TABLE_SCHEMA } from 'hazo_files';
import { createCrudService } from 'hazo_connect/server';

// Create CRUD service for naming conventions table
const namingCrud = createCrudService(adapter, HAZO_FILES_NAMING_TABLE_SCHEMA.tableName);
const namingService = new NamingConventionService(namingCrud);

// Create a naming convention
const convention = await namingService.create({
  naming_title: 'Tax Documents',
  naming_type: 'both', // 'file', 'folder', or 'both'
  naming_value: {
    version: 1,
    filePattern: [
      { id: '1', type: 'variable', value: 'client_id' },
      { id: '2', type: 'literal', value: '_' },
      { id: '3', type: 'variable', value: 'YYYY-MM-DD' },
    ],
    folderPattern: [
      { id: '4', type: 'variable', value: 'YYYY' },
      { id: '5', type: 'literal', value: '/' },
      { id: '6', type: 'variable', value: 'client_id' },
    ],
  },
  variables: [
    { variable_name: 'client_id', description: 'Client ID', example_value: 'ACME', category: 'user' }
  ],
  scope_id: 'optional-scope-uuid', // Link to hazo_scopes for organization
});

// Get all conventions
const allConventions = await namingService.list();

// Get parsed conventions (with schema and variables as objects)
const parsed = await namingService.listParsed();

// Get by scope (e.g., for a specific organization)
const scopedConventions = await namingService.getByScope('scope-uuid');

// Update
await namingService.update(convention.id, {
  naming_title: 'Updated Tax Documents',
});

// Duplicate
const copy = await namingService.duplicate(convention.id, 'Tax Documents Copy');

// Delete
await namingService.delete(convention.id);

NamingConventionManager UI Component

import { NamingConventionManager } from 'hazo_files/ui';
import type { NamingConventionAPI } from 'hazo_files/ui';

// Create API adapter for your backend
const namingAPI: NamingConventionAPI = {
  list: () => fetch('/api/naming-conventions').then(r => r.json()),
  create: (input) => fetch('/api/naming-conventions', {
    method: 'POST',
    body: JSON.stringify(input),
  }).then(r => r.json()),
  update: (id, input) => fetch(`/api/naming-conventions/${id}`, {
    method: 'PATCH',
    body: JSON.stringify(input),
  }).then(r => r.json()),
  delete: (id) => fetch(`/api/naming-conventions/${id}`, {
    method: 'DELETE',
  }).then(r => r.json()),
};

function NamingConventionsPage() {
  return (
    <NamingConventionManager
      api={namingAPI}
      onSelect={(convention) => console.log('Selected:', convention)}
    />
  );
}

Upload with LLM Extraction

Combine file uploads with automatic LLM extraction and naming convention application.

UploadExtractService

import {
  TrackedFileManager,
  NamingConventionService,
  LLMExtractionService,
  UploadExtractService,
} from 'hazo_files';
import { createLLM } from 'hazo_llm_api';

// Create LLM extraction service
const extractionService = new LLMExtractionService((provider, options) => {
  return createLLM({ provider, ...options });
}, 'gemini');

// Create upload + extract service (with optional content tag config)
const uploadExtract = new UploadExtractService(
  trackedFileManager,
  namingService,
  extractionService,
  {
    content_tag_set_by_llm: true,
    content_tag_prompt_area: 'classification',
    content_tag_prompt_key: 'classify_document',
    content_tag_prompt_return_fieldname: 'document_type',
  }
);

// Upload with extraction and naming convention
const result = await uploadExtract.uploadWithExtract(
  pdfBuffer,
  'quarterly-report.pdf',
  {
    // Enable LLM extraction
    extract: true,
    extractionOptions: {
      promptArea: 'reports',
      promptKey: 'extract_summary',
      llmProvider: 'gemini',
    },
    // Apply naming convention
    namingConventionId: 'convention-uuid',
    namingVariables: { client_id: 'ACME', project: 'Q4' },
    basePath: '/documents',
    createFolders: true,
    counterValue: 1,
  }
);

if (result.success) {
  console.log('Uploaded to:', result.generatedPath);
  // e.g., '/documents/2024/ACME/ACME_Q4_2024-12-09_001.pdf'
  console.log('Extracted data:', result.extraction?.data);
  console.log('Content tag:', result.contentTag);
  // e.g., 'invoice', 'report', 'contract'
}

// Generate path preview without uploading
const preview = await uploadExtract.generatePath(
  'document.pdf',
  'convention-uuid',
  { client_id: 'ACME' },
  { basePath: '/docs', counterValue: 5 }
);
console.log('Would upload to:', preview.fullPath);

// Create folder from naming convention
const folderResult = await uploadExtract.createFolderFromConvention(
  'convention-uuid',
  { client_id: 'ACME', project: 'Website' },
  { basePath: '/projects' }
);

LLMExtractionService Standalone

import { LLMExtractionService } from 'hazo_files';

const extractionService = new LLMExtractionService(llmFactory, 'gemini');

// Extract from document
const result = await extractionService.extractFromDocument(
  pdfBuffer,
  'application/pdf',
  {
    customPrompt: 'Extract all financial figures and dates',
    llmProvider: 'qwen',
  }
);

// Extract from image
const imageResult = await extractionService.extractFromImage(
  imageBuffer,
  'image/jpeg',
  {
    promptArea: 'receipts',
    promptKey: 'extract_receipt',
  }
);

// Auto-detect based on MIME type
const autoResult = await extractionService.extract(
  buffer,
  mimeType,
  extractionOptions
);

Content Tagging

Automatically classify uploaded files using LLM-based content analysis. The content_tag field stores a classification string (e.g., "invoice", "report", "contract") determined by an LLM prompt.

Configuration

import type { ContentTagConfig } from 'hazo_files';

const contentTagConfig: ContentTagConfig = {
  content_tag_set_by_llm: true,
  content_tag_prompt_area: 'classification',
  content_tag_prompt_key: 'classify_document',
  content_tag_prompt_return_fieldname: 'document_type',
  content_tag_prompt_variables: { language: 'en' }, // optional
};

Automatic Tagging at Upload

Pass a contentTagConfig to the UploadExtractService constructor (as the default for all uploads), or override it per upload via options:

// Per-upload override
const result = await uploadExtract.uploadWithExtract(buffer, 'file.pdf', {
  basePath: '/docs',
  contentTagConfig: {
    content_tag_set_by_llm: true,
    content_tag_prompt_area: 'classification',
    content_tag_prompt_key: 'classify_document',
    content_tag_prompt_return_fieldname: 'document_type',
  },
});
console.log(result.contentTag); // e.g., 'invoice'

Manual Tagging

Tag existing files by their database record ID:

const tagResult = await uploadExtract.tagFileContent('file-record-id');
if (tagResult.success) {
  console.log('Tagged as:', tagResult.data);
}

V3 Database Migration

If you have an existing hazo_files table, run the V3 migration to add the content_tag column:

import { migrateToV3, HAZO_FILES_MIGRATION_V3 } from 'hazo_files';

// Using the migration helper
await migrateToV3(
  { run: (sql) => db.run(sql) },
  'sqlite'
);

// Or run statements manually
for (const stmt of HAZO_FILES_MIGRATION_V3.sqlite.alterStatements) {
  try { await db.run(stmt); } catch { /* column exists */ }
}

New tables created with HAZO_FILES_TABLE_SCHEMA already include the content_tag column.

File Reference Tracking

Track which entities (form fields, chat messages, etc.) reference each file. Multiple entities can reference the same file, enabling shared files without duplication.

Adding and Removing References

import { TrackedFileManager } from 'hazo_files';

// Upload a file with an initial reference
const result = await trackedManager.uploadFileWithRef(buffer, '/docs/report.pdf', {
  scope_id: 'workspace-123',
  uploaded_by: 'user-456',
  ref: {
    entity_type: 'form_field',
    entity_id: 'field-789',
    created_by: 'user-456',
  },
});
// result.data.file_id, result.data.ref_id

// Add another reference to the same file
await trackedManager.addRef(fileId, {
  entity_type: 'chat_message',
  entity_id: 'msg-abc',
});

// Remove a specific reference
const { remaining_refs } = await trackedManager.removeRef(fileId, refId);

// Get file with status info
const fileStatus = await trackedManager.getFileById(fileId);
// { record, refs: FileRef[], is_orphaned: boolean }

Orphan Detection and Cleanup

// Find files with zero references
const orphans = await trackedManager.findOrphanedFiles({
  olderThanMs: 7 * 24 * 60 * 60 * 1000, // 7 days old
  scope_id: 'workspace-123',
});

// Clean up orphaned files (delete physical files + DB records)
const { cleaned, errors } = await trackedManager.cleanupOrphanedFiles({
  olderThanMs: 30 * 24 * 60 * 60 * 1000,
  softDeleteOnly: false, // true to only mark as soft_deleted
});

// Soft-delete a specific file
await trackedManager.softDeleteFile(fileId);

// Verify physical file existence
const exists = await trackedManager.verifyFileExistence(fileId);

Database Migration (Existing Databases)

If you have an existing hazo_files table, run the V2 migration to add reference tracking columns:

import { migrateToV2, backfillV2Defaults, HAZO_FILES_MIGRATION_V2 } from 'hazo_files';

// Using the migration helper
await migrateToV2(
  { run: (sql) => db.exec(sql) }, // SQLite
  'sqlite'
);
await backfillV2Defaults({ run: (sql) => db.exec(sql) }, 'sqlite');

// Or run statements manually
for (const stmt of HAZO_FILES_MIGRATION_V2.sqlite.alterStatements) {
  try { await db.run(stmt); } catch { /* column exists */ }
}
for (const idx of HAZO_FILES_MIGRATION_V2.sqlite.indexes) {
  await db.run(idx);
}

New tables created with HAZO_FILES_TABLE_SCHEMA already include V2 columns. For V3 content tagging migration, see Content Tagging above.

Reference Tracking Types

import type {
  FileRef,           // Individual reference from entity to file
  FileMetadataRecordV2,  // Extended record with refs, status, scope
  FileWithStatus,    // Rich view: record + parsed refs + is_orphaned
  FileStatus,        // 'active' | 'orphaned' | 'soft_deleted' | 'missing'
  AddRefOptions,     // Options for adding a reference
  RemoveRefsCriteria, // Criteria for bulk ref removal
} from 'hazo_files';
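To illustrate how these types fit together, here is a small triage sketch. The types below are local stand-ins mirroring the documented shapes (the file_id field name and the triage helper are hypothetical, not part of the package):

```typescript
// Local stand-ins mirroring the documented FileStatus / FileWithStatus shapes.
type FileStatus = 'active' | 'orphaned' | 'soft_deleted' | 'missing';

interface FileWithStatusLite {
  record: { file_id: string; status: FileStatus }; // field names hypothetical
  is_orphaned: boolean;
}

// Partition files into keep / cleanup candidates based on status and refs.
function triage(files: FileWithStatusLite[]) {
  const cleanup = files.filter(
    f => f.is_orphaned || f.record.status === 'soft_deleted' || f.record.status === 'missing'
  );
  const keep = files.filter(f => !cleanup.includes(f));
  return { keep, cleanup };
}
```

In practice you would feed this the results of getFileById or findOrphanedFiles and pass the cleanup list to cleanupOrphanedFiles.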

File Change Detection

Detect file content changes using fast xxHash hashing.

import { TrackedFileManager, computeFileHash, hasFileContentChanged } from 'hazo_files';

// TrackedFileManager automatically tracks file hashes on upload
const result = await trackedManager.uploadFile(buffer, '/docs/report.pdf', {
  skipHash: false, // Hash is computed by default
  awaitRecording: true, // Wait for DB record before returning
});

// Check if a file has changed since it was tracked
const hasChanged = await trackedManager.hasFileChanged('/docs/report.pdf');
if (hasChanged) {
  console.log('File has been modified since last upload');
}

// Get stored hash and size
const hash = await trackedManager.getStoredHash('/docs/report.pdf');
const size = await trackedManager.getStoredSize('/docs/report.pdf');

// Use hash utilities directly
const fileHash = await computeFileHash(buffer);
const changed = await hasFileContentChanged(oldHash, newBuffer);

Server Entry Point

For server-side applications, use the /server entry point, which includes a factory function:

import { createHazoFilesServer } from 'hazo_files/server';

const hazoFiles = await createHazoFilesServer({
  crudService: fileCrud,
  namingCrudService: namingCrud,
  config: {
    provider: 'local',
    local: { basePath: './storage' },
  },
  enableTracking: true,
  llmFactory: (provider) => createLLM({ provider }),
  // Optional: enable automatic content tagging for all uploads
  defaultContentTagConfig: {
    content_tag_set_by_llm: true,
    content_tag_prompt_area: 'classification',
    content_tag_prompt_key: 'classify_document',
    content_tag_prompt_return_fieldname: 'document_type',
  },
});

// Access all services
const { fileManager, metadataService, namingService, extractionService, uploadExtractService } = hazoFiles;

API Reference

FileManager

Main service class providing unified file operations.

Methods

  • initialize(config?: HazoFilesConfig): Promise<void> - Initialize the file manager
  • createDirectory(path: string): Promise<OperationResult<FolderItem>> - Create directory
  • removeDirectory(path: string, recursive?: boolean): Promise<OperationResult> - Remove directory
  • uploadFile(source, remotePath, options?): Promise<OperationResult<FileItem>> - Upload file
  • downloadFile(remotePath, localPath?, options?): Promise<OperationResult<Buffer | string>> - Download file
  • moveItem(sourcePath, destinationPath, options?): Promise<OperationResult<FileSystemItem>> - Move file/folder
  • deleteFile(path: string): Promise<OperationResult> - Delete file
  • renameFile(path, newName, options?): Promise<OperationResult<FileItem>> - Rename file
  • renameFolder(path, newName, options?): Promise<OperationResult<FolderItem>> - Rename folder
  • listDirectory(path, options?): Promise<OperationResult<FileSystemItem[]>> - List directory contents
  • getItem(path: string): Promise<OperationResult<FileSystemItem>> - Get file/folder info
  • exists(path: string): Promise<boolean> - Check if file/folder exists
  • getFolderTree(path?, depth?): Promise<OperationResult<TreeNode[]>> - Get folder tree
  • writeFile(path, content, options?): Promise<OperationResult<FileItem>> - Write text file
  • readFile(path: string): Promise<OperationResult<string>> - Read text file
  • copyFile(sourcePath, destinationPath, options?): Promise<OperationResult<FileItem>> - Copy file
  • ensureDirectory(path: string): Promise<OperationResult<FolderItem>> - Ensure directory exists

Types

type StorageProvider = 'local' | 'google_drive';

interface FileItem {
  id: string;
  name: string;
  path: string;
  size: number;
  mimeType: string;
  createdAt: Date;
  modifiedAt: Date;
  isDirectory: false;
  parentId?: string;
  metadata?: Record<string, unknown>;
}

interface FolderItem {
  id: string;
  name: string;
  path: string;
  createdAt: Date;
  modifiedAt: Date;
  isDirectory: true;
  parentId?: string;
  children?: (FileItem | FolderItem)[];
  metadata?: Record<string, unknown>;
}

interface OperationResult<T = void> {
  success: boolean;
  data?: T;
  error?: string;
}

interface UploadOptions {
  overwrite?: boolean;
  onProgress?: (progress: number, bytesTransferred: number, totalBytes: number) => void;
  metadata?: Record<string, unknown>;
}

See src/types/index.ts for complete type definitions.
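Since these methods report failures through OperationResult rather than by throwing, a small helper can convert a failed result into an exception. The unwrap helper below is a hypothetical convenience, not part of the package; the interface simply mirrors the OperationResult shape documented above:

```typescript
// Local mirror of the documented OperationResult<T> shape.
interface OperationResult<T = void> {
  success: boolean;
  data?: T;
  error?: string;
}

// Hypothetical helper: return the payload on success, throw on failure.
function unwrap<T>(result: OperationResult<T>): T {
  if (!result.success) {
    throw new Error(result.error ?? 'Operation failed');
  }
  return result.data as T;
}

// Usage sketch:
// const item = unwrap(await fileManager.uploadFile(buffer, '/docs/report.pdf'));
```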

Error Handling

hazo_files provides comprehensive error types:

import {
  FileNotFoundError,
  DirectoryNotFoundError,
  FileExistsError,
  DirectoryExistsError,
  DirectoryNotEmptyError,
  PermissionDeniedError,
  InvalidPathError,
  FileTooLargeError,
  InvalidExtensionError,
  AuthenticationError,
  ConfigurationError,
  OperationError
} from 'hazo_files';

// Use in try-catch
try {
  await fileManager.uploadFile(buffer, '/files/test.exe');
} catch (error) {
  if (error instanceof InvalidExtensionError) {
    console.error('File type not allowed');
  } else if (error instanceof FileTooLargeError) {
    console.error('File is too large');
  }
}

Extending with Custom Storage Providers

See docs/ADDING_MODULES.md for a complete guide on creating custom storage modules.

Quick example:

import { BaseStorageModule } from 'hazo_files';
import type { StorageProvider, OperationResult, FileItem, HazoFilesConfig } from 'hazo_files';

class S3StorageModule extends BaseStorageModule {
  readonly provider: StorageProvider = 's3' as StorageProvider;

  async initialize(config: HazoFilesConfig): Promise<void> {
    await super.initialize(config);
    // Initialize S3 client
  }

  async uploadFile(source, remotePath, options?): Promise<OperationResult<FileItem>> {
    // Implement S3 upload
  }

  // Implement other required methods...
}

// Register the module
import { registerModule } from 'hazo_files';
registerModule('s3', () => new S3StorageModule());

Testing

The package includes a test application in test-app/ demonstrating:

  • Next.js 14+ integration
  • API routes for file operations
  • FileBrowser UI component usage
  • Local storage and Google Drive switching
  • OAuth flow implementation

To run the test app:

cd test-app
npm install
npm run dev

Visit http://localhost:3000

Browser Compatibility

The UI components require:

  • Modern browsers with ES2020+ support
  • React 18+
  • CSS Grid and Flexbox support

Server-side code requires Node.js 16+.

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes with clear messages
  4. Add tests for new functionality
  5. Submit a pull request

Roadmap

  • Amazon S3 storage module
  • Dropbox storage module
  • OneDrive storage module
  • WebDAV support
  • Advanced search and filtering
  • Batch operations
  • File versioning
  • Sharing and permissions
  • Real-time file sync
  • Thumbnail generation

Credits

Created by Pubs Abayasiri

Built with:

  • TypeScript
  • React
  • Google APIs (googleapis)
  • xxhash-wasm for fast file hashing
  • @dnd-kit for drag-and-drop
  • tsup for building