
hazo_files

License: MIT

A powerful, modular file management package for Node.js and React applications with support for local filesystem and Google Drive storage. Built with TypeScript for type safety and developer experience.

Features

  • Multiple Storage Providers: Local filesystem and Google Drive support out of the box
  • Modular Architecture: Easily add custom storage providers
  • Unified API: Single consistent interface across all storage providers
  • React UI Components: Drop-in FileBrowser component with folder tree, file list, and preview
  • Naming Rules System: Visual configurator and utilities for generating consistent file/folder names
  • Naming Convention Management: Full CRUD with UI components for managing naming conventions in database
  • Extraction Data Management: Track and manage LLM-extracted metadata with merge strategies
  • LLM Integration: Built-in support for hazo_llm_api document/image extraction
  • Upload + Extract Workflow: Combined service for uploading files with automatic LLM extraction and naming
  • File Reference Tracking: Multi-entity file references with orphan detection, soft delete, and lifecycle management
  • File Change Detection: xxHash-based content hashing for efficient change detection
  • Content Tagging: Optional LLM-based content classification at upload time or on-demand via content_tag field
  • Schema Migrations: Built-in V2/V3 migration utilities for adding reference tracking and content tagging to existing databases
  • TypeScript: Full type safety and IntelliSense support
  • OAuth Integration: Built-in Google Drive OAuth authentication
  • Progress Tracking: Upload/download progress callbacks
  • File Validation: Extension filtering and file size limits
  • Error Handling: Comprehensive error types and handling

Installation

npm install hazo_files

For React UI components, ensure you have React 18+ installed:

npm install react react-dom

For the NamingRuleConfigurator component (drag-and-drop interface), also install:

npm install @dnd-kit/core @dnd-kit/sortable @dnd-kit/utilities

For database tracking and LLM extraction features (optional):

npm install hazo_connect      # Database tracking
npm install hazo_llm_api      # LLM document extraction
npm install server-only       # Server-side safety (recommended)
# Note: xxhash-wasm is included automatically as a dependency

Tailwind CSS v4 Setup (Required for UI Components)

If you're using Tailwind CSS v4 with the UI components, you must add a @source directive to your CSS file to ensure Tailwind scans the package's files for utility classes.

Add this to your globals.css or main CSS file AFTER the tailwindcss import:

@import "tailwindcss";

/* Required: Enable Tailwind to scan hazo_files package for utility classes */
@source "../node_modules/hazo_files/dist/ui";

Without this directive, Tailwind v4's JIT compiler will not generate CSS for the utility classes used in hazo_files components (like hover:bg-gray-100, text-sm, rounded-md, etc.), resulting in broken styling.

Note: This is only required for Tailwind v4. Earlier versions of Tailwind automatically scan node_modules and do not need this configuration.

Quick Start

Basic Usage (Server-side)

import { createInitializedFileManager } from 'hazo_files';

// Create and initialize file manager
const fileManager = await createInitializedFileManager({
  config: {
    provider: 'local',
    local: {
      basePath: './files',
      maxFileSize: 10 * 1024 * 1024, // 10MB
      allowedExtensions: ['jpg', 'png', 'pdf', 'txt']
    }
  }
});

// Create a directory
await fileManager.createDirectory('/documents');

// Upload a file
await fileManager.uploadFile(
  './local-file.pdf',
  '/documents/file.pdf',
  {
    onProgress: (progress, bytes, total) => {
      console.log(`Upload progress: ${progress}%`);
    }
  }
);

// List directory contents
const result = await fileManager.listDirectory('/documents');
if (result.success) {
  console.log(result.data);
}

// Download a file
await fileManager.downloadFile('/documents/file.pdf', './downloaded.pdf');

Using Configuration File

Create hazo_files_config.ini in your project root:

[general]
provider = local

[local]
base_path = ./files
max_file_size = 10485760
allowed_extensions = jpg,png,pdf,txt

Then initialize without config object:

import { createInitializedFileManager } from 'hazo_files';

const fileManager = await createInitializedFileManager();

React UI Component

import { FileBrowser } from 'hazo_files/ui';
import type { FileBrowserAPI } from 'hazo_files/ui';

// Create an API adapter that calls your server endpoints
const api: FileBrowserAPI = {
  async listDirectory(path: string) {
    const res = await fetch(`/api/files?action=list&path=${encodeURIComponent(path)}`);
    return res.json();
  },
  async getFolderTree(path = '/', depth = 3) {
    const res = await fetch(`/api/files?action=tree&path=${encodeURIComponent(path)}&depth=${depth}`);
    return res.json();
  },
  async uploadFile(file: File, remotePath: string) {
    const formData = new FormData();
    formData.append('file', file);
    formData.append('path', remotePath);
    const res = await fetch('/api/files/upload', { method: 'POST', body: formData });
    return res.json();
  },
  // ... implement other methods
};

function MyFileBrowser() {
  return (
    <FileBrowser
      api={api}
      initialPath="/"
      showPreview={true}
      showTree={true}
      viewMode="grid"
    />
  );
}

Advanced Usage

Google Drive Integration

1. Set up Google Cloud Console

  1. Go to Google Cloud Console
  2. Create a new project or select an existing one
  3. Enable the Google Drive API
  4. Create OAuth 2.0 credentials
  5. Add authorized redirect URIs (e.g., http://localhost:3000/api/auth/callback/google)

2. Configure Environment Variables

Create .env.local:

HAZO_GOOGLE_DRIVE_CLIENT_ID=your-client-id.apps.googleusercontent.com
HAZO_GOOGLE_DRIVE_CLIENT_SECRET=your-client-secret
HAZO_GOOGLE_DRIVE_REDIRECT_URI=http://localhost:3000/api/auth/callback/google

3. Configure hazo_files

[general]
provider = google_drive

[google_drive]
client_id =
client_secret =
redirect_uri = http://localhost:3000/api/auth/callback/google
refresh_token =

Environment variables will automatically override empty values.
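The override precedence can be sketched in a few lines; this is an illustration of the behavior described above (with a hypothetical helper name), not the package's actual config loader:

```typescript
// Illustrative precedence: a missing or empty INI value falls back to the
// environment variable. resolveConfigValue is a hypothetical helper name.
function resolveConfigValue(
  iniValue: string | undefined,
  envValue: string | undefined
): string | undefined {
  return iniValue && iniValue.trim() !== '' ? iniValue : envValue;
}

console.log(resolveConfigValue('', 'env-client-id'));        // "env-client-id"
console.log(resolveConfigValue('ini-client-id', 'env-id'));  // "ini-client-id"
```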

4. Implement OAuth Flow

import { createFileManager, GoogleDriveModule } from 'hazo_files';

// Initialize with Google Drive
const fileManager = createFileManager({
  config: {
    provider: 'google_drive',
    google_drive: {
      clientId: process.env.HAZO_GOOGLE_DRIVE_CLIENT_ID!,
      clientSecret: process.env.HAZO_GOOGLE_DRIVE_CLIENT_SECRET!,
      redirectUri: process.env.HAZO_GOOGLE_DRIVE_REDIRECT_URI!,
    }
  }
});

await fileManager.initialize();

// Get the Google Drive module to access auth methods
const module = fileManager.getModule() as GoogleDriveModule;
const auth = module.getAuth();

// Generate auth URL
const authUrl = auth.getAuthUrl();
console.log('Visit:', authUrl);

// After the user authorizes, exchange the code returned to your
// redirect URI for tokens
const tokens = await auth.exchangeCodeForTokens(authCode);

// Authenticate the module
await module.authenticate(tokens);

// Now you can use the file manager
await fileManager.createDirectory('/MyFolder');

Next.js API Route Example

// app/api/files/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createInitializedFileManager } from 'hazo_files';

async function getFileManager() {
  return createInitializedFileManager({
    config: {
      provider: 'local',
      local: {
        basePath: process.env.LOCAL_STORAGE_BASE_PATH || './files',
      }
    }
  });
}

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const action = searchParams.get('action');
  const path = searchParams.get('path') || '/';

  const fm = await getFileManager();

  switch (action) {
    case 'list':
      return NextResponse.json(await fm.listDirectory(path));
    case 'tree': {
      const depth = parseInt(searchParams.get('depth') || '3', 10);
      return NextResponse.json(await fm.getFolderTree(path, depth));
    }
    default:
      return NextResponse.json({ success: false, error: 'Invalid action' });
  }
}

export async function POST(request: NextRequest) {
  const body = await request.json();
  const { action, ...params } = body;

  const fm = await getFileManager();

  switch (action) {
    case 'createDirectory':
      return NextResponse.json(await fm.createDirectory(params.path));
    case 'deleteFile':
      return NextResponse.json(await fm.deleteFile(params.path));
    case 'renameFile':
      return NextResponse.json(await fm.renameFile(params.path, params.newName));
    default:
      return NextResponse.json({ success: false, error: 'Invalid action' });
  }
}

File Upload API Route

// app/api/files/upload/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createInitializedFileManager } from 'hazo_files';

// Same helper as in the listing route above
async function getFileManager() {
  return createInitializedFileManager({
    config: {
      provider: 'local',
      local: { basePath: process.env.LOCAL_STORAGE_BASE_PATH || './files' },
    }
  });
}

export async function POST(request: NextRequest) {
  const formData = await request.formData();
  const file = formData.get('file') as File;
  const path = formData.get('path') as string;

  const fm = await getFileManager();

  // Convert File to Buffer
  const arrayBuffer = await file.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);

  const result = await fm.uploadFile(buffer, path);
  return NextResponse.json(result);
}

Progress Tracking

// Upload with progress tracking
await fileManager.uploadFile(
  './large-file.zip',
  '/uploads/large-file.zip',
  {
    onProgress: (progress, bytesTransferred, totalBytes) => {
      console.log(`Progress: ${progress.toFixed(2)}%`);
      console.log(`${bytesTransferred} / ${totalBytes} bytes`);
    }
  }
);

// Download with progress tracking
await fileManager.downloadFile(
  '/uploads/large-file.zip',
  './downloaded-file.zip',
  {
    onProgress: (progress, bytesTransferred, totalBytes) => {
      console.log(`Download: ${progress.toFixed(2)}%`);
    }
  }
);

File Operations

// Create directory structure
await fileManager.createDirectory('/projects/2024/docs');

// Upload file
const uploadResult = await fileManager.uploadFile(
  buffer,
  '/projects/2024/docs/report.pdf'
);

// Move file
await fileManager.moveItem(
  '/projects/2024/docs/report.pdf',
  '/archive/2024/report.pdf'
);

// Rename file
await fileManager.renameFile(
  '/archive/2024/report.pdf',
  'annual-report.pdf'
);

// Copy file (convenience method)
await fileManager.copyFile(
  '/archive/2024/annual-report.pdf',
  '/backup/annual-report.pdf'
);

// Delete file
await fileManager.deleteFile('/backup/annual-report.pdf');

// Remove directory (recursive)
await fileManager.removeDirectory('/archive/2024', true);

// Check if file exists
const exists = await fileManager.exists('/projects/2024/docs');

// Get file/folder information
const itemResult = await fileManager.getItem('/projects/2024/docs/report.pdf');
if (itemResult.success && itemResult.data) {
  console.log('File:', itemResult.data.name);
  console.log('Size:', itemResult.data.size);
  console.log('Modified:', itemResult.data.modifiedAt);
}

// List directory with options
const listResult = await fileManager.listDirectory('/projects', {
  recursive: true,
  includeHidden: false,
  filter: (item) => !item.isDirectory && item.name.endsWith('.pdf')
});

Working with Text Files

// Write text file
await fileManager.writeFile('/notes/readme.txt', 'Hello, World!');

// Read text file
const readResult = await fileManager.readFile('/notes/readme.txt');
if (readResult.success) {
  console.log(readResult.data); // "Hello, World!"
}

Folder Tree

// Get folder tree (3 levels deep by default)
const treeResult = await fileManager.getFolderTree('/projects', 3);
if (treeResult.success && treeResult.data) {
  console.log(JSON.stringify(treeResult.data, null, 2));
}

// Output:
// [
//   {
//     "id": "abc123",
//     "name": "2024",
//     "path": "/projects/2024",
//     "children": [
//       {
//         "id": "def456",
//         "name": "docs",
//         "path": "/projects/2024/docs",
//         "children": []
//       }
//     ]
//   }
// ]

Configuration

Configuration File (hazo_files_config.ini)

[general]
provider = local

[local]
base_path = ./files
allowed_extensions = jpg,png,pdf,txt,doc,docx
max_file_size = 10485760

[google_drive]
client_id = your-client-id.apps.googleusercontent.com
client_secret = your-client-secret
redirect_uri = http://localhost:3000/api/auth/callback/google
refresh_token =
access_token =
root_folder_id =

[naming]
; Supported date format tokens for naming rules
date_formats = YYYY,YY,MM,M,DD,D,MMM,MMMM,YYYY-MM-DD,YYYY-MMM-DD,DD-MM-YYYY,MM-DD-YYYY

Environment Variables

The following environment variables can override configuration file values:

  • HAZO_GOOGLE_DRIVE_CLIENT_ID
  • HAZO_GOOGLE_DRIVE_CLIENT_SECRET
  • HAZO_GOOGLE_DRIVE_REDIRECT_URI
  • HAZO_GOOGLE_DRIVE_REFRESH_TOKEN
  • HAZO_GOOGLE_DRIVE_ACCESS_TOKEN
  • HAZO_GOOGLE_DRIVE_ROOT_FOLDER_ID

Configuration via Code

import { createInitializedFileManager } from 'hazo_files';

const fileManager = await createInitializedFileManager({
  config: {
    provider: 'local',
    local: {
      basePath: './storage',
      allowedExtensions: ['jpg', 'png', 'gif', 'pdf'],
      maxFileSize: 5 * 1024 * 1024 // 5MB
    }
  }
});

UI Components

FileBrowser Component

The FileBrowser is a complete, drop-in file management UI with:

  • Folder tree navigation
  • File list (grid or list view)
  • Breadcrumb navigation
  • File preview (images, text, PDFs)
  • Context menus and actions
  • Upload, download, rename, delete operations
  • Drag-and-drop file moving between folders

import { FileBrowser } from 'hazo_files/ui';

<FileBrowser
  api={api}
  initialPath="/"
  showPreview={true}
  showTree={true}
  viewMode="grid"
  treeWidth={250}
  previewHeight={300}
  onError={(error) => console.error(error)}
  onNavigate={(path) => console.log('Navigated to:', path)}
  onSelect={(item) => console.log('Selected:', item)}
/>

Drag-and-Drop File Moving

The FileBrowser includes built-in drag-and-drop functionality for moving files and folders:

Features:

  • Drag files/folders from the file list
  • Drop onto folders in the sidebar tree or main file list
  • Visual feedback with opacity and colored borders during drag
  • Prevents invalid operations (dropping on self, into current parent, folder into descendant)
  • Shows dragged item preview during drag operation

How to use:

  1. Click and hold on any file or folder in the file list
  2. Drag it over a folder in either the tree sidebar or file list
  3. Valid drop targets show a green ring/background
  4. Release to move the item to the new location

Technical requirements:

  • Requires @dnd-kit/core peer dependency (already included for NamingRuleConfigurator)
  • API must implement moveItem(sourcePath, destinationPath) method
  • Automatically validates drop targets to prevent invalid moves
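The drop-target rules above can be sketched as a simple path check; this is an illustration of the documented behavior, not the component's actual code:

```typescript
// Sketch of drop-target validation: reject dropping an item on itself,
// into its current parent (a no-op), or a folder into its own descendant.
function isValidDrop(sourcePath: string, targetFolder: string): boolean {
  const parent = sourcePath.slice(0, sourcePath.lastIndexOf('/')) || '/';
  if (targetFolder === sourcePath) return false;               // onto itself
  if (targetFolder === parent) return false;                   // current parent
  if (targetFolder.startsWith(sourcePath + '/')) return false; // own descendant
  return true;
}

console.log(isValidDrop('/a/b', '/a'));     // false (current parent)
console.log(isValidDrop('/a/b', '/a/b/c')); // false (descendant)
console.log(isValidDrop('/a/b', '/x'));     // true
```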

Visual feedback:

  • Dragging: Item becomes semi-transparent (opacity-50)
  • Valid drop target: Green ring (ring-2 ring-green-500) and background (bg-green-50)
  • Drag preview: Shows file/folder icon and name following cursor

ID patterns used:

  • File items: file-item-{path} (draggable)
  • Folder tree drops: folder-drop-tree-{path} (droppable)
  • Folder list drops: folder-drop-list-{path} (droppable)

Individual Components

You can also use individual components:

import {
  PathBreadcrumb,
  FolderTree,
  FileList,
  FilePreview,
  FileActions,
  FileInfoPanel
} from 'hazo_files/ui';

// Use individually with your own layout

FileInfoPanel Component

The FileInfoPanel displays file metadata in a structured format and can be used standalone in sidebars, custom dialogs, or inline panels:

import { FileInfoPanel } from 'hazo_files/ui';

// In a sidebar
function Sidebar({ selectedFile, metadata, isLoading }) {
  return (
    <div className="sidebar p-4">
      <h3 className="font-bold mb-4">File Info</h3>
      <FileInfoPanel
        item={selectedFile}
        metadata={metadata}
        isLoading={isLoading}
      />
    </div>
  );
}

// Without custom metadata section
<FileInfoPanel
  item={file}
  showCustomMetadata={false}
  className="bg-gray-50 rounded-lg p-4"
/>

// In a custom dialog
function MyCustomDialog({ file }) {
  return (
    <dialog>
      <FileInfoPanel item={file} showCustomMetadata={false} />
    </dialog>
  );
}

Props:

  • item: FileSystemItem | null - The file or folder to display info for
  • metadata?: FileMetadata | null - Additional metadata from database
  • isLoading?: boolean - Show loading state for custom metadata
  • showCustomMetadata?: boolean - Whether to show the JSON metadata section (default: true)
  • className?: string - Additional CSS classes for custom styling

Hooks

import { useFileBrowser, useFileOperations } from 'hazo_files/ui';

function MyCustomFileBrowser() {
  const {
    currentPath,
    files,
    tree,
    selectedItem,
    isLoading,
    navigate,
    refresh,
    selectItem
  } = useFileBrowser(api, '/');

  const {
    createFolder,
    uploadFiles,
    deleteItem,
    renameItem
  } = useFileOperations(api, currentPath);

  // Build your custom UI
}

Naming Rule Configurator

Build consistent file/folder naming patterns with a visual drag-and-drop interface:

import { NamingRuleConfigurator } from 'hazo_files/ui';
import type { NamingVariable } from 'hazo_files/ui';

function NamingConfig() {
  // Define user-specific variables
  const userVariables: NamingVariable[] = [
    {
      variable_name: 'project_name',
      description: 'Name of the project',
      example_value: 'WebApp',
      category: 'user'
    },
    {
      variable_name: 'client_id',
      description: 'Client identifier',
      example_value: 'ACME',
      category: 'user'
    },
  ];

  const handleSchemaChange = (schema) => {
    console.log('New schema:', schema);
    // Save to database or state
  };

  const handleExport = (schema) => {
    // Export as JSON file
    const blob = new Blob([JSON.stringify(schema, null, 2)], { type: 'application/json' });
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'naming-rule.json';
    a.click();
  };

  return (
    <NamingRuleConfigurator
      variables={userVariables}
      onChange={handleSchemaChange}
      onExport={handleExport}
      sampleFileName="proposal.pdf"
    />
  );
}

The configurator provides:

  • Category Tabs: User, Date, File, Counter variables
  • Drag & Drop: Build patterns by dragging variables into file/folder patterns
  • Segment Reordering: Drag segments within patterns to reorder them
  • Live Preview: See generated names in real-time with example values
  • Undo/Redo: Full history with keyboard shortcuts (Ctrl+Z, Ctrl+Y)
  • Import/Export: Save and load naming rules as JSON
  • Scrollable Layout: Works in fixed-height containers with scrollable content area

System variables included:

  • Date: YYYY, YY, MM, DD, YYYY-MM-DD, MMM, MMMM, etc.
  • File: original_name, extension, ext
  • Counter: counter (auto-incrementing with padding)

Naming Convention Management Components

Full UI for managing naming conventions stored in the database:

import {
  NamingConventionManager,
  NamingConventionList,
  NamingConventionEditor,
} from 'hazo_files/ui';

// Full management UI (list + editor combined)
<NamingConventionManager
  api={namingAPI}
  onSelect={(convention) => applyConvention(convention)}
/>

// Or use components separately
<NamingConventionList
  api={namingAPI}
  selectedId={selectedId}
  onSelect={setSelectedId}
  onEdit={(id) => openEditor(id)}
  onDelete={(id) => confirmDelete(id)}
/>

<NamingConventionEditor
  api={namingAPI}
  conventionId={editingId}
  userVariables={customVariables}
  onSave={(convention) => handleSave(convention)}
  onCancel={() => closeEditor()}
/>

Naming Rules API

Generate file and folder names programmatically from naming schemas:

import {
  hazo_files_generate_file_name,
  hazo_files_generate_folder_name,
  createVariableSegment,
  createLiteralSegment,
  type NamingRuleSchema
} from 'hazo_files';

// Create a naming schema
const schema: NamingRuleSchema = {
  version: 1,
  filePattern: [
    createVariableSegment('client_id'),
    createLiteralSegment('_'),
    createVariableSegment('project_name'),
    createLiteralSegment('_'),
    createVariableSegment('YYYY-MM-DD'),
    createLiteralSegment('_'),
    createVariableSegment('counter'),
  ],
  folderPattern: [
    createVariableSegment('YYYY'),
    createLiteralSegment('/'),
    createVariableSegment('client_id'),
    createLiteralSegment('/'),
    createVariableSegment('project_name'),
  ],
};

// Define variable values
const variables = {
  client_id: 'ACME',
  project_name: 'Website',
};

// Generate file name
const fileResult = hazo_files_generate_file_name(
  schema,
  variables,
  'original-document.pdf',
  {
    counterValue: 42,
    preserveExtension: true,  // Keep original .pdf extension
    date: new Date('2024-12-09'),
  }
);

if (fileResult.success) {
  console.log(fileResult.name);
  // Output: "ACME_Website_2024-12-09_042.pdf"
}

// Generate folder path
const folderResult = hazo_files_generate_folder_name(schema, variables);

if (folderResult.success) {
  console.log(folderResult.name);
  // Output: "2024/ACME/Website"
}

// Use with FileManager
const uploadPath = `/${folderResult.name}/${fileResult.name}`;
await fileManager.uploadFile(buffer, uploadPath);

Available System Variables

Date Variables (use current date unless overridden):

  • YYYY - Full year (2024)
  • YY - Two-digit year (24)
  • MM - Month with zero padding (01-12)
  • M - Month without padding (1-12)
  • DD - Day with zero padding (01-31)
  • D - Day without padding (1-31)
  • MMM - Short month name (Jan, Feb, etc.)
  • MMMM - Full month name (January, February, etc.)
  • YYYY-MM-DD - ISO date format (2024-01-15)
  • YYYY-MMM-DD - Date with month name (2024-Jan-15)
  • DD-MM-YYYY - European format (15-01-2024)
  • MM-DD-YYYY - US format (01-15-2024)

File Metadata Variables (from original filename):

  • original_name - Filename without extension
  • extension - File extension with dot (.pdf)
  • ext - Extension without dot (pdf)

Counter Variable:

  • counter - Auto-incrementing number with zero padding (001, 042, 123)
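The token semantics listed above can be sketched as a lookup; this illustrates the documented behavior of a few tokens and is not the package's implementation:

```typescript
// Illustrative resolution of date and counter tokens (not the package's code)
const MONTHS = ['January','February','March','April','May','June',
                'July','August','September','October','November','December'];
const pad = (n: number, width = 2) => String(n).padStart(width, '0');

function resolveToken(token: string, d: Date, counter = 1): string {
  switch (token) {
    case 'YYYY': return String(d.getFullYear());
    case 'YY':   return String(d.getFullYear()).slice(-2);
    case 'MM':   return pad(d.getMonth() + 1);
    case 'DD':   return pad(d.getDate());
    case 'MMM':  return MONTHS[d.getMonth()].slice(0, 3);
    case 'MMMM': return MONTHS[d.getMonth()];
    case 'YYYY-MM-DD':
      return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
    case 'counter': return pad(counter, 3); // zero-padded: 001, 042, 123
    default: throw new Error(`Unknown token: ${token}`);
  }
}

const d = new Date(2024, 0, 15); // 15 Jan 2024
console.log(resolveToken('YYYY-MM-DD', d));  // "2024-01-15"
console.log(resolveToken('MMM', d));         // "Jan"
console.log(resolveToken('counter', d, 42)); // "042"
```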

Parsing Pattern Strings

You can also parse pattern strings directly:

import { parsePatternString, patternToString } from 'hazo_files';

// Parse string to segments
const segments = parsePatternString('{client_id}_{YYYY-MM-DD}_{counter}');
console.log(segments);
// [
//   { id: '...', type: 'variable', value: 'client_id' },
//   { id: '...', type: 'literal', value: '_' },
//   { id: '...', type: 'variable', value: 'YYYY-MM-DD' },
//   { id: '...', type: 'literal', value: '_' },
//   { id: '...', type: 'variable', value: 'counter' },
// ]

// Convert back to string
const patternStr = patternToString(segments);
// "{client_id}_{YYYY-MM-DD}_{counter}"
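The `{variable}` syntax shown above can be parsed with a simple regex scan; this is a sketch of the idea, not the package's `parsePatternString` implementation (segment `id`s are omitted for brevity):

```typescript
// Illustrative parser: text inside {braces} becomes a variable segment,
// everything between matches becomes a literal segment.
type Segment = { type: 'variable' | 'literal'; value: string };

function parsePattern(pattern: string): Segment[] {
  const segments: Segment[] = [];
  const re = /\{([^}]+)\}/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(pattern)) !== null) {
    if (m.index > last) {
      segments.push({ type: 'literal', value: pattern.slice(last, m.index) });
    }
    segments.push({ type: 'variable', value: m[1] });
    last = re.lastIndex;
  }
  if (last < pattern.length) {
    segments.push({ type: 'literal', value: pattern.slice(last) });
  }
  return segments;
}

console.log(parsePattern('{client_id}_{YYYY-MM-DD}_{counter}'));
// 5 segments: variable, literal "_", variable, literal "_", variable
```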

Extraction Data Management

Manage LLM-extracted data stored within the file_data JSON field. The system maintains both raw extraction history and merged results.

Data Structure

interface FileDataStructure {
  merged_data: Record<string, unknown>;  // Combined data from all extractions
  raw_data: ExtractionData[];            // Individual extraction entries
}

interface ExtractionData {
  id: string;           // Unique extraction ID
  extracted_at: string; // ISO timestamp
  source?: string;      // Optional source identifier (e.g., model name)
  data: Record<string, unknown>;  // The extracted data
}

Using with FileMetadataService

import { FileMetadataService, createFileMetadataService } from 'hazo_files';

// Create service with your CRUD provider
const metadataService = createFileMetadataService(crudService);

// Add an extraction
const extraction = await metadataService.addExtraction(
  '/documents/report.pdf',
  'local',
  { title: 'Annual Report', author: 'John Doe', pages: 42 },
  { source: 'gpt-4', mergeStrategy: 'shallow' }
);
console.log('Added extraction:', extraction?.id);

// Get merged data (combined from all extractions)
const merged = await metadataService.getMergedData('/documents/report.pdf', 'local');
console.log('Merged data:', merged);

// Get all extractions
const extractions = await metadataService.getExtractions('/documents/report.pdf', 'local');
console.log('All extractions:', extractions);

// Get a specific extraction
const specific = await metadataService.getExtractionById(
  '/documents/report.pdf',
  'local',
  extraction?.id
);

// Remove an extraction (recalculates merged_data by default)
await metadataService.removeExtractionById(
  '/documents/report.pdf',
  'local',
  extraction?.id,
  { recalculateMerged: true, mergeStrategy: 'deep' }
);

// Clear all extractions
await metadataService.clearExtractions('/documents/report.pdf', 'local');

Using Utility Functions Directly

For working with parsed data structures without database operations:

import {
  parseFileData,
  addExtractionToFileData,
  removeExtractionById,
  getMergedData,
  getExtractions,
  deepMerge,
  createEmptyFileDataStructure,
} from 'hazo_files';

// Parse existing JSON (auto-migrates old format)
const fileData = parseFileData(existingJsonString);

// Add an extraction (returns new structure, immutable)
const result = addExtractionToFileData(
  fileData,
  { category: 'finance', summary: 'Q4 results' },
  { source: 'claude-3', mergeStrategy: 'deep' }
);

if (result.success) {
  const newFileData = result.data;
  console.log('New merged data:', newFileData.merged_data);
  console.log('Extraction count:', newFileData.raw_data.length);
}

// Remove an extraction by ID
const removeResult = removeExtractionById(fileData, 'ext_12345', {
  recalculateMerged: true,
  mergeStrategy: 'shallow'
});

// Get copies of data
const mergedCopy = getMergedData(fileData);
const extractionsCopy = getExtractions(fileData);

Merge Strategies

  • Shallow (default): Spreads top-level properties, later values overwrite earlier

    // { a: 1, b: 2 } + { b: 3, c: 4 } = { a: 1, b: 3, c: 4 }
  • Deep: Recursively merges nested objects, concatenates arrays

    // { a: { x: 1 }, arr: [1] } + { a: { y: 2 }, arr: [2] } = { a: { x: 1, y: 2 }, arr: [1, 2] }
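The two strategies can be sketched in plain TypeScript; this illustrates the behavior described above, not the library's implementation:

```typescript
type Obj = Record<string, unknown>;

// Shallow: spread top-level properties; later values overwrite earlier ones
function shallowMerge(a: Obj, b: Obj): Obj {
  return { ...a, ...b };
}

// Deep: recursively merge nested objects, concatenate arrays
function deepMergeSketch(a: Obj, b: Obj): Obj {
  const out: Obj = { ...a };
  for (const [key, val] of Object.entries(b)) {
    const prev = out[key];
    if (Array.isArray(prev) && Array.isArray(val)) {
      out[key] = [...prev, ...val];
    } else if (
      prev && val &&
      typeof prev === 'object' && typeof val === 'object' &&
      !Array.isArray(prev) && !Array.isArray(val)
    ) {
      out[key] = deepMergeSketch(prev as Obj, val as Obj);
    } else {
      out[key] = val;
    }
  }
  return out;
}

console.log(shallowMerge({ a: 1, b: 2 }, { b: 3, c: 4 }));
// { a: 1, b: 3, c: 4 }
console.log(deepMergeSketch({ a: { x: 1 }, arr: [1] }, { a: { y: 2 }, arr: [2] }));
// { a: { x: 1, y: 2 }, arr: [1, 2] }
```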

Migration from Old Format

The parseFileData function automatically migrates old plain-object format to the new structure:

// Old format: { title: 'Report', author: 'John' }
// Becomes: { merged_data: { title: 'Report', author: 'John' }, raw_data: [] }
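The migration step can be sketched as follows; this illustrates the transformation described above (types redeclared locally for a self-contained example), not the package's `parseFileData` internals:

```typescript
// Minimal sketch of the old-format auto-migration
type FileDataStructure = {
  merged_data: Record<string, unknown>;
  raw_data: unknown[];
};

function migrateOldFormat(parsed: Record<string, unknown>): FileDataStructure {
  // Already in the new shape? Return as-is.
  if ('merged_data' in parsed && 'raw_data' in parsed) {
    return parsed as unknown as FileDataStructure;
  }
  // Old plain-object format: wrap it as merged_data with no raw history.
  return { merged_data: parsed, raw_data: [] };
}

console.log(migrateOldFormat({ title: 'Report', author: 'John' }));
// { merged_data: { title: 'Report', author: 'John' }, raw_data: [] }
```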

Naming Convention Management

Store and manage naming conventions in your database with full CRUD operations.

NamingConventionService

import { NamingConventionService, HAZO_FILES_NAMING_TABLE_SCHEMA } from 'hazo_files';
import { createCrudService } from 'hazo_connect/server';

// Create CRUD service for naming conventions table
const namingCrud = createCrudService(adapter, HAZO_FILES_NAMING_TABLE_SCHEMA.tableName);
const namingService = new NamingConventionService(namingCrud);

// Create a naming convention
const convention = await namingService.create({
  naming_title: 'Tax Documents',
  naming_type: 'both', // 'file', 'folder', or 'both'
  naming_value: {
    version: 1,
    filePattern: [
      { id: '1', type: 'variable', value: 'client_id' },
      { id: '2', type: 'literal', value: '_' },
      { id: '3', type: 'variable', value: 'YYYY-MM-DD' },
    ],
    folderPattern: [
      { id: '4', type: 'variable', value: 'YYYY' },
      { id: '5', type: 'literal', value: '/' },
      { id: '6', type: 'variable', value: 'client_id' },
    ],
  },
  variables: [
    { variable_name: 'client_id', description: 'Client ID', example_value: 'ACME', category: 'user' }
  ],
  scope_id: 'optional-scope-uuid', // Link to hazo_scopes for organization
});

// Get all conventions
const allConventions = await namingService.list();

// Get parsed conventions (with schema and variables as objects)
const parsed = await namingService.listParsed();

// Get by scope (e.g., for a specific organization)
const scopedConventions = await namingService.getByScope('scope-uuid');

// Update
await namingService.update(convention.id, {
  naming_title: 'Updated Tax Documents',
});

// Duplicate
const copy = await namingService.duplicate(convention.id, 'Tax Documents Copy');

// Delete
await namingService.delete(convention.id);

NamingConventionManager UI Component

import { NamingConventionManager } from 'hazo_files/ui';
import type { NamingConventionAPI } from 'hazo_files/ui';

// Create API adapter for your backend
const namingAPI: NamingConventionAPI = {
  list: () => fetch('/api/naming-conventions').then(r => r.json()),
  create: (input) => fetch('/api/naming-conventions', {
    method: 'POST',
    body: JSON.stringify(input),
  }).then(r => r.json()),
  update: (id, input) => fetch(`/api/naming-conventions/${id}`, {
    method: 'PATCH',
    body: JSON.stringify(input),
  }).then(r => r.json()),
  delete: (id) => fetch(`/api/naming-conventions/${id}`, {
    method: 'DELETE',
  }).then(r => r.json()),
};

function NamingConventionsPage() {
  return (
    <NamingConventionManager
      api={namingAPI}
      onSelect={(convention) => console.log('Selected:', convention)}
    />
  );
}

Upload with LLM Extraction

Combine file uploads with automatic LLM extraction and naming convention application.

UploadExtractService

import {
  TrackedFileManager,
  NamingConventionService,
  LLMExtractionService,
  UploadExtractService,
} from 'hazo_files';
import { createLLM } from 'hazo_llm_api';

// Create LLM extraction service
const extractionService = new LLMExtractionService((provider, options) => {
  return createLLM({ provider, ...options });
}, 'gemini');

// Create upload + extract service (with optional content tag config)
const uploadExtract = new UploadExtractService(
  trackedFileManager,
  namingService,
  extractionService,
  {
    content_tag_set_by_llm: true,
    content_tag_prompt_area: 'classification',
    content_tag_prompt_key: 'classify_document',
    content_tag_prompt_return_fieldname: 'document_type',
  }
);

// Upload with extraction and naming convention
const result = await uploadExtract.uploadWithExtract(
  pdfBuffer,
  'quarterly-report.pdf',
  {
    // Enable LLM extraction
    extract: true,
    extractionOptions: {
      promptArea: 'reports',
      promptKey: 'extract_summary',
      llmProvider: 'gemini',
    },
    // Apply naming convention
    namingConventionId: 'convention-uuid',
    namingVariables: { client_id: 'ACME', project: 'Q4' },
    basePath: '/documents',
    createFolders: true,
    counterValue: 1,
  }
);

if (result.success) {
  console.log('Uploaded to:', result.generatedPath);
  // e.g., '/documents/2024/ACME/ACME_Q4_2024-12-09_001.pdf'
  console.log('Extracted data:', result.extraction?.data);
  console.log('Content tag:', result.contentTag);
  // e.g., 'invoice', 'report', 'contract'
}

// Generate path preview without uploading
const preview = await uploadExtract.generatePath(
  'document.pdf',
  'convention-uuid',
  { client_id: 'ACME' },
  { basePath: '/docs', counterValue: 5 }
);
console.log('Would upload to:', preview.fullPath);

// Create folder from naming convention
const folderResult = await uploadExtract.createFolderFromConvention(
  'convention-uuid',
  { client_id: 'ACME', project: 'Website' },
  { basePath: '/projects' }
);

LLMExtractionService Standalone

import { LLMExtractionService } from 'hazo_files';

const extractionService = new LLMExtractionService(llmFactory, 'gemini');

// Extract from document
const result = await extractionService.extractFromDocument(
  pdfBuffer,
  'application/pdf',
  {
    customPrompt: 'Extract all financial figures and dates',
    llmProvider: 'qwen',
  }
);

// Extract from image
const imageResult = await extractionService.extractFromImage(
  imageBuffer,
  'image/jpeg',
  {
    promptArea: 'receipts',
    promptKey: 'extract_receipt',
  }
);

// Auto-detect based on MIME type
const autoResult = await extractionService.extract(
  buffer,
  mimeType,
  extractionOptions
);

Content Tagging

Automatically classify uploaded files using LLM-based content analysis. The content_tag field stores a classification string (e.g., "invoice", "report", "contract") determined by an LLM prompt.

Configuration

import type { ContentTagConfig } from 'hazo_files';

const contentTagConfig: ContentTagConfig = {
  content_tag_set_by_llm: true,
  content_tag_prompt_area: 'classification',
  content_tag_prompt_key: 'classify_document',
  content_tag_prompt_return_fieldname: 'document_type',
  content_tag_prompt_variables: { language: 'en' }, // optional
};

Automatic Tagging at Upload

Pass a contentTagConfig to the UploadExtractService constructor (as the default for all uploads), or override it per upload via options:

// Per-upload override
const result = await uploadExtract.uploadWithExtract(buffer, 'file.pdf', {
  basePath: '/docs',
  contentTagConfig: {
    content_tag_set_by_llm: true,
    content_tag_prompt_area: 'classification',
    content_tag_prompt_key: 'classify_document',
    content_tag_prompt_return_fieldname: 'document_type',
  },
});
console.log(result.contentTag); // e.g., 'invoice'

Manual Tagging

Tag existing files by their database record ID:

const tagResult = await uploadExtract.tagFileContent('file-record-id');
if (tagResult.success) {
  console.log('Tagged as:', tagResult.data);
}

V3 Database Migration

If you have an existing hazo_files table, run the V3 migration to add the content_tag column:

import { migrateToV3, HAZO_FILES_MIGRATION_V3 } from 'hazo_files';

// Using the migration helper
await migrateToV3(
  { run: (sql) => db.run(sql) },
  'sqlite'
);

// Or run statements manually
for (const stmt of HAZO_FILES_MIGRATION_V3.sqlite.alterStatements) {
  try { await db.run(stmt); } catch { /* column exists */ }
}

New tables created with HAZO_FILES_TABLE_SCHEMA already include the content_tag column.

File Reference Tracking

Track which entities (form fields, chat messages, etc.) reference each file. Multiple entities can reference the same file, enabling shared files without duplication.

Adding and Removing References

import { TrackedFileManager } from 'hazo_files';

// Upload a file with an initial reference
const result = await trackedManager.uploadFileWithRef(buffer, '/docs/report.pdf', {
  scope_id: 'workspace-123',
  uploaded_by: 'user-456',
  ref: {
    entity_type: 'form_field',
    entity_id: 'field-789',
    created_by: 'user-456',
  },
});
// result.data.file_id, result.data.ref_id

// Add another reference to the same file
await trackedManager.addRef(fileId, {
  entity_type: 'chat_message',
  entity_id: 'msg-abc',
});

// Remove a specific reference
const { remaining_refs } = await trackedManager.removeRef(fileId, refId);

// Get file with status info
const fileStatus = await trackedManager.getFileById(fileId);
// { record, refs: FileRef[], is_orphaned: boolean }

Orphan Detection and Cleanup

// Find files with zero references
const orphans = await trackedManager.findOrphanedFiles({
  olderThanMs: 7 * 24 * 60 * 60 * 1000, // 7 days old
  scope_id: 'workspace-123',
});

// Clean up orphaned files (delete physical files + DB records)
const { cleaned, errors } = await trackedManager.cleanupOrphanedFiles({
  olderThanMs: 30 * 24 * 60 * 60 * 1000,
  softDeleteOnly: false, // true to only mark as soft_deleted
});

// Soft-delete a specific file
await trackedManager.softDeleteFile(fileId);

// Verify physical file existence
const exists = await trackedManager.verifyFileExistence(fileId);

Database Migration (Existing Databases)

If you have an existing hazo_files table, run the V2 migration to add reference tracking columns:

import { migrateToV2, backfillV2Defaults, HAZO_FILES_MIGRATION_V2 } from 'hazo_files';

// Using the migration helper
await migrateToV2(
  { run: (sql) => db.exec(sql) }, // SQLite
  'sqlite'
);
await backfillV2Defaults({ run: (sql) => db.exec(sql) }, 'sqlite');

// Or run statements manually
for (const stmt of HAZO_FILES_MIGRATION_V2.sqlite.alterStatements) {
  try { await db.run(stmt); } catch { /* column exists */ }
}
for (const idx of HAZO_FILES_MIGRATION_V2.sqlite.indexes) {
  await db.run(idx);
}

New tables created with HAZO_FILES_TABLE_SCHEMA already include V2 columns. For V3 content tagging migration, see Content Tagging above.

Reference Tracking Types

import type {
  FileRef,           // Individual reference from entity to file
  FileMetadataRecordV2,  // Extended record with refs, status, scope
  FileWithStatus,    // Rich view: record + parsed refs + is_orphaned
  FileStatus,        // 'active' | 'orphaned' | 'soft_deleted' | 'missing'
  AddRefOptions,     // Options for adding a reference
  RemoveRefsCriteria, // Criteria for bulk ref removal
} from 'hazo_files';
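To illustrate how these types fit together, here is a small triage sketch. The types below are local stand-ins mirroring the documented shapes (the file_id field name and the triage helper are hypothetical, not part of the package):

```typescript
// Local stand-ins mirroring the documented FileStatus / FileWithStatus shapes.
type FileStatus = 'active' | 'orphaned' | 'soft_deleted' | 'missing';

interface FileWithStatusLite {
  record: { file_id: string; status: FileStatus }; // field names hypothetical
  is_orphaned: boolean;
}

// Partition files into keep / cleanup candidates based on status and refs.
function triage(files: FileWithStatusLite[]) {
  const cleanup = files.filter(
    f => f.is_orphaned || f.record.status === 'soft_deleted' || f.record.status === 'missing'
  );
  const keep = files.filter(f => !cleanup.includes(f));
  return { keep, cleanup };
}
```

In practice you would feed this the results of getFileById or findOrphanedFiles and pass the cleanup list to cleanupOrphanedFiles.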

File Change Detection

Detect file content changes using fast xxHash hashing.

import { TrackedFileManager, computeFileHash, hasFileContentChanged } from 'hazo_files';

// TrackedFileManager automatically tracks file hashes on upload
const result = await trackedManager.uploadFile(buffer, '/docs/report.pdf', {
  skipHash: false, // Hash is computed by default
  awaitRecording: true, // Wait for DB record before returning
});

// Check if a file has changed since it was tracked
const hasChanged = await trackedManager.hasFileChanged('/docs/report.pdf');
if (hasChanged) {
  console.log('File has been modified since last upload');
}

// Get stored hash and size
const hash = await trackedManager.getStoredHash('/docs/report.pdf');
const size = await trackedManager.getStoredSize('/docs/report.pdf');

// Use hash utilities directly
const fileHash = await computeFileHash(buffer);
const changed = await hasFileContentChanged(oldHash, newBuffer);

Server Entry Point

For server-side applications, use the /server entry point, which includes a factory function:

import { createHazoFilesServer } from 'hazo_files/server';

const hazoFiles = await createHazoFilesServer({
  crudService: fileCrud,
  namingCrudService: namingCrud,
  config: {
    provider: 'local',
    local: { basePath: './storage' },
  },
  enableTracking: true,
  llmFactory: (provider) => createLLM({ provider }),
  // Optional: enable automatic content tagging for all uploads
  defaultContentTagConfig: {
    content_tag_set_by_llm: true,
    content_tag_prompt_area: 'classification',
    content_tag_prompt_key: 'classify_document',
    content_tag_prompt_return_fieldname: 'document_type',
  },
});

// Access all services
const { fileManager, metadataService, namingService, extractionService, uploadExtractService } = hazoFiles;

API Reference

FileManager

Main service class providing unified file operations.

Methods

  • initialize(config?: HazoFilesConfig): Promise<void> - Initialize the file manager
  • createDirectory(path: string): Promise<OperationResult<FolderItem>> - Create directory
  • removeDirectory(path: string, recursive?: boolean): Promise<OperationResult> - Remove directory
  • uploadFile(source, remotePath, options?): Promise<OperationResult<FileItem>> - Upload file
  • downloadFile(remotePath, localPath?, options?): Promise<OperationResult<Buffer | string>> - Download file
  • moveItem(sourcePath, destinationPath, options?): Promise<OperationResult<FileSystemItem>> - Move file/folder
  • deleteFile(path: string): Promise<OperationResult> - Delete file
  • renameFile(path, newName, options?): Promise<OperationResult<FileItem>> - Rename file
  • renameFolder(path, newName, options?): Promise<OperationResult<FolderItem>> - Rename folder
  • listDirectory(path, options?): Promise<OperationResult<FileSystemItem[]>> - List directory contents
  • getItem(path: string): Promise<OperationResult<FileSystemItem>> - Get file/folder info
  • exists(path: string): Promise<boolean> - Check if file/folder exists
  • getFolderTree(path?, depth?): Promise<OperationResult<TreeNode[]>> - Get folder tree
  • writeFile(path, content, options?): Promise<OperationResult<FileItem>> - Write text file
  • readFile(path: string): Promise<OperationResult<string>> - Read text file
  • copyFile(sourcePath, destinationPath, options?): Promise<OperationResult<FileItem>> - Copy file
  • ensureDirectory(path: string): Promise<OperationResult<FolderItem>> - Ensure directory exists

Types

type StorageProvider = 'local' | 'google_drive';

interface FileItem {
  id: string;
  name: string;
  path: string;
  size: number;
  mimeType: string;
  createdAt: Date;
  modifiedAt: Date;
  isDirectory: false;
  parentId?: string;
  metadata?: Record<string, unknown>;
}

interface FolderItem {
  id: string;
  name: string;
  path: string;
  createdAt: Date;
  modifiedAt: Date;
  isDirectory: true;
  parentId?: string;
  children?: (FileItem | FolderItem)[];
  metadata?: Record<string, unknown>;
}

interface OperationResult<T = void> {
  success: boolean;
  data?: T;
  error?: string;
}

interface UploadOptions {
  overwrite?: boolean;
  onProgress?: (progress: number, bytesTransferred: number, totalBytes: number) => void;
  metadata?: Record<string, unknown>;
}

See src/types/index.ts for complete type definitions.
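Since these methods report failures through OperationResult rather than by throwing, a small helper can convert a failed result into an exception. The unwrap helper below is a hypothetical convenience, not part of the package; the interface simply mirrors the OperationResult shape documented above:

```typescript
// Local mirror of the documented OperationResult<T> shape.
interface OperationResult<T = void> {
  success: boolean;
  data?: T;
  error?: string;
}

// Hypothetical helper: return the payload on success, throw on failure.
function unwrap<T>(result: OperationResult<T>): T {
  if (!result.success) {
    throw new Error(result.error ?? 'Operation failed');
  }
  return result.data as T;
}

// Usage sketch:
// const item = unwrap(await fileManager.uploadFile(buffer, '/docs/report.pdf'));
```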

Error Handling

hazo_files provides comprehensive error types:

import {
  FileNotFoundError,
  DirectoryNotFoundError,
  FileExistsError,
  DirectoryExistsError,
  DirectoryNotEmptyError,
  PermissionDeniedError,
  InvalidPathError,
  FileTooLargeError,
  InvalidExtensionError,
  AuthenticationError,
  ConfigurationError,
  OperationError
} from 'hazo_files';

// Use in try-catch
try {
  await fileManager.uploadFile(buffer, '/files/test.exe');
} catch (error) {
  if (error instanceof InvalidExtensionError) {
    console.error('File type not allowed');
  } else if (error instanceof FileTooLargeError) {
    console.error('File is too large');
  }
}

Extending with Custom Storage Providers

See docs/ADDING_MODULES.md for a complete guide on creating custom storage modules.

Quick example:

import { BaseStorageModule } from 'hazo_files';
import type { StorageProvider, OperationResult, FileItem, HazoFilesConfig } from 'hazo_files';

class S3StorageModule extends BaseStorageModule {
  readonly provider: StorageProvider = 's3' as StorageProvider;

  async initialize(config: HazoFilesConfig): Promise<void> {
    await super.initialize(config);
    // Initialize S3 client
  }

  async uploadFile(source, remotePath, options?): Promise<OperationResult<FileItem>> {
    // Implement S3 upload
  }

  // Implement other required methods...
}

// Register the module
import { registerModule } from 'hazo_files';
registerModule('s3', () => new S3StorageModule());

Testing

The package includes a test application in test-app/ demonstrating:

  • Next.js 14+ integration
  • API routes for file operations
  • FileBrowser UI component usage
  • Local storage and Google Drive switching
  • OAuth flow implementation

To run the test app:

cd test-app
npm install
npm run dev

Visit http://localhost:3000

Browser Compatibility

The UI components require:

  • Modern browsers with ES2020+ support
  • React 18+
  • CSS Grid and Flexbox support

Server-side code requires Node.js 16+.

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes with clear messages
  4. Add tests for new functionality
  5. Submit a pull request

Roadmap

  • Amazon S3 storage module
  • Dropbox storage module
  • OneDrive storage module
  • WebDAV support
  • Advanced search and filtering
  • Batch operations
  • File versioning
  • Sharing and permissions
  • Real-time file sync
  • Thumbnail generation

Credits

Created by Pubs Abayasiri

Built with:

  • TypeScript
  • React
  • Google APIs (googleapis)
  • xxhash-wasm for fast file hashing
  • @dnd-kit for drag-and-drop
  • tsup for building