@brochington/arrowstore

v1.0.0

Downloads

117

Readme

ArrowStore

ArrowStore is a high-performance data store implementation using Apache Arrow Tables with vectorized operations for improved performance and memory efficiency. It provides a comprehensive API for data manipulation, filtering, sorting, and aggregation operations with lazy evaluation.

Core Features

  • Lazy Evaluation: Operations are queued and only executed when data is actually needed
  • Vectorized Operations: Optimized for performance using Apache Arrow's columnar memory format
  • Memory Efficiency: Batch processing and smart memory management for large datasets
  • Comprehensive Query API: Rich set of operations for filtering, transforming, and analyzing data
  • SQL-like Capabilities: Support for SQL-like filtering and queries
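
To make the lazy-evaluation idea above concrete, here is a minimal sketch of the general pattern: chained calls only record steps, and a terminal call runs them. `LazyPipeline` is a hypothetical stand-in written for this illustration and works on plain arrays; it is not ArrowStore's implementation.

```typescript
// Hypothetical illustration of lazy evaluation; not ArrowStore's actual code.
type Row = { name: string; age: number };
type Step = (rows: Row[]) => Row[];

class LazyPipeline {
  private source: Row[];
  private steps: Step[];

  constructor(source: Row[], steps: Step[] = []) {
    this.source = source;
    this.steps = steps;
  }

  // Chained calls only record a step and return a new pipeline.
  filter(predicate: (row: Row) => boolean): LazyPipeline {
    return this.enqueue(rows => rows.filter(predicate));
  }

  slice(start: number, end: number): LazyPipeline {
    return this.enqueue(rows => rows.slice(start, end));
  }

  // Terminal call: the queued steps run here, in the order they were added.
  getAll(): Row[] {
    return this.steps.reduce((rows, step) => step(rows), this.source);
  }

  private enqueue(step: Step): LazyPipeline {
    return new LazyPipeline(this.source, this.steps.concat([step]));
  }
}

let predicateCalls = 0;
const people: Row[] = [
  { name: 'Alice', age: 25 },
  { name: 'Bob', age: 30 },
  { name: 'Charlie', age: 35 }
];

const pending = new LazyPipeline(people)
  .filter(row => { predicateCalls += 1; return row.age >= 30; })
  .slice(0, 1);

const callsBeforeGetAll = predicateCalls; // 0: nothing has executed yet
const firstMatch = pending.getAll();      // [{ name: 'Bob', age: 30 }]
```

Because every chained call returns a new pipeline, the original stays reusable, which mirrors how the methods documented below return a new ArrowStore instance.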

Installation

npm install @brochington/arrowstore

Basic Usage

import { ArrowStore, Aggregations } from '@brochington/arrowstore';
import { tableFromArrays } from 'apache-arrow';

// Create Arrow table from data
const data = {
  id: [1, 2, 3, 4, 5],
  name: ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve'],
  age: [25, 30, 35, 40, 45],
  department: ['Engineering', 'Product', 'Engineering', 'HR', 'Product']
};

const table = tableFromArrays(data);

// Create ArrowStore instance
const store = new ArrowStore(table);

// Chain operations (these are lazily evaluated)
const result = await store
  .filter([
    { field: 'age', filter: { op: 'gte', value: 30 } }
  ])
  .sort([{ field: 'name', direction: 'asc' }])
  .getAll();

console.log(result);

Constructor

new ArrowStore<T>(table, schema?, options?)

Creates a new ArrowStore instance.

Parameters:

  • table: Arrow Table - The table containing the data
  • schema?: TableSchema - Optional schema definition
  • options?: ArrowStoreOptions - Optional configuration options

Type Parameters:

  • T: Record<string, any> - Type of the row objects

Example:

const store = new ArrowStore(table);

Core Methods

Data Retrieval

getAll(): Promise<T[]>

Returns all data from the store.

Returns: Promise resolving to an array of row objects

Example:

const allData = await store.getAll();

count(): Promise<number>

Counts the number of rows in the store.

Returns: Promise resolving to the number of rows

Example:

const rowCount = await store.count();

getSource(): Table

Gets the underlying Apache Arrow Table.

Returns: The Apache Arrow Table

Example:

const arrowTable = store.getSource();

Filtering

filter<R extends T = T>(filters: FilterCondition<T>[]): ArrowStore<R>

Filters the data based on the provided filter conditions.

Parameters:

  • filters: Array of filter conditions to apply

Returns: A new ArrowStore instance with filtered data

Example:

const filteredStore = store.filter([
  { field: 'age', filter: { op: 'gte', value: 30 } },
  { 
    OR: [
      { field: 'department', filter: { op: 'eq', value: 'Engineering' } },
      { field: 'department', filter: { op: 'eq', value: 'Product' } }
    ]
  }
]);
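
As a rough guide to how the condition structure above reads, the sketch below evaluates the same shape against plain objects. It assumes that entries in the top-level array are combined with AND and that an `OR` group matches when any member matches; that reading is an assumption, not confirmed behavior. `matches` and `applyFilters` are hypothetical helpers, and only the `eq` and `gte` operators shown in this README are handled.

```typescript
// Hypothetical evaluator for the filter shape shown above (plain objects only).
type Value = string | number | boolean;
type Row = Record<string, Value>;
type Leaf = { field: string; filter: { op: 'eq' | 'gte'; value: Value } };
type Condition = Leaf | { OR: Condition[] };

function matches(row: Row, condition: Condition): boolean {
  if ('OR' in condition) {
    // An OR group passes when at least one nested condition passes.
    return condition.OR.some(inner => matches(row, inner));
  }
  const actual = row[condition.field];
  const expected = condition.filter.value;
  if (condition.filter.op === 'eq') {
    return actual === expected;
  }
  // 'gte' is only meaningful for numbers in this sketch.
  return typeof actual === 'number' && typeof expected === 'number' && actual >= expected;
}

// Assumption: top-level conditions are ANDed together.
function applyFilters(rows: Row[], conditions: Condition[]): Row[] {
  return rows.filter(row => conditions.every(condition => matches(row, condition)));
}

const people: Row[] = [
  { name: 'Alice', age: 25, department: 'Engineering' },
  { name: 'Bob', age: 30, department: 'Product' },
  { name: 'Charlie', age: 35, department: 'Engineering' },
  { name: 'Dave', age: 40, department: 'HR' },
  { name: 'Eve', age: 45, department: 'Product' }
];

const conditions: Condition[] = [
  { field: 'age', filter: { op: 'gte', value: 30 } },
  {
    OR: [
      { field: 'department', filter: { op: 'eq', value: 'Engineering' } },
      { field: 'department', filter: { op: 'eq', value: 'Product' } }
    ]
  }
];

const names = applyFilters(people, conditions).map(row => row.name).join(',');
// names === 'Bob,Charlie,Eve'
```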

filterSql<R extends T = T>(sqlFilter: string): ArrowStore<R>

Filters data using a SQL-like WHERE clause.

Parameters:

  • sqlFilter: SQL-like WHERE clause string (without the "WHERE" keyword)

Returns: A new ArrowStore instance with filtered data

Example:

const filteredStore = store.filterSql(
  "age >= 30 AND department IN ('Engineering', 'Product')"
);

filterEquals<R extends T = T>(simpleFilters: Partial<T>): ArrowStore<R>

Filters data with simple field-value equality pairs.

Parameters:

  • simpleFilters: Object where keys are field names and values are what to match

Returns: A new ArrowStore instance with filtered data

Example:

const filteredStore = store.filterEquals({
  department: 'Engineering',
  active: true
});
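
Equality filtering of this kind can be pictured as "every listed key must match exactly". The sketch below shows that idea on a plain array; `filterEqualsSketch` is a hypothetical helper, and the use of strict equality (`===`) is an assumption about the semantics.

```typescript
// Hypothetical plain-array analogue of filterEquals.
type Employee = { name: string; department: string; active: boolean };

function filterEqualsSketch<T extends object>(rows: T[], simpleFilters: Partial<T>): T[] {
  const keys = Object.keys(simpleFilters) as Array<keyof T>;
  // A row is kept only when every listed field equals the requested value.
  return rows.filter(row => keys.every(key => row[key] === simpleFilters[key]));
}

const employees: Employee[] = [
  { name: 'Alice', department: 'Engineering', active: true },
  { name: 'Bob', department: 'Product', active: true },
  { name: 'Charlie', department: 'Engineering', active: false }
];

const activeEngineers = filterEqualsSketch(employees, {
  department: 'Engineering',
  active: true
});
// activeEngineers contains only Alice
```

In this sketch an empty filter object keeps every row, since there is no key that could fail to match.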

Transformation

select<K extends keyof T>(fields: K[]): ArrowStore<Pick<T, K>>

Selects specific fields/columns.

Parameters:

  • fields: Array of field names to select

Returns: A new ArrowStore instance with only the selected fields

Example:

const nameAndAgeStore = store.select(['name', 'age']);

map<R extends Record<string, any>>(mapFn: (item: T) => R, resultSchema?: TableSchema): ArrowStore<R>

Maps each row to a new object structure.

Parameters:

  • mapFn: Function to transform each row
  • resultSchema?: Optional schema for the transformed data

Returns: A new ArrowStore instance with mapped data

Example:

const mappedStore = store.map(person => ({
  fullName: `${person.firstName} ${person.lastName}`,
  birthYear: new Date().getFullYear() - person.age
}));

Sorting and Pagination

sort(options: SortOptions[]): ArrowStore<T>

Sorts the data based on one or more fields.

Parameters:

  • options: Array of sort configurations with field and direction

Returns: A new ArrowStore instance with sorted data

Example:

const sortedStore = store.sort([
  { field: 'age', direction: 'desc' },
  { field: 'name', direction: 'asc' }
]);
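
When several sort entries are given, the usual convention is that later entries only break ties left by earlier ones. The sketch below shows that convention on plain objects; `sortSketch` is a hypothetical helper, and the tie-breaking order is an assumption about how `sort` treats its options.

```typescript
// Hypothetical multi-key comparator; earlier options take precedence.
type Row = Record<string, string | number>;
type SortOption = { field: string; direction: 'asc' | 'desc' };

function sortSketch(rows: Row[], options: SortOption[]): Row[] {
  // Copy first so the input array is left untouched.
  return rows.slice().sort((a, b) => {
    for (const { field, direction } of options) {
      const left = a[field];
      const right = b[field];
      if (left === right) continue; // tie: fall through to the next sort key
      const order = left < right ? -1 : 1;
      return direction === 'asc' ? order : -order;
    }
    return 0;
  });
}

const unsorted: Row[] = [
  { name: 'Carol', age: 30 },
  { name: 'Bob', age: 45 },
  { name: 'Alice', age: 30 }
];

const sortedNames = sortSketch(unsorted, [
  { field: 'age', direction: 'desc' },
  { field: 'name', direction: 'asc' }
]).map(row => row.name).join(',');
// sortedNames === 'Bob,Alice,Carol'
```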

paginate(page: number, pageSize: number): ArrowStore<T>

Paginates data with the specified page and page size.

Parameters:

  • page: Page number (1-based)
  • pageSize: Number of items per page

Returns: A new ArrowStore instance with paginated data

Example:

const pageTwo = store.paginate(2, 10); // Second page with 10 items per page
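
With 1-based pages, the page number maps onto a half-open index range. The sketch below shows the arithmetic; `pageBounds` is a hypothetical helper, and treating `paginate(page, pageSize)` as equivalent to a `slice(start, end)` call is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: convert a 1-based page into [start, end) indices.
function pageBounds(page: number, pageSize: number): { start: number; end: number } {
  if (page < 1 || pageSize < 1) {
    throw new RangeError('page and pageSize must be >= 1');
  }
  const start = (page - 1) * pageSize;
  return { start, end: start + pageSize };
}

const secondPage = pageBounds(2, 10); // { start: 10, end: 20 }

// On a 25-item list, the third page of 10 is a short final page.
const ids: number[] = [];
for (let i = 0; i < 25; i++) ids.push(i);
const { start, end } = pageBounds(3, 10);
const lastPage = ids.slice(start, end); // [20, 21, 22, 23, 24]
```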

slice(start: number, end: number): ArrowStore<T>

Returns a slice of the data from start to end.

Parameters:

  • start: Start index (inclusive)
  • end: End index (exclusive)

Returns: A new ArrowStore instance with the sliced data

Example:

const slicedStore = store.slice(10, 20); // Items 10-19

Aggregation

groupBy<K extends keyof T, R extends Record<string, any>>(field: K, aggregations: Record<string, (values: any[]) => any>): ArrowStore<R>

Groups data by a field and computes aggregations.

Parameters:

  • field: Field to group by
  • aggregations: Object mapping output field names to aggregation functions

Returns: A new ArrowStore instance with grouped and aggregated data

Example:

const departmentStats = store.groupBy('department', {
  count: Aggregations.count(),
  avgAge: Aggregations.avg('age'),
  totalSalary: Aggregations.sum('salary')
});

// Result structure example:
// [
//   { department: 'Engineering', count: 2, avgAge: 30, totalSalary: 200000 },
//   { department: 'Product', count: 2, avgAge: 37.5, totalSalary: 220000 },
//   { department: 'HR', count: 1, avgAge: 40, totalSalary: 90000 }
// ]
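
The grouping step can be pictured as bucketing rows by key and then running each aggregation over each bucket. The sketch below does that on the sample data from Basic Usage. `groupBySketch`, `count`, `sum`, and `avg` are local, hypothetical stand-ins (not the package's `Aggregations` helpers), and it is an assumption that each aggregation function receives the rows of one group.

```typescript
// Hypothetical plain-array analogue of groupBy with aggregations.
type Row = Record<string, string | number>;
type Aggregator = (rows: Row[]) => number;

const count = (): Aggregator => rows => rows.length;
const sum = (field: string): Aggregator => rows =>
  rows.reduce((total, row) => total + Number(row[field]), 0);
const avg = (field: string): Aggregator => rows => sum(field)(rows) / rows.length;

function groupBySketch(rows: Row[], field: string, aggregations: Record<string, Aggregator>): Row[] {
  const order: string[] = []; // remembers first-seen group order
  const groups: Record<string, Row[]> = {};
  for (const row of rows) {
    const key = String(row[field]);
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      groups[key] = [];
      order.push(key);
    }
    groups[key].push(row);
  }
  return order.map(key => {
    const result: Row = {};
    result[field] = key;
    for (const name of Object.keys(aggregations)) {
      result[name] = aggregations[name](groups[key]);
    }
    return result;
  });
}

const people: Row[] = [
  { name: 'Alice', age: 25, department: 'Engineering' },
  { name: 'Bob', age: 30, department: 'Product' },
  { name: 'Charlie', age: 35, department: 'Engineering' },
  { name: 'Dave', age: 40, department: 'HR' },
  { name: 'Eve', age: 45, department: 'Product' }
];

const stats = groupBySketch(people, 'department', {
  count: count(),
  avgAge: avg('age')
});
// [
//   { department: 'Engineering', count: 2, avgAge: 30 },
//   { department: 'Product', count: 2, avgAge: 37.5 },
//   { department: 'HR', count: 1, avgAge: 40 }
// ]
```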

Reduction Operations

reduce<R>(reducer: (accumulator: R, current: T, index: number) => R, initialValue: R): Promise<R>

Reduces the data to a single value.

Parameters:

  • reducer: Function to apply to each row with an accumulator
  • initialValue: Initial value for the accumulator

Returns: Promise resolving to the accumulated result

Example:

const totalAge = await store.reduce(
  (sum, person, index) => sum + person.age, 
  0
);

fold<R>(folder: (accumulator: R, current: T, index: number) => R, initialValue: R): Promise<R>

Alias for reduce.

toSet<K>(keyFn?: (item: T) => K): Promise<Set<K | T>>

Converts the data to a Set with optional key extraction.

Parameters:

  • keyFn?: Optional function to extract a key from each row

Returns: Promise resolving to a Set of unique values

Example:

// Get unique departments
const departments = await store.toSet(person => person.department);

toMap<K, V = T>(keyFn: (item: T) => K, valueFn?: (item: T) => V): Promise<Map<K, V>>

Converts the data to a Map with keys and values extracted from rows.

Parameters:

  • keyFn: Function to extract a key from each row
  • valueFn?: Optional function to transform each row into a value

Returns: Promise resolving to a Map

Example:

// Create a map of id -> name
const idToName = await store.toMap(
  person => person.id,
  person => person.name
);
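
The two conversions above can be sketched on a plain array as follows. `toSetSketch` and `toMapSketch` are hypothetical, synchronous helpers (the documented methods return promises). Two behaviors here are assumptions: whole rows are added when no `keyFn` is given, and a later row overwrites an earlier one with the same key.

```typescript
// Hypothetical plain-array analogues of toSet and toMap.
type Person = { id: number; name: string; department: string };

function toSetSketch<K>(rows: Person[], keyFn?: (item: Person) => K): Set<K | Person> {
  const out = new Set<K | Person>();
  for (const row of rows) {
    out.add(keyFn ? keyFn(row) : row); // no keyFn: the row itself is stored
  }
  return out;
}

function toMapSketch<K, V = Person>(
  rows: Person[],
  keyFn: (item: Person) => K,
  valueFn?: (item: Person) => V
): Map<K, V> {
  const out = new Map<K, V>();
  for (const row of rows) {
    // Duplicate keys: the last row wins in this sketch.
    out.set(keyFn(row), valueFn ? valueFn(row) : (row as unknown as V));
  }
  return out;
}

const staff: Person[] = [
  { id: 1, name: 'Alice', department: 'Engineering' },
  { id: 2, name: 'Bob', department: 'Product' },
  { id: 3, name: 'Charlie', department: 'Engineering' },
  { id: 4, name: 'Dave', department: 'HR' }
];

const departments = toSetSketch(staff, person => person.department); // 3 unique values
const idToName = toMapSketch(staff, person => person.id, person => person.name);
const bob = idToName.get(2); // 'Bob'
```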

Memory Management

flush(): Promise<ArrowStore<T>>

Executes all pending operations and returns a new store.

Returns: Promise resolving to a new ArrowStore with computed results

Example:

const computedStore = await store.flush();

estimateMemoryUsage(): Promise<MemoryUsageInfo>

Estimates memory usage of the store.

Returns: Promise resolving to detailed memory usage information

Example:

const memoryInfo = await store.estimateMemoryUsage();
console.log(`Total estimated memory: ${memoryInfo.humanReadable.total}`);

Column Operations

sumColumn(columnName: string): Promise<number>

Computes the sum of values in a numeric column.

Parameters:

  • columnName: Name of the column to sum

Returns: Promise resolving to the sum

Example:

const totalSalary = await store.sumColumn('salary');

averageColumn(columnName: string): Promise<number | null>

Computes the average of values in a numeric column.

Parameters:

  • columnName: Name of the column to average

Returns: Promise resolving to the average or null if no data

Example:

const avgAge = await store.averageColumn('age');

minMaxColumn(columnName: string): Promise<{ min: any; max: any }>

Finds the minimum and maximum values in a column.

Parameters:

  • columnName: Name of the column to analyze

Returns: Promise resolving to an object with min and max values

Example:

const { min, max } = await store.minMaxColumn('age');
console.log(`Age range: ${min} - ${max}`);

countDistinct(columnName: string): Promise<number>

Counts distinct values in a column.

Parameters:

  • columnName: Name of the column to analyze

Returns: Promise resolving to the count of distinct values

Example:

const departmentCount = await store.countDistinct('department');
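
For reference, the four column operations above correspond to the following plain-array computations. The helper names are hypothetical and the functions are synchronous, unlike the documented methods. Returning `null` for the average of an empty column follows the description above; returning `null` for `min` and `max` on an empty column is an assumption of this sketch.

```typescript
// Hypothetical plain-array analogues of the column operations.
function sumSketch(values: number[]): number {
  return values.reduce((total, value) => total + value, 0);
}

function averageSketch(values: number[]): number | null {
  return values.length === 0 ? null : sumSketch(values) / values.length;
}

function minMaxSketch(values: number[]): { min: number | null; max: number | null } {
  let min: number | null = null;
  let max: number | null = null;
  for (const value of values) { // single pass over the column
    if (min === null || value < min) min = value;
    if (max === null || value > max) max = value;
  }
  return { min, max };
}

function countDistinctSketch(values: Array<string | number>): number {
  // Keep only first occurrences; quadratic, but fine for a small illustration.
  return values.filter((value, index) => values.indexOf(value) === index).length;
}

const ages = [25, 30, 35, 40, 45];
const depts = ['Engineering', 'Product', 'Engineering', 'HR', 'Product'];

const totalAge = sumSketch(ages);                 // 175
const meanAge = averageSketch(ages);              // 35
const ageRange = minMaxSketch(ages);              // { min: 25, max: 45 }
const distinctDepts = countDistinctSketch(depts); // 3
const emptyAverage = averageSketch([]);           // null
```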

Advanced Usage Examples

Chaining Operations

const results = await store
  .filter([{ field: 'active', filter: { op: 'eq', value: true } }])
  .select(['id', 'name', 'department', 'salary'])
  .sort([{ field: 'salary', direction: 'desc' }])
  .slice(0, 10)
  .getAll();

Complex Filtering

import { and, or, not, field, gt, eq, inArray } from '@brochington/arrowstore';

const filtered = store.filter([
  and([
    field('age', gt(30)),
    or([
      field('department', eq('Engineering')),
      field('department', eq('Product'))
    ]),
    not(field('isContractor', eq(true)))
  ])
]);

Using SQL-like Filters

const filtered = store.filterSql(
  "age > 30 AND department IN ('Engineering', 'Product') AND NOT isContractor = true"
);
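
Read as boolean logic, the builder-style filter and the SQL-like string above appear to describe the same condition. The sketch below writes that condition as an ordinary predicate over plain objects, which can be handy for checking expectations in a unit test. The `isContractor` values are made up for illustration, and reading `NOT isContractor = true` as `NOT (isContractor = true)` follows standard SQL precedence, which is assumed here rather than confirmed for this library.

```typescript
// Plain predicate equivalent to:
//   age > 30 AND department IN ('Engineering', 'Product') AND NOT isContractor = true
type Employee = { name: string; age: number; department: string; isContractor: boolean };

const isMatch = (e: Employee): boolean =>
  e.age > 30 &&
  (e.department === 'Engineering' || e.department === 'Product') &&
  !(e.isContractor === true);

// Hypothetical sample rows; isContractor is not part of the README's sample data.
const employees: Employee[] = [
  { name: 'Alice', age: 25, department: 'Engineering', isContractor: false },
  { name: 'Bob', age: 30, department: 'Product', isContractor: false },
  { name: 'Charlie', age: 35, department: 'Engineering', isContractor: false },
  { name: 'Dave', age: 40, department: 'HR', isContractor: false },
  { name: 'Eve', age: 45, department: 'Product', isContractor: true }
];

const matched = employees.filter(isMatch).map(e => e.name).join(',');
// matched === 'Charlie' (Bob is excluded because 30 is not greater than 30)
```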

Grouping and Aggregation

const stats = await store
  .groupBy('department', {
    count: Aggregations.count(),
    avgAge: Aggregations.avg('age'),
    minSalary: Aggregations.min('salary'),
    maxSalary: Aggregations.max('salary')
  })
  .sort([{ field: 'count', direction: 'desc' }])
  .getAll();

Processing Large Datasets Efficiently

// For very large datasets, use batch processing patterns
const store = new ArrowStore(largeTable);

// Use aggregations rather than loading all data
const summary = {
  total: await store.count(),
  averageAge: await store.averageColumn('age'),
  departmentCounts: await store
    .groupBy('department', { count: Aggregations.count() })
    .getAll()
};

// Only retrieve necessary data
const topEmployees = await store
  .sort([{ field: 'performance', direction: 'desc' }])
  .slice(0, 100)  // Only get top 100
  .select(['id', 'name', 'performance'])  // Only select needed fields
  .getAll();
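
The batch-processing idea mentioned for large datasets can be sketched as walking the data in fixed-size windows and keeping only running totals. `forEachBatch` is a hypothetical helper over a plain array. With a store, each window might instead be fetched with `slice(start, end)` followed by `getAll()`, but whether that is the most efficient route is not confirmed here.

```typescript
// Hypothetical batching helper: visit fixed-size windows of an array.
function forEachBatch<T>(rows: T[], batchSize: number, visit: (batch: T[]) => void): void {
  if (batchSize < 1) {
    throw new RangeError('batchSize must be >= 1');
  }
  for (let start = 0; start < rows.length; start += batchSize) {
    visit(rows.slice(start, start + batchSize));
  }
}

const values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Only running totals are kept, never a second full copy of the data.
let total = 0;
let seen = 0;
let batches = 0;
forEachBatch(values, 4, batch => {
  batches += 1;
  seen += batch.length;
  for (const value of batch) total += value;
});

const runningAverage = seen === 0 ? null : total / seen;
// batches === 3 (sizes 4, 4, 2), total === 55, runningAverage === 5.5
```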

Type Definitions

For complete type definitions, refer to the source code or TypeScript declaration files.