npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

aathena

v0.3.4

Published

Type-safe AWS Athena client for TypeScript

Readme

aathena

CI npm version license node

Type-safe AWS Athena client for TypeScript.

  • Scaffold a project straight from your AWS account with npx aathena init
  • 1:1 type mapping from AWS Glue to TypeScript, including native Parquet/ORC arrays, maps, and structs - no CAST() or JSON.parse
  • Run queries concurrently with parallel(), with automatic detection of injected-projection partitions (including those hidden behind Presto/Trino views)

Built on @aws-sdk/client-athena and sql-render. See the examples/ directory for working projects.

Quick Start

1. Install

npm install aathena

2. Initialize

npx aathena init

init walks you through a complete setup: picks an AWS region, picks a Glue database, lets you multi-select tables, probes each for partition requirements (following Presto/Trino views to their underlying tables), auto-runs generate, and writes a runnable src/main.ts that imports and invokes every scaffolded query. See CLI Reference for the full sequence and flags.

3. Run it

npx tsx src/main.ts

If any scaffolded query needs partition values, main.ts passes REPLACE_ME placeholders with a note at the top of the file. Replace them with real values, then run.

Every row is typed 1:1 against your Glue schema. Scalars land as native TypeScript (Date for timestamps, bigint, number, string), and Parquet/ORC arrays, maps, and structs parse back recursively - even nested. Hover any field in your editor to see the exact column type, with no CAST() in SQL and no manual JSON.parse:

// Glue: event_id bigint, created_at timestamp, tags array<varchar>, metadata map<string,int>, address struct<city:string>, items array<struct<qty:int>>

const result = await getEvents(athena, { rowLimit: 33 });
const row = result.rows[0];

row.eventId;         // bigint
row.createdAt;       // Date
row.tags;            // string[]
row.metadata;        // Record<string, number>
row.address.city;    // string - direct struct field access
row.items[0].qty;    // number - nested array of struct

Write Queries

Your scaffolded queries live under tables/{database}/{table}/{query-name}.sql. Open the file init (or add) created and shape it to your needs. Placeholders use {{name}} syntax. Types are inferred from SQL context:

WHERE status = '{{status}}'    -- quoted       -> string
LIMIT {{rowLimit}}             -- LIMIT/OFFSET -> positiveInt
WHERE price >= {{minPrice}}    -- comparison   -> number

For stricter validation, annotate with -- @param:

-- @param status enum('active','pending','done')
-- @param rowLimit positiveInt
-- @param startDate isoDate
SELECT *
FROM events
WHERE status = '{{status}}'
  AND created_at >= '{{startDate}}'
LIMIT {{rowLimit}}

Generates:

interface DefaultParams {
  status: 'active' | 'pending' | 'done';
  rowLimit: number;               // validated > 0
  startDate: string;              // validated YYYY-MM-DD
}

See @param types in Reference for the full annotation list.

Run Queries

Every generated query accepts a client and its typed parameters:

import { createClient } from 'aathena';
import { eventsDefault } from './generated';

const athena = createClient();
const result = await eventsDefault(athena, { status: 'active', rowLimit: 99 });

createClient() reads aathena.config.json automatically; pass an explicit config to override (createClient({ region: 'us-east-1', database: 'analytics' })) - useful in tests or when the project root isn't on disk.

client.query() retries StartQueryExecution with exponential backoff + full jitter when Athena responds with TooManyRequestsException / CONCURRENT_QUERY_LIMIT_EXCEEDED. Up to 6 attempts. Applies to every call, generated or inline, including tasks dispatched by parallel().

Debug: export rendered SQL

Pass { exportTo: <path> } as the optional third argument to dump the rendered SQL (with parameter values substituted) to disk. The query still executes; missing parent directories are created, and the file is overwritten on each call. Forwarded directly to sql-render.

await eventsDefault(
  athena,
  { status: 'active', rowLimit: 99 },
  { exportTo: './debug/eventsDefault.sql' },
);

parallel()

parallel() runs multiple queries concurrently with a bounded cap that respects Athena's per-account active-DML quota. Tasks are passed as thunks (() => query(...)) so the helper can gate when each query actually starts.

import { createClient, parallel } from 'aathena';
import { getUsers, getOrders } from './generated';

const athena = createClient();

const [users, orders] = await parallel(
  [
    () => getUsers(athena, { rowLimit: 99 }),
    () => getOrders(athena, { from: '2022-02-02' }),
  ],
  { concurrency: 'auto', client: athena },
);

Concurrency resolution

concurrency accepts a number or 'auto'. With 'auto', the cap is resolved in this order:

  1. AathenaConfig.maxConcurrency if set
  2. Live AWS Service Quotas lookup (L-D405C694 DML, L-FCDFE414 DDL)
  3. Region-aware conservative fallback (50% of AWS-documented default, clamped to [5, 25])

The live lookup uses @aws-sdk/client-service-quotas, an optional dependency loaded via dynamic import. It requires the servicequotas:GetServiceQuota IAM permission; without it the lookup fails silently and the fallback is used.

Options

| Option | Default | Description | |---|---|---| | concurrency | 5 | number or 'auto' | | client | - | Required when concurrency: 'auto' and maxConcurrency is unset | | kind | 'dml' | 'dml' or 'ddl', selects which quota to probe | | reserveHeadroom | 1 | Subtracted from the resolved quota | | mode | 'all' | 'all' rejects on first failure; 'allSettled' returns per-task settlements |

Error handling

import {
  AathenaError,
  QueryTimeoutError,
  QueryFailedError,
  QueryCancelledError,
  ColumnParseError,
} from 'aathena';

try {
  const result = await eventsDefault(athena, { status: 'active', rowLimit: 99 });
} catch (err) {
  if (err instanceof QueryTimeoutError) {
    console.log(`Timed out after ${err.timeoutMs}ms: ${err.queryExecutionId}`);
  } else if (err instanceof QueryFailedError) {
    console.log(`Athena error: ${err.athenaErrorMessage}`);
  } else if (err instanceof AathenaError) {
    // QueryCancelledError, ColumnParseError, or anything else aathena threw
    console.log(`aathena error (${err.name}): ${err.message}`);
  } else {
    throw err;
  }
}

All four specific classes extend AathenaError, so catching the base is enough when you don't need to discriminate.

Advanced

Partition projection and views

Athena tables with partition projection type=injected require a static WHERE predicate on the partition column. Without it, queries fail at runtime with CONSTRAINT_VIOLATION.

init and add inspect each selected table's Glue Parameters for projection settings and scaffold the SQL with:

  • -- @param <col> string annotations
  • WHERE <col> = '{{<col>}}' predicates
  • REPLACE_ME placeholder values in src/main.ts plus a note at the top

When the target is a Presto/Trino view (detected via TableType === 'VIRTUAL_VIEW' or the presto_view / trino_view parameter flag), aathena decodes the view's ViewOriginalText, extracts FROM / JOIN table references, and probes those underlying tables recursively. Max traversal depth is 3 levels with a visited set to defuse cycles. If a view's references cannot be parsed or the probe hits the limit, a note is written into the scaffolded SQL so you know to add the predicates by hand.

Example output for a view over a partitioned table:

-- Starter aathena query. Edit or delete as needed.
-- See README for placeholder and parameter syntax.
-- View 'analytics.events' traced to: analytics.events_raw
-- @param tenant_id string
-- @param rowLimit positiveInt

SELECT *
FROM events
WHERE tenant_id = '{{tenant_id}}'
LIMIT {{rowLimit}}

Cross-database queries

By default every query runs against config.database. When a SQL file lives in tables/{other-db}/... (different from config.database), generate automatically emits an explicit per-query binding so it routes to that database at runtime:

// generated/queries/sales/events/default.ts (config.database is 'marketing')
export const default_ = createQuery<Events, DefaultParams>(
  'tables/sales/events/default.sql',
  schemaDef,
  { database: 'sales' },
);

You don't write this yourself - add <db>.<table> and generate handle it. For ad-hoc inline queries use client.query(sql, { database: 'sales' }).

Query statistics

Every QueryResult includes a statistics block from Athena (cost tracking, queueing visibility, cache hits). Pass { includeRuntimeStats: true } for input/output row counts via an extra GetQueryRuntimeStatistics API call.

const result = await athena.query<Row>(sql, { includeRuntimeStats: true });
result.statistics.totalExecutionTimeInMillis;   // 8128
result.statistics.resultReused;                 // true
result.statistics.runtime?.outputRows;          // 99

| Field | Notes | |---|---| | engineExecutionTimeInMillis | Engine execution time | | totalExecutionTimeInMillis | Wall time Athena took | | queryQueueTimeInMillis | Time spent waiting in the queue | | queryPlanningTimeInMillis | Planning + partition retrieval | | servicePreProcessingTimeInMillis | Preprocessing before engine | | serviceProcessingTimeInMillis | Result publication | | dataScannedInBytes | Drives query cost | | dpuCount? | Capacity-reservation workgroups only | | resultReused? | True if served from result cache | | runtime?.inputRows / inputBytes | Opt-in via includeRuntimeStats | | runtime?.outputRows / outputBytes | Opt-in via includeRuntimeStats |

CLI Reference

| Command | Purpose | |---|---| | aathena init | Interactive project scaffold. Fills config from AWS, picks tables, probes partitions, runs generate, writes src/main.ts. | | aathena add <table> | Scaffold a new query under tables/{database}/{table}/<name>.sql. Accepts db.table for cross-database tables and prompts to resolve mismatches. | | aathena generate | Re-run codegen (fetch Glue schemas, produce typed query functions). Runs automatically after init and add unless --no-generate. | | aathena help | Show all flags. |

aathena init

Runs interactively by default, then:

  1. Resolves the AWS region from --region, the AWS_REGION / AWS_DEFAULT_REGION env vars, or a select prompt (common commercial regions plus a Custom fallback for typing one in); lists Glue databases and Athena workgroups; inherits the workgroup's default output location when available
  2. Writes aathena.config.json and adds node_modules/ to .gitignore (the generated/ directory is committed by default; delete it or add it to .gitignore yourself if you prefer to regenerate on every build)
  3. Lets you multi-select which tables to scaffold starter SQL for
  4. Probes each selected table (and any Presto/Trino view it points at) for injected-projection partitions that require WHERE predicates
  5. Writes tables/{db}/{table}/default.sql with LIMIT {{rowLimit}}, plus the right -- @param / WHERE lines
  6. Runs generate to produce typed query functions under generated/
  7. Writes a runnable src/main.ts that imports and calls every scaffolded query (single call for 1 table, parallel() demo for 2+)

Flags:

  • --region, --database, --workgroup, --output-location, --tables a,b,c - non-interactive overrides
  • --tables-dir <path> - override the SQL scaffold root (default tables); persisted to config.tablesDir
  • --out-dir <path> - override the codegen output directory (default generated); persisted to config.outDir
  • --example-path <path> - override the runnable starter file (default src/main.ts); the from '../generated' import is rewritten to point at --out-dir from the chosen location
  • --no-sample - skip SQL scaffolding
  • --no-generate - skip the auto-generate step
  • --no-example - skip writing the example file
  • --force - overwrite aathena.config.json and regenerate the example file to reflect the current selection. SQL files are always preserved because you may have edited them.

aathena add

npx aathena add events                  # tables/{config.db}/events/default.sql
npx aathena add sales.events            # cross-db; interactive prompt resolves mismatch
npx aathena add events --from-schema    # embed Glue column list as a comment block
npx aathena add events --name daily     # scaffold daily.sql instead of default.sql

add always probes partitions (even without --from-schema) and auto-runs generate unless --no-generate.

Flags:

  • --name <query-name> - query filename (default: default)
  • --from-schema - include Glue column list as a comment block
  • --force - overwrite an existing SQL file
  • --no-generate - skip the auto-generate step

aathena generate

Re-runs codegen: fetches Glue schemas for every SQL file under tables/ in parallel and produces generated/types/{database}/{table}.ts, generated/queries/{database}/{table}/{query}.ts, and a barrel generated/index.ts. Run it after editing SQL files or when upstream schemas change. Surfaces a one-line info when a query directory's database differs from config.database and a per-query binding is emitted.

Reference

Configuration

aathena.config.json lives at your project root. init writes it for you; edit as needed afterwards. Both the CLI and runtime walk up the directory tree to find it.

{
  "region": "eu-west-1",
  "database": "sampledb",
  "workgroup": "primary",
  "outputLocation": "s3://my-bucket/athena-results/",
  "tablesDir": "./tables",
  "outDir": "./generated",
  "codegen": {
    "indent": 2
  },
  "query": {
    "timeout": 300000,
    "pollingInterval": 500,
    "maxPollingInterval": 5000
  }
}

| Field | Default | Description | |---|---|---| | region | AWS default | AWS region | | database | required | Primary Athena database (used when a query's directory doesn't specify otherwise) | | workgroup | - | Athena workgroup | | outputLocation | - | S3 path for query results (optional if workgroup has a default) | | tablesDir | ./tables | SQL files directory | | outDir | ./generated | Codegen output directory | | codegen.indent | 2 | Number of spaces used to indent generated TypeScript files (matches Prettier / Google / Airbnb / Angular / NestJS defaults). Clamped to [1, 8] | | query.timeout | 300000 | Query timeout in ms (5 min) | | query.pollingInterval | 500 | Initial poll interval in ms | | query.maxPollingInterval | 5000 | Max poll interval in ms | | maxConcurrency | - | Override for parallel({ concurrency: 'auto' }) when Service Quotas is unreachable |

Type mapping

Glue column types map 1:1 to TypeScript:

| Athena | TypeScript | |---|---| | varchar, string, char | string | | integer, int, smallint, tinyint | number | | bigint | bigint | | double, float, real | number | | decimal | string (preserves precision) | | boolean | boolean | | date | string (YYYY-MM-DD) | | timestamp | Date | | json | unknown | | binary, varbinary | string (Base64 encoded) | | array<T> | T[] | | map<K, V> | Record<K, V> | | struct<a:T, b:U> | { a: T; b: U } |

Regular columns are nullable (T | null); partition keys are always non-null. See Quick Start step 3 for a typed-access demo that covers scalars and nested complex types.

@param types

| Annotation | TypeScript | Validation | |---|---|---| | string | string | SQL injection check | | number | number | Finite number | | boolean | boolean | true / false | | positiveInt | number | Integer > 0 | | isoDate | string | YYYY-MM-DD | | isoTimestamp | string | ISO 8601 | | identifier | string | SQL identifier | | uuid | string | RFC 4122 UUID | | s3Path | string | s3://bucket/path | | enum('a','b','c') | 'a' \| 'b' \| 'c' | Whitelist |

Generated export names

The barrel at generated/index.ts re-exports every query under a JS-safe identifier:

  • latest.sql -> latest
  • default.sql (JS reserved word) -> aliased as <table>Default, e.g. eventsDefault
  • Two queries with the same filename across tables -> aliased as <table>{Query} on both sides

So a scaffolded tables/sampledb/events/default.sql shows up as:

import { eventsDefault } from './generated';
const result = await eventsDefault(athena, { rowLimit: 33 });

Directory structure

project/
├── aathena.config.json        # project root marker, written by 'init'
├── tables/                    # SQL files you edit
│   └── sampledb/              # database
│       └── events/            # table
│           ├── default.sql
│           └── daily.sql
├── generated/                 # codegen output, committed by default
│   ├── index.ts               # barrel re-exporting every query under a JS-safe name
│   ├── types/                 # one file per table, mirroring the Glue schema
│   │   └── sampledb/
│   │       └── events.ts
│   └── queries/               # one file per SQL, binding it to its types
│       └── sampledb/
│           └── events/
│               ├── default.ts
│               └── daily.ts
└── src/
    └── main.ts                # runnable example, written by 'init'

Nested grouping under a table (e.g. events/cart/add.sql) works too - codegen walks the whole tree under tables/.

Requirements

  • Node.js >= 20
  • AWS credentials resolvable via the default provider chain (env, shared config, SSO, IAM role, etc.)
  • IAM permissions:
    • athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults (runtime)
    • athena:ListWorkGroups, athena:GetWorkGroup (init)
    • glue:GetDatabases, glue:GetTables, glue:GetTable (init, add, generate)
    • servicequotas:GetServiceQuota (optional, enables parallel({ concurrency: 'auto' }))
    • Your query's S3 read + output-bucket write permissions

Related

  • sql-render - type-safe SQL templating with injection protection (used internally)

For LLMs

Two machine-readable summaries are maintained for AI consumption, following the llms.txt convention:

  • llms.txt - curated API reference and usage guide
  • llms-full.txt - full packed source, auto-generated on each push

License

MIT