aathena

v0.3.4

Published

a month ago

Type-safe AWS Athena client for TypeScript

0High
0Medium
0Low

bug3

athena aws-athena aws typescript type-safe codegen sql

aathena

Type-safe AWS Athena client for TypeScript.

Scaffold a project straight from your AWS account with npx aathena init
1:1 type mapping from AWS Glue to TypeScript, including native Parquet/ORC arrays, maps, and structs - no CAST() or JSON.parse
Run queries concurrently with parallel(), with automatic detection of injected-projection partitions (including those hidden behind Presto/Trino views)

Built on @aws-sdk/client-athena and sql-render. See the examples/ directory for working projects.

Quick Start

1. Install

npm install aathena

2. Initialize

npx aathena init

init walks you through a complete setup: picks an AWS region, picks a Glue database, lets you multi-select tables, probes each for partition requirements (following Presto/Trino views to their underlying tables), auto-runs generate, and writes a runnable src/main.ts that imports and invokes every scaffolded query. See CLI Reference for the full sequence and flags.

3. Run it

npx tsx src/main.ts

If any scaffolded query needs partition values, main.ts passes REPLACE_ME placeholders with a note at the top of the file. Replace them with real values, then run.

Every row is typed 1:1 against your Glue schema. Scalars land as native TypeScript (Date for timestamps, bigint, number, string), and Parquet/ORC arrays, maps, and structs parse back recursively - even nested. Hover any field in your editor to see the exact column type, with no CAST() in SQL and no manual JSON.parse:

// Glue: event_id bigint, created_at timestamp, tags array<varchar>, metadata map<string,int>, address struct<city:string>, items array<struct<qty:int>>

const result = await getEvents(athena, { rowLimit: 33 });
const row = result.rows[0];

row.eventId;         // bigint
row.createdAt;       // Date
row.tags;            // string[]
row.metadata;        // Record<string, number>
row.address.city;    // string - direct struct field access
row.items[0].qty;    // number - nested array of struct

Write Queries

Your scaffolded queries live under tables/{database}/{table}/{query-name}.sql. Open the file init (or add) created and shape it to your needs. Placeholders use {{name}} syntax. Types are inferred from SQL context:

WHERE status = '{{status}}'    -- quoted       -> string
LIMIT {{rowLimit}}             -- LIMIT/OFFSET -> positiveInt
WHERE price >= {{minPrice}}    -- comparison   -> number

For stricter validation, annotate with -- @param:

-- @param status enum('active','pending','done')
-- @param rowLimit positiveInt
-- @param startDate isoDate
SELECT *
FROM events
WHERE status = '{{status}}'
  AND created_at >= '{{startDate}}'
LIMIT {{rowLimit}}

Generates:

interface DefaultParams {
  status: 'active' | 'pending' | 'done';
  rowLimit: number;               // validated > 0
  startDate: string;              // validated YYYY-MM-DD
}

See @param types in Reference for the full annotation list.

Run Queries

Every generated query accepts a client and its typed parameters:

import { createClient } from 'aathena';
import { eventsDefault } from './generated';

const athena = createClient();
const result = await eventsDefault(athena, { status: 'active', rowLimit: 99 });

createClient() reads aathena.config.json automatically; pass an explicit config to override (createClient({ region: 'us-east-1', database: 'analytics' })) - useful in tests or when the project root isn't on disk.

client.query() retries StartQueryExecution with exponential backoff + full jitter when Athena responds with TooManyRequestsException / CONCURRENT_QUERY_LIMIT_EXCEEDED. Up to 6 attempts. Applies to every call, generated or inline, including tasks dispatched by parallel().

Debug: export rendered SQL

Pass { exportTo: <path> } as the optional third argument to dump the rendered SQL (with parameter values substituted) to disk. The query still executes; missing parent directories are created, and the file is overwritten on each call. Forwarded directly to sql-render.

await eventsDefault(
  athena,
  { status: 'active', rowLimit: 99 },
  { exportTo: './debug/eventsDefault.sql' },
);

`parallel()`

parallel() runs multiple queries concurrently with a bounded cap that respects Athena's per-account active-DML quota. Tasks are passed as thunks (() => query(...)) so the helper can gate when each query actually starts.

import { createClient, parallel } from 'aathena';
import { getUsers, getOrders } from './generated';

const athena = createClient();

const [users, orders] = await parallel(
  [
    () => getUsers(athena, { rowLimit: 99 }),
    () => getOrders(athena, { from: '2022-02-02' }),
  ],
  { concurrency: 'auto', client: athena },
);

Concurrency resolution

concurrency accepts a number or 'auto'. With 'auto', the cap is resolved in this order:

AathenaConfig.maxConcurrency if set
Live AWS Service Quotas lookup (L-D405C694 DML, L-FCDFE414 DDL)
Region-aware conservative fallback (50% of AWS-documented default, clamped to [5, 25])

The live lookup uses @aws-sdk/client-service-quotas, an optional dependency loaded via dynamic import. It requires the servicequotas:GetServiceQuota IAM permission; without it the lookup fails silently and the fallback is used.

Options

| Option | Default | Description | |---|---|---| | concurrency | 5 | number or 'auto' | | client | - | Required when concurrency: 'auto' and maxConcurrency is unset | | kind | 'dml' | 'dml' or 'ddl', selects which quota to probe | | reserveHeadroom | 1 | Subtracted from the resolved quota | | mode | 'all' | 'all' rejects on first failure; 'allSettled' returns per-task settlements |

Error handling

import {
  AathenaError,
  QueryTimeoutError,
  QueryFailedError,
  QueryCancelledError,
  ColumnParseError,
} from 'aathena';

try {
  const result = await eventsDefault(athena, { status: 'active', rowLimit: 99 });
} catch (err) {
  if (err instanceof QueryTimeoutError) {
    console.log(`Timed out after ${err.timeoutMs}ms: ${err.queryExecutionId}`);
  } else if (err instanceof QueryFailedError) {
    console.log(`Athena error: ${err.athenaErrorMessage}`);
  } else if (err instanceof AathenaError) {
    // QueryCancelledError, ColumnParseError, or anything else aathena threw
    console.log(`aathena error (${err.name}): ${err.message}`);
  } else {
    throw err;
  }
}

All four specific classes extend AathenaError, so catching the base is enough when you don't need to discriminate.

Advanced

Partition projection and views

Athena tables with partition projection type=injected require a static WHERE predicate on the partition column. Without it, queries fail at runtime with CONSTRAINT_VIOLATION.

init and add inspect each selected table's Glue Parameters for projection settings and scaffold the SQL with:

-- @param <col> string annotations
WHERE <col> = '{{<col>}}' predicates
REPLACE_ME placeholder values in src/main.ts plus a note at the top

When the target is a Presto/Trino view (detected via TableType === 'VIRTUAL_VIEW' or the presto_view / trino_view parameter flag), aathena decodes the view's ViewOriginalText, extracts FROM / JOIN table references, and probes those underlying tables recursively. Max traversal depth is 3 levels with a visited set to defuse cycles. If a view's references cannot be parsed or the probe hits the limit, a note is written into the scaffolded SQL so you know to add the predicates by hand.

Example output for a view over a partitioned table:

-- Starter aathena query. Edit or delete as needed.
-- See README for placeholder and parameter syntax.
-- View 'analytics.events' traced to: analytics.events_raw
-- @param tenant_id string
-- @param rowLimit positiveInt

SELECT *
FROM events
WHERE tenant_id = '{{tenant_id}}'
LIMIT {{rowLimit}}

Cross-database queries

By default every query runs against config.database. When a SQL file lives in tables/{other-db}/... (different from config.database), generate automatically emits an explicit per-query binding so it routes to that database at runtime:

// generated/queries/sales/events/default.ts (config.database is 'marketing')
export const default_ = createQuery<Events, DefaultParams>(
  'tables/sales/events/default.sql',
  schemaDef,
  { database: 'sales' },
);

You don't write this yourself - add <db>.<table> and generate handle it. For ad-hoc inline queries use client.query(sql, { database: 'sales' }).

Query statistics

Every QueryResult includes a statistics block from Athena (cost tracking, queueing visibility, cache hits). Pass { includeRuntimeStats: true } for input/output row counts via an extra GetQueryRuntimeStatistics API call.

const result = await athena.query<Row>(sql, { includeRuntimeStats: true });
result.statistics.totalExecutionTimeInMillis;   // 8128
result.statistics.resultReused;                 // true
result.statistics.runtime?.outputRows;          // 99

| Field | Notes | |---|---| | engineExecutionTimeInMillis | Engine execution time | | totalExecutionTimeInMillis | Wall time Athena took | | queryQueueTimeInMillis | Time spent waiting in the queue | | queryPlanningTimeInMillis | Planning + partition retrieval | | servicePreProcessingTimeInMillis | Preprocessing before engine | | serviceProcessingTimeInMillis | Result publication | | dataScannedInBytes | Drives query cost | | dpuCount? | Capacity-reservation workgroups only | | resultReused? | True if served from result cache | | runtime?.inputRows / inputBytes | Opt-in via includeRuntimeStats | | runtime?.outputRows / outputBytes | Opt-in via includeRuntimeStats |

CLI Reference

| Command | Purpose | |---|---| | aathena init | Interactive project scaffold. Fills config from AWS, picks tables, probes partitions, runs generate, writes src/main.ts. | | aathena add <table> | Scaffold a new query under tables/{database}/{table}/<name>.sql. Accepts db.table for cross-database tables and prompts to resolve mismatches. | | aathena generate | Re-run codegen (fetch Glue schemas, produce typed query functions). Runs automatically after init and add unless --no-generate. | | aathena help | Show all flags. |

`aathena init`

Runs interactively by default, then:

Resolves the AWS region from --region, the AWS_REGION / AWS_DEFAULT_REGION env vars, or a select prompt (common commercial regions plus a Custom fallback for typing one in); lists Glue databases and Athena workgroups; inherits the workgroup's default output location when available
Writes aathena.config.json and adds node_modules/ to .gitignore (the generated/ directory is committed by default; delete it or add it to .gitignore yourself if you prefer to regenerate on every build)
Lets you multi-select which tables to scaffold starter SQL for
Probes each selected table (and any Presto/Trino view it points at) for injected-projection partitions that require WHERE predicates
Writes tables/{db}/{table}/default.sql with LIMIT {{rowLimit}}, plus the right -- @param / WHERE lines
Runs generate to produce typed query functions under generated/
Writes a runnable src/main.ts that imports and calls every scaffolded query (single call for 1 table, parallel() demo for 2+)

Flags:

--region, --database, --workgroup, --output-location, --tables a,b,c - non-interactive overrides
--tables-dir <path> - override the SQL scaffold root (default tables); persisted to config.tablesDir
--out-dir <path> - override the codegen output directory (default generated); persisted to config.outDir
--example-path <path> - override the runnable starter file (default src/main.ts); the from '../generated' import is rewritten to point at --out-dir from the chosen location
--no-sample - skip SQL scaffolding
--no-generate - skip the auto-generate step
--no-example - skip writing the example file
--force - overwrite aathena.config.json and regenerate the example file to reflect the current selection. SQL files are always preserved because you may have edited them.

`aathena add`

npx aathena add events                  # tables/{config.db}/events/default.sql
npx aathena add sales.events            # cross-db; interactive prompt resolves mismatch
npx aathena add events --from-schema    # embed Glue column list as a comment block
npx aathena add events --name daily     # scaffold daily.sql instead of default.sql

add always probes partitions (even without --from-schema) and auto-runs generate unless --no-generate.

Flags:

--name <query-name> - query filename (default: default)
--from-schema - include Glue column list as a comment block
--force - overwrite an existing SQL file
--no-generate - skip the auto-generate step

`aathena generate`

Re-runs codegen: fetches Glue schemas for every SQL file under tables/ in parallel and produces generated/types/{database}/{table}.ts, generated/queries/{database}/{table}/{query}.ts, and a barrel generated/index.ts. Run it after editing SQL files or when upstream schemas change. Surfaces a one-line info when a query directory's database differs from config.database and a per-query binding is emitted.

Reference

Configuration

aathena.config.json lives at your project root. init writes it for you; edit as needed afterwards. Both the CLI and runtime walk up the directory tree to find it.

{
  "region": "eu-west-1",
  "database": "sampledb",
  "workgroup": "primary",
  "outputLocation": "s3://my-bucket/athena-results/",
  "tablesDir": "./tables",
  "outDir": "./generated",
  "codegen": {
    "indent": 2
  },
  "query": {
    "timeout": 300000,
    "pollingInterval": 500,
    "maxPollingInterval": 5000
  }
}

| Field | Default | Description | |---|---|---| | region | AWS default | AWS region | | database | required | Primary Athena database (used when a query's directory doesn't specify otherwise) | | workgroup | - | Athena workgroup | | outputLocation | - | S3 path for query results (optional if workgroup has a default) | | tablesDir | ./tables | SQL files directory | | outDir | ./generated | Codegen output directory | | codegen.indent | 2 | Number of spaces used to indent generated TypeScript files (matches Prettier / Google / Airbnb / Angular / NestJS defaults). Clamped to [1, 8] | | query.timeout | 300000 | Query timeout in ms (5 min) | | query.pollingInterval | 500 | Initial poll interval in ms | | query.maxPollingInterval | 5000 | Max poll interval in ms | | maxConcurrency | - | Override for parallel({ concurrency: 'auto' }) when Service Quotas is unreachable |

Type mapping

Glue column types map 1:1 to TypeScript:

| Athena | TypeScript | |---|---| | varchar, string, char | string | | integer, int, smallint, tinyint | number | | bigint | bigint | | double, float, real | number | | decimal | string (preserves precision) | | boolean | boolean | | date | string (YYYY-MM-DD) | | timestamp | Date | | json | unknown | | binary, varbinary | string (Base64 encoded) | | array<T> | T[] | | map<K, V> | Record<K, V> | | struct<a:T, b:U> | { a: T; b: U } |

Regular columns are nullable (T | null); partition keys are always non-null. See Quick Start step 3 for a typed-access demo that covers scalars and nested complex types.

`@param` types

| Annotation | TypeScript | Validation | |---|---|---| | string | string | SQL injection check | | number | number | Finite number | | boolean | boolean | true / false | | positiveInt | number | Integer > 0 | | isoDate | string | YYYY-MM-DD | | isoTimestamp | string | ISO 8601 | | identifier | string | SQL identifier | | uuid | string | RFC 4122 UUID | | s3Path | string | s3://bucket/path | | enum('a','b','c') | 'a' \| 'b' \| 'c' | Whitelist |

Generated export names

The barrel at generated/index.ts re-exports every query under a JS-safe identifier:

latest.sql -> latest
default.sql (JS reserved word) -> aliased as <table>Default, e.g. eventsDefault
Two queries with the same filename across tables -> aliased as <table>{Query} on both sides

So a scaffolded tables/sampledb/events/default.sql shows up as:

import { eventsDefault } from './generated';
const result = await eventsDefault(athena, { rowLimit: 33 });

Directory structure

project/
├── aathena.config.json        # project root marker, written by 'init'
├── tables/                    # SQL files you edit
│   └── sampledb/              # database
│       └── events/            # table
│           ├── default.sql
│           └── daily.sql
├── generated/                 # codegen output, committed by default
│   ├── index.ts               # barrel re-exporting every query under a JS-safe name
│   ├── types/                 # one file per table, mirroring the Glue schema
│   │   └── sampledb/
│   │       └── events.ts
│   └── queries/               # one file per SQL, binding it to its types
│       └── sampledb/
│           └── events/
│               ├── default.ts
│               └── daily.ts
└── src/
    └── main.ts                # runnable example, written by 'init'

Nested grouping under a table (e.g. events/cart/add.sql) works too - codegen walks the whole tree under tables/.

Requirements

Node.js >= 20
AWS credentials resolvable via the default provider chain (env, shared config, SSO, IAM role, etc.)
IAM permissions:
- athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults (runtime)
- athena:ListWorkGroups, athena:GetWorkGroup (init)
- glue:GetDatabases, glue:GetTables, glue:GetTable (init, add, generate)
- servicequotas:GetServiceQuota (optional, enables parallel({ concurrency: 'auto' }))
- Your query's S3 read + output-bucket write permissions

sql-render - type-safe SQL templating with injection protection (used internally)

For LLMs

Two machine-readable summaries are maintained for AI consumption, following the llms.txt convention:

llms.txt - curated API reference and usage guide
llms-full.txt - full packed source, auto-generated on each push

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

aathena

Quick Start

1. Install

2. Initialize

3. Run it

Write Queries

Run Queries

Debug: export rendered SQL

parallel()

Concurrency resolution

Options

Error handling

Advanced

Partition projection and views

Cross-database queries

Query statistics

CLI Reference

aathena init

aathena add

aathena generate

Reference

Configuration

Type mapping

@param types

Generated export names

Directory structure

Requirements

Related

For LLMs

License

`parallel()`

`aathena init`

`aathena add`

`aathena generate`

`@param` types