@kemeny/interview-game

v0.2.2, published a month ago

Client and CLI for the Kemeny Studio evaluation sandbox.

Downloads: 159

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainers: johnny_kemeny, jcoruiz

Keywords: agent, evaluation, hiring

Kemeny Interview Game

A 2D puzzle sandbox. Each turn your agent receives a grid and chooses one action. Build an agent that explores the world and figures out how to advance through the levels.

The game runs on a remote server. Your agent is a small JavaScript file you write locally and run via the CLI; it talks to the server one turn at a time.

Setup

1. Install the package

npm install @kemeny/interview-game

2. Configure your API key

You should have received an API key from us. Set it in your environment:

export KEMENY_API_KEY=your_key_here

(If you'd rather pass it inline, prefix the command: KEMENY_API_KEY=... npx kemeny-run-agent ....)

3. Create your agent

// my-agent.js
function chooseAction(observation) {
  // Decide what to do this turn.
  // Return one of: 'up' | 'down' | 'left' | 'right' | 'reset' | null
  return 'right';
}

module.exports = { chooseAction };

4. Run your agent

npx kemeny-run-agent --agent my-agent.js --output scorecard.json

Typical output:

=== Agent runner ===
Agent: my-agent  Server: https://interview-api.kemenylabs.com  Cap: 80

▶ Level 1 (L1)
  ✓ WIN in 4 moves (320ms)

▶ Level 2 (L2)
  ✗ no progress in 9 moves (810ms)
  ...

=== Summary: 1/4 levels solved (3.2s) ===
Scorecard saved to scorecard.json

5. Submit your agent code

When you're satisfied with your agent, send the code to the evaluator:

npx kemeny-submit-agent --agent my-agent.js --notes "short note for the evaluator"

This packages your project into a zip (excluding node_modules, .git, archives, and files over 1 MB) and uploads it, authenticated with the same API key. The server stores the zip; it never executes your code.

Useful flags:

--root <dir> — directory to package (default: directory of --agent).
--dry-run — list the files that would be sent without uploading.
--output <path> — save the zip locally in addition to uploading.

Agent contract

chooseAction(observation) is called once per turn. It can be synchronous or return a Promise (handy if you call out to an external service like an LLM).
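
Because chooseAction may return a Promise, an agent can wait on an external decision source before answering. The following is a minimal sketch, not part of this package: askModel is a hypothetical stand-in (stubbed here) for your own LLM or service call. The agent checks the reply against observation.availableActions and falls back to a default move if the reply is unusable or the call fails.

```javascript
// my-async-agent.js
// Hypothetical stand-in for an external service call (e.g. an LLM request).
// Replace the body with your own client code; this stub always answers 'right'.
async function askModel(observation) {
  return 'right';
}

async function chooseAction(observation) {
  const allowed = observation.availableActions || [];
  try {
    const reply = await askModel(observation);
    // Only pass through actions listed as available this turn.
    if (allowed.includes(reply)) return reply;
  } catch (err) {
    // A failed call should not crash the run; fall through to the default.
    console.error('askModel failed:', err.message);
  }
  // Fallback: first available action other than 'reset', or null if none.
  return allowed.find((a) => a !== 'reset') ?? null;
}

module.exports = { chooseAction };
```

Validating the reply keeps a malformed answer from the external service from reaching the server. Note that the final fallback returns null only when no movement action is listed, since null skips the level.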

You return one of:

'up' | 'down' | 'left' | 'right' | 'reset' | null

'reset' restarts the current level (useful if you want to try a different approach). null skips the level (counts as a fail).

The observation looks like:

{
  rooms:            { [roomId: string]: string[][] },  // 2D grids indexed [y][x]; '.' is empty
  moveCount:        number,                             // 0-indexed turn within the level (0 on the first turn)
  availableActions: string[],                           // e.g. ['down','left','reset','right','up']
  lastAction:       string | null,                      // your previous action, if any
  lastResult:       'moved' | 'no_effect' | 'reset' | null,
}

Each rooms[id] is a 2D array of strings indexed [y][x]. Cells with '.' are empty; other cells contain tokens whose meaning you discover by playing.
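
Since token meanings are not documented, a useful first step is to index where each token sits. The helper below is a sketch under that assumption: indexTokens is a hypothetical name, and the tokens 'A' and 'B' in the sample are placeholders, not real game tokens. It walks every room in [y][x] order and groups the coordinates of non-empty cells by token.

```javascript
// Group the coordinates of every non-empty cell by its token.
// Returns a Map: token -> [{ roomId, x, y }, ...].
function indexTokens(rooms) {
  const byToken = new Map();
  for (const [roomId, grid] of Object.entries(rooms)) {
    grid.forEach((row, y) => {
      row.forEach((cell, x) => {
        if (cell === '.') return; // '.' marks an empty cell
        if (!byToken.has(cell)) byToken.set(cell, []);
        byToken.get(cell).push({ roomId, x, y });
      });
    });
  }
  return byToken;
}

// Placeholder data: a 2x2 room with two made-up tokens.
const sample = { room1: [['.', 'A'], ['B', '.']] };
console.log(indexTokens(sample).get('A')); // [ { roomId: 'room1', x: 1, y: 0 } ]
```

Comparing the index from one turn to the next shows which tokens moved after an action, which is one way to work out what each token represents.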

lastResult is the engine's feedback on your previous action. Use it to confirm whether what you tried had any effect.
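
As an illustration of using this feedback, here is a small stateless agent that keeps moving in one direction while lastResult is 'moved' and rotates to the next direction when it sees 'no_effect'. The rotation order in DIRECTIONS is an arbitrary choice, and this is a simple exploration heuristic, not a strategy known to solve any level.

```javascript
// explore-agent.js
// Arbitrary rotation order; change it to explore differently.
const DIRECTIONS = ['right', 'down', 'left', 'up'];

function chooseAction(observation) {
  const { lastAction, lastResult, availableActions } = observation;
  const usable = DIRECTIONS.filter((d) => availableActions.includes(d));
  if (usable.length === 0) return null; // nothing to try; this skips the level

  // First turn, or the previous action was not a move (e.g. 'reset'):
  // start with the first usable direction.
  if (!usable.includes(lastAction)) return usable[0];

  // The previous move had no effect: rotate to the next direction.
  if (lastResult === 'no_effect') {
    const next = (usable.indexOf(lastAction) + 1) % usable.length;
    return usable[next];
  }

  // The previous move worked: keep going the same way.
  return lastAction;
}

module.exports = { chooseAction };
```

Because the decision depends only on the observation, the agent needs no state of its own between turns.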

Programmatic API

If you'd rather drive the run yourself instead of using the CLI:

const { runAgent } = require('@kemeny/interview-game');

(async () => {
  const scorecard = await runAgent(
    (observation) => 'right',
    { cap: 80 } // apiKey, apiBase optional — read from env by default
  );
  console.log(scorecard);
})();
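
When driving the run yourself, it can help to record what the agent saw and did on each turn. The wrapper below is a sketch: withTrace is a hypothetical helper, not part of the package, and the commented usage assumes runAgent accepts an async function in the same way chooseAction may return a Promise.

```javascript
// Wrap any agent function so each decision is recorded before it is returned.
function withTrace(agent, trace = []) {
  const traced = async (observation) => {
    // await handles both synchronous and Promise-returning agents.
    const action = await agent(observation);
    trace.push({
      moveCount: observation.moveCount,
      lastResult: observation.lastResult,
      action,
    });
    return action;
  };
  traced.trace = trace; // expose the log for inspection after the run
  return traced;
}

// Usage with the programmatic API (needs the package and an API key):
// const { runAgent } = require('@kemeny/interview-game');
// const agent = withTrace((observation) => 'right');
// runAgent(agent, { cap: 80 }).then(() => console.table(agent.trace));
```

Keeping the trace separate from the agent logic means the same agent can be run with or without logging.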
