job-scout
v1.6.6
TypeScript-native job-scout library
job-scout
TypeScript-first job scraping library built on Crawlee for aggregating jobs from multiple job boards with a single request model and a unified runtime configuration.
Supported providers: indeed, techinasia, kalibrr, glints, jobstreet, bayt, dealls, karir, lokerid, remoteok, jobicy, himalayas, remotive, nodesk, workingnomads. Experimental support for linkedin, zipRecruiter, glassdoor, google, naukri, and bdjobs.
Install
npm install job-scout # or pnpm|yarn|bun add job-scout

Requirements:
- Node.js >=20
- Camoufox binaries installed if you use browser-backed providers such as experimental linkedin or glints, if you want JobStreet browser-auth fallback support, or if you want Glassdoor browser-based location resolution: npx camoufox-js fetch
- Chromium installed if you want the runtime fallback when Camoufox cannot launch: npx playwright install chromium
- A runtime capable of launching Camoufox or Playwright Chromium for live LinkedIn, Glints, and optional JobStreet browser-auth fallback. Some locked-down sandboxes cannot launch either browser, even with --no-sandbox.
Quick Start
import { createClient, searchLocations } from 'job-scout/server'
const [yogyakarta] = searchLocations('Yogyakarta', {
countryIsoCode: 'ID',
types: ['City'],
limit: 1,
})
if (!yogyakarta) {
throw new Error('Yogyakarta not found in location database')
}
const client = createClient({
runtime: { requestTimeoutMs: 20_000 },
logging: { level: 'warn' },
})
const run = await client.scout({
providers: ['indeed'],
query: 'software engineer',
locationCode: yogyakarta.code,
pagination: { limitPerProvider: 20 },
filters: { postedWithinHours: 72 },
})
const jobs = await run.collect()
const dataset = await run.dataset()
console.log(dataset.id)
console.log(jobs.length)
console.log(jobs[0])

Crawlee Run API
client.scout() is the primary Crawlee-backed entrypoint. It starts a run and returns a handle that can:
- collect(): normalized jobs from the run dataset
- events(): stream live job and lifecycle events
- dataset(): expose the underlying dataset identity
- stats(): report emitted, deduped, completed, and failed counts
import { createClient, searchLocations } from 'job-scout/server'
const [unitedStates] = searchLocations('United States', {
types: ['Country'],
limit: 1,
})
const client = createClient()
const run = await client.scout({
providers: ['indeed'],
query: 'backend engineer',
locationCode: unitedStates!.code,
})
for await (const event of run.events()) {
if (event.type === 'job') {
console.log(event.job.title)
}
}

API Surface
import {
EXPERIMENTAL_JOB_PROVIDERS,
STABLE_JOB_PROVIDERS,
Provider,
} from 'job-scout'
import {
createClient,
getLocation,
getManyLocations,
searchLocations,
} from 'job-scout/server'

Browser-safe top-level exports:
- Provider (canonical scraper provider constants)
- allJobProviders (runtime list of all public provider IDs)
- STABLE_JOB_PROVIDERS, EXPERIMENTAL_JOB_PROVIDERS, ALL_JOB_PROVIDERS
- browser-safe value helpers, constants, error classes, and types
Server-only exports:
- createClient(config?)
- searchLocations(query, options?)
- getLocation({ code })
- getManyLocations(request)
Configured client methods:
- client.searchLocations(query, options?)
- client.getLocation({ code })
- client.getManyLocations(request)
- client.getBrowserAuthStatus(request)
- client.bootstrapBrowserAuth(request)
- client.scout(request)
- client.streamScout(request, options?)
Location Lookup
locationCode should come from the packaged SQLite location database. Resolve it with searchLocations() before calling the job APIs. The library ships the database as a read-only server-side asset; callers do not need to manage a separate database file.
import {
getLocation,
getManyLocations,
searchLocations,
} from 'job-scout/server'
const matches = searchLocations('South Jakarta', {
countryIsoCode: 'ID',
types: ['City'],
limit: 5,
})
const selected = matches[0]
if (!selected) {
throw new Error('Location not found')
}
const location = getLocation({ code: selected.code })
const siblings = getManyLocations({
parentCode: selected.parentCode,
limit: 10,
})
console.log(selected.code)
console.log(location)
console.log(siblings)

type LocationSearchOptions = {
countryIsoCode?: string // Narrow matches to a specific country ISO code.
parentCode?: string | null // Scope search under one parent location.
types?: LocationType[] // Restrict results to location kinds such as 'City' or 'Country'.
limit?: number // Cap the number of returned matches.
}
type GetManyLocationsRequest =
| {
codes: string[] // Resolve many exact location codes in one call.
countryIsoCode?: string
types?: LocationType[]
limit?: number
}
| {
parentCode?: string | null // Use null to browse top-level records.
countryIsoCode?: string
types?: LocationType[]
limit?: number
}
type GetLocationRequest = {
code: string // Resolve one exact location code.
}
type LocationRecord = {
code: string // Canonical ID used by job search requests.
name: string // Base location name.
type: LocationType // Exported union of supported location kinds.
parentCode: string | null // Parent location when the record is nested.
countryIsoCode: string | null // Canonical ISO country code.
display: string // Human-readable display label.
hasChildren: boolean // Whether getManyLocations({ parentCode: code }) can drill into nested results.
}

Country lookups normalize sovereign-like top-level records as Country results. This includes dataset-backed records such as Taiwan and Hong Kong, plus generated top-level country roots for datasets that only contain subregions, such as China.
Client API
Use createClient() when you want to reuse the same config across many requests.
import { createClient } from 'job-scout/server'
const client = createClient({
runtime: { requestTimeoutMs: 20_000 },
logging: { level: 'warn' },
})
const [austin] = client.searchLocations('Austin', {
countryIsoCode: 'US',
types: ['City'],
limit: 1,
})
if (!austin) {
throw new Error('Austin not found in location database')
}
const run = await client.scout({
providers: ['indeed'],
query: 'backend engineer',
locationCode: austin.code,
})
const jobs = await run.collect()

Use the standalone location functions when you do not need shared config. Use createClient() for any operation that depends on runtime config, including client.scout(request), client.streamScout(request, options?), client.getBrowserAuthStatus(request), and client.bootstrapBrowserAuth(request).
Custom Logging
Job Scout uses a small logger interface instead of requiring a logging framework. By default, library logs go to console, and logging.level controls which messages are emitted.
import type { Logger } from 'job-scout'
import { createClient } from 'job-scout/server'
const logger: Logger = {
error(message, ...args) {
console.error('[job-scout:error]', message, ...args)
},
warn(message, ...args) {
console.warn('[job-scout:warn]', message, ...args)
},
info(message, ...args) {
console.info('[job-scout:info]', message, ...args)
},
debug(message, ...args) {
console.debug('[job-scout:debug]', message, ...args)
},
}
const client = createClient({
logging: {
level: 'info',
logger,
},
})

Injected loggers receive formatted messages with component names such as JobScout:engine and JobScout:provider:linkedin, so you can route or reformat library logs without adopting a specific logging package.
Request Model
Use the request shape below as the main reference for client.scout() and client.streamScout().
type JobSearchRequest = {
providers: JobProvider[] // Required, non-empty. Stable and experimental provider IDs are exported unions.
query?: string // Search keywords used by most providers.
locationCode?: string // Canonical location code from searchLocations().
pagination?: {
limitPerProvider?: number // Max jobs fetched from each provider. Default: 15.
offset?: number // Provider-specific offset where supported. Default: 0.
}
filters?: {
distanceMiles?: number // Default: 50.
remote?: boolean // Default: false.
easyApply?: boolean
employmentType?: EmploymentType
postedWithinHours?: number
}
linkedin?: {
companyIds?: number[] // Limit LinkedIn results to specific company IDs.
}
}

Provider Rules
type StableJobProvider =
| 'indeed'
| 'techinasia'
| 'kalibrr'
| 'glints'
| 'jobstreet'
| 'bayt'
| 'dealls'
| 'karir'
| 'lokerid'
| 'remoteok'
| 'jobicy'
| 'himalayas'
| 'remotive'
| 'nodesk'
| 'workingnomads'
type ExperimentalJobProvider =
| 'linkedin'
| 'zipRecruiter'
| 'glassdoor'
| 'google'
| 'naukri'
| 'bdjobs'

Runtime constants are exported too, so you can avoid hardcoded provider strings:
import {
EXPERIMENTAL_JOB_PROVIDERS,
STABLE_JOB_PROVIDERS,
Provider,
} from 'job-scout'
const stableOnly = [...STABLE_JOB_PROVIDERS]
const experimentalOnly = [...EXPERIMENTAL_JOB_PROVIDERS]
console.log(Provider.ZIP_RECRUITER) // "zip_recruiter" (internal scraper ID)

- Experimental providers still require explicit config opt-in before use.
- locationCode is also used for provider market selection, and unsupported regions are skipped during request compilation.
- Region-locked providers currently include dealls (Indonesia), glints (Indonesia, Singapore, Vietnam, Malaysia, Taiwan, Philippines, China, and Hong Kong), jobstreet (Malaysia, Singapore, Philippines, and Indonesia), kalibrr (Indonesia and Philippines), lokerid (Indonesia), naukri (India), and bdjobs (Bangladesh).
- jobstreet and kalibrr require locationCode so the provider can select a supported country domain/market.
- Glints falls back to the selected country's default all-locations search when a market-specific location label cannot be resolved cleanly.
- JobStreet uses GraphQL search/detail with browser fallback when GraphQL auth/session/contract failures block search. It still requires locationCode for market selection, treats distanceMiles as unsupported, and verifies easyApply from the detail response.
- Kalibrr uses public HTTP JSON endpoints, applies explicit country filters for Indonesia/Philippines, and maps remote plus supported employment-type filters natively while verifying postedWithinHours and easyApply client-side.
- Lokerid uses same-origin Remix data endpoints on www.loker.id, supports keyword search in the first release, and ignores unsupported shared filters instead of mapping them natively.
- Jobicy uses Jobicy's public JSON remote-jobs feed, always normalizes results as remote roles, maps query to upstream tag, applies postedWithinHours client-side, and only sends broad geo filters when locationCode can be reduced cleanly to a country or coarse region.
- Himalayas uses Himalayas' public JSON jobs API, always normalizes results as remote roles, maps query, locationCode country, and supported employment types upstream, includes worldwide-friendly jobs alongside country-matched jobs, and applies postedWithinHours client-side.
- Remotive uses Remotive's public JSON remote-jobs API, always normalizes results as remote roles, maps query upstream to search, applies locationCode, employmentType, and postedWithinHours client-side, and keeps worldwide or region-compatible restrictions when country filtering is requested.
- Nodesk uses Nodesk's public Algolia-backed remote job index with detail-page enrichment, maps query upstream, applies best-effort internal remote-region mapping from locationCode, keeps worldwide or compatible region buckets when country filtering is requested, and falls back to the no-JS HTML list when Algolia is unavailable.
Filter Constraints
Some providers reject incompatible filter combinations. The library enforces those combinations in TypeScript and at runtime.
- Indeed supports only one filter group at a time: postedWithinHours, or easyApply, or employmentType/remote.
- LinkedIn cannot combine postedWithinHours with easyApply.
Example of a valid request for JobStreet:
const request = {
providers: ['jobstreet'],
query: 'software engineer',
locationCode: 'MY-14-KUL',
} satisfies JobSearchRequest

Configuration Model
type JobScoutConfig = {
enrichment?: 'normal' | 'high' | 'veryHigh' // Enrichment effort level for all providers. Default: 'normal'. Higher levels can trigger extra shared/company-website enrichment steps. `veryHigh` can also fall back to authenticated LinkedIn company profile scraping when LinkedIn is explicitly enabled and a company LinkedIn URL is known but the company website is still missing.
runtime?: {
requestTimeoutMs?: number // Default: 20_000 ms.
providerFailureMode?: 'throw' | 'swallow' // Default: 'throw'. Controls whether per-provider scraper failures make `run.collect()` throw or return only successful-provider results.
storage?:
| boolean // `true` => persistent storage in the default `.job-scout-storage` directory. `false` => ephemeral.
| string // Persistent storage rooted at this directory.
| {
mode?: 'ephemeral' | 'persistent' // Default: 'ephemeral'.
directory?: string // Optional persistent storage directory. Defaults to `.job-scout-storage`.
}
proxy?: {
urls?: string[] // Rotating proxy list used by Crawlee proxy configuration. Default: []. Example: ['http://user:pass@proxyserver:port'].
}
browser?: {
userAgent?: string // Shared user agent override for HTTP and browser-backed crawls.
headless?: boolean // Browser mode for Playwright-backed scraping. Default: true.
}
browserAuth?: {
profiles?: Record<
string,
| {
provider: 'jobstreet'
market: 'MY' | 'SG' | 'PH' | 'ID'
}
| {
provider: 'linkedin'
}
>
jobstreet?: Partial<Record<'MY' | 'SG' | 'PH' | 'ID', string>> // Shorthand provider mapping.
linkedin?: string // Shorthand provider mapping.
providerProfiles?: {
jobstreet?: Partial<Record<'MY' | 'SG' | 'PH' | 'ID', string>>
linkedin?: string
}
// Default: no profiles and no provider mappings (browser auth disabled).
}
jobstreetSession?: Partial<
Record<
'MY' | 'SG' | 'PH' | 'ID',
{
cookies:
| string
| Array<{
name: string
value: string
}>
solId?: string
sessionId?: string
visitorId?: string
userQueryId?: string
providerContext?: string
include?: string[]
queryHints?: string[]
relatedSearchesCount?: number
}
>
>
// Optional programmatic JobStreet GraphQL session bundles. Default: disabled.
concurrency?:
| number // Shorthand: applies the same global limit to both HTTP and browser work.
| {
providers?: number // Max providers to run at once. Default: all requested providers in parallel.
http?:
| number // Global cap for HTTP/Crawlee request work. Default: 24.
| {
global?: number // Global cap for HTTP/Crawlee request work.
perProvider?: Partial<
Record<JobProvider, number> // Provider-specific HTTP request caps.
>
}
browser?:
| number // Global cap for Playwright/browser tasks across scraping and browser-assisted enrichment. Default: 2.
| {
global?: number // Global cap for Playwright/browser tasks.
perProvider?: Partial<
Record<JobProvider, number> // Provider-specific browser task caps.
>
}
}
retry?:
| false // Disable list/detail retries.
| number // Shorthand: applies the same retry budget to list and detail pages.
| {
list?: number // Shorthand for listPages.
detail?: number // Shorthand for detailPages.
backoff?: {
baseMs?: number // Shorthand for baseDelayMs.
maxMs?: number // Shorthand for maxDelayMs.
}
listPages?: number // Default: 2.
detailPages?: number // Default: 1.
baseDelayMs?: number // Default: 250.
maxDelayMs?: number // Default: 3000.
}
sessions?: {
enabled?: boolean // Default: true.
persistCookies?: boolean // Default: true.
maxPoolSize?: number // Default: 50.
maxUsageCount?: number // Default: 25.
maxAgeSecs?: number // Default: 1800.
}
advanced?: {
maxRequestsPerMinute?: number // Unlimited unless set.
}
}
experimental?:
{
sites?: ExperimentalJobProvider[] // Preferred shorthand list of enabled experimental providers.
experimentalSites?: Partial<Record<ExperimentalJobProvider, boolean>> // Missing keys default to false.
}
output?: {
descriptionFormat?: 'markdown' | 'html' | 'plain' // Default: 'markdown'. `plain` preserves readable paragraphs and list bullets.
annualizeSalary?: boolean // Default: false.
salaryFallback?: 'usOnly' // Default: 'usOnly'.
}
logging?:
| 'error' | 'warn' | 'info' | 'debug' // Shorthand log level.
| {
level?: 'error' | 'warn' | 'info' | 'debug' // Default: 'error'.
logger?: Logger // Optional sink for Job Scout log messages. Defaults to console.
}
}

Defaults:
- runtime.storage = true is shorthand for persistent storage in the default .job-scout-storage directory.
- runtime.storage = '.job-scout-storage' is shorthand for persistent storage in that directory.
- runtime.storage.mode defaults to ephemeral, so library calls do not persist Crawlee storage unless you opt in.
- runtime.browser.headless defaults to true.
- runtime.providerFailureMode defaults to throw.
- runtime.browserAuth is opt-in and disabled by default.
- runtime.jobstreetSession is opt-in and disabled by default.
- runtime.sessions.enabled defaults to true.
- runtime.requestTimeoutMs defaults to 20_000.
- runtime.retry = false disables list/detail retries.
- runtime.retry = 2 applies the same retry budget to list and detail pages.
- runtime.concurrency = 1 is shorthand for setting both HTTP and browser global concurrency to 1.
- runtime.concurrency.providers defaults to all requested providers running in parallel.
- runtime.concurrency.http = 24 is shorthand for setting the HTTP global concurrency cap to 24.
- runtime.concurrency.browser = 2 is shorthand for setting the browser global concurrency cap to 2.
- runtime.concurrency.http.global defaults to 24.
- runtime.concurrency.browser.global defaults to 2.
- Base per-provider concurrency defaults are 5 for browser-backed providers (linkedin, google, glints, jobstreet) and 24 for non-browser providers. JobStreet keeps the browser-backed default because it can still fall back to the browser scraper when GraphQL is unavailable.
- runtime.concurrency.http.perProvider[provider] overrides the HTTP limit for that provider.
- runtime.concurrency.browser.perProvider[provider] overrides the browser limit for that provider.
- Per-scope concurrency precedence is: scoped provider override, then scraper-specific override (if a scraper defines one), then the runtime base default above.
- experimental: { sites: ['linkedin', 'google'] } is the shorthand for enabling experimental providers.
- logging: 'info' is shorthand for { logging: { level: 'info' } }.
Example enabling an experimental provider:
const config = {
experimental: {
experimentalSites: {
linkedin: true,
google: true,
},
},
} satisfies JobScoutConfig

Browser Auth
LinkedIn is experimental, browser-auth-only, and can reuse manually bootstrapped browser logins. JobStreet can either use a programmatic GraphQL session bundle or reuse a manually bootstrapped browser login. Glassdoor uses normal HTTP scraping and may use the browser only to resolve locations. JobStreet auth is market-scoped for MY, SG, PH, and ID; LinkedIn auth is provider-scoped.
Constraints:
- Browser auth requires runtime.storage.mode = 'persistent'.
- Browser auth supports zero proxies or one fixed proxy URL only. Rotating proxy pools are rejected when auth is configured.
- client.getBrowserAuthStatus() live-validates the saved profile by opening the auth provider in a browser context seeded from the stored state.
- client.bootstrapBrowserAuth() launches a headed browser and expects you to complete the LinkedIn or SEEK/JobStreet sign-in flow yourself, unless skipIfReady: true is set and the saved auth state already validates successfully.
- LinkedIn browser auth is fail-fast. If the saved login is missing or invalid, the provider raises an error instead of silently falling back to anonymous mode.
- Set runtime.providerFailureMode = 'swallow' if you want run.collect() to return only successful-provider results when a provider such as LinkedIn fails.
- JobStreet prefers runtime.jobstreetSession for GraphQL search. If GraphQL search is blocked by auth/session requirements or request/contract failures, the provider falls back to the browser scraper. A JobStreet browser profile is still optional and is only used when you want to seed that browser session with a saved login.
Example programmatic JobStreet session bundle:
const config = {
runtime: {
jobstreetSession: {
ID: {
cookies: [
{ name: 'sol_id', value: 'visitor-id' },
{ name: 'JobseekerSessionId', value: 'session-id' },
{ name: 'JobseekerVisitorId', value: 'session-id' },
],
solId: 'visitor-id',
sessionId: 'session-id',
visitorId: 'visitor-id',
},
},
},
} satisfies JobScoutConfig

Example:
import { createClient } from 'job-scout/server'
const config = {
runtime: {
storage: {
mode: 'persistent',
directory: '.job-scout-storage',
},
browserAuth: {
profiles: {
'jobstreet-my-main': {
provider: 'jobstreet',
market: 'MY',
},
},
providerProfiles: {
jobstreet: {
MY: 'jobstreet-my-main',
},
},
},
},
} satisfies JobScoutConfig
const client = createClient(config)
const authStatus = await client.getBrowserAuthStatus({
provider: 'jobstreet',
market: 'MY',
})
await client.bootstrapBrowserAuth({
provider: 'jobstreet',
market: 'MY',
skipIfReady: true,
})
const run = await client.scout({
providers: ['jobstreet'],
query: 'software engineer',
locationCode: 'MY-14-KUL',
})
const jobs = await run.collect()

You can also call the same flow through a configured client:
import { createClient } from 'job-scout/server'
const client = createClient(config)
await client.bootstrapBrowserAuth({
provider: 'jobstreet',
market: 'MY',
skipIfReady: true,
})

client.getBrowserAuthStatus() returns a result shaped like:
type BrowserAuthStatusResult = {
provider: 'jobstreet' | 'linkedin'
profile: string
market?: JobStreetAuthMarket
status: 'ready' | 'missing' | 'needsBootstrap'
exists: boolean
usable: boolean
storageStatePath: string | null
checkedAt: Date
reason?: 'missing' | 'invalidated' | 'mismatch' | 'unauthenticated'
}

client.bootstrapBrowserAuth() returns:
type BrowserAuthBootstrapResult = {
provider: 'jobstreet' | 'linkedin'
profile: string
market?: JobStreetAuthMarket
storageStatePath: string
authenticatedAt: Date
reusedExisting: boolean
}

LinkedIn is experimental and browser-auth-only. By default, when providers: ['linkedin'] is requested without experimental: { sites: ['linkedin'] } (or experimental.experimentalSites.linkedin = true) and a configured authenticated LinkedIn profile, run.collect() throws a browser-auth-required provider error. The same LinkedIn opt-in also enables the veryHigh company-profile enrichment fallback. Set runtime.providerFailureMode = 'swallow' to keep the run result and return [] for the failed LinkedIn portion instead.
The same LinkedIn browser auth profile is also reused by shared enrichment at enrichment: 'veryHigh' for non-LinkedIn jobs when:
- company.linkedInUrl is already known
- company.websiteUrl is still missing
That fallback visits the LinkedIn company /about/ page, fills only missing company fields, and if it discovers a website URL the existing company website enrichment can continue in the same run.
runtime.browserAuth.jobstreet maps the JobStreet market used at scrape time to the named profile that should seed the browser session. runtime.browserAuth.linkedin maps LinkedIn scraping to a named LinkedIn profile. The older runtime.browserAuth.providerProfiles.* form is also accepted. client.bootstrapBrowserAuth() can also target a specific profile directly with profile: 'jobstreet-my-main' or profile: 'linkedin-main'.
Result Model
client.scout() returns a ScoutRun. Call collect() when you want the materialized Job[]. In the default runtime.providerFailureMode = 'throw', collect() throws on the first failed provider. In 'swallow', it returns jobs from successful providers and omits failed ones.
const client = createClient(config)
const run = await client.scout(request)
const jobs = await run.collect()
const stats = await run.stats()

run.stats() returns aggregate counters plus per-provider summaries. In runtime.providerFailureMode = 'swallow', use providerSummaries to distinguish failed providers from successful providers that returned zero jobs.
type ScoutRunStats = {
status: 'running' | 'completed' | 'failed'
emittedTotal: number
skippedByDedupeTotal: number
providerCount: number
completedProviders: number
failedProviders: number
providerSummaries: Array<{
provider: JobProvider
status: 'pending' | 'running' | 'succeeded' | 'failed'
emitted: number
skippedByDedupe: number
errorMessage?: string
}>
}

Collected jobs are unified Job[] records from providers without an extra domain remapping step.
type Job = {
id?: string | null
provider?: Provider | null
title: string
jobUrl: string
jobUrlDirect?: string | null
location?: Location | null
description?: string | null
jobType?: JobType | null
company: {
name?: string | null
providerUrl?: string | null
websiteUrl?: string | null
phones?: string[] | null
linkedInUrl?: string | null
employeeLinkedInUrls?: string[] | null // Public employee/recruiter LinkedIn profile URLs found on the company website.
socialUrls?: string[] | null
careersUrl?: string | null
industry?: string | null
addresses?: string | null
numEmployees?:
| '1-10'
| '11-50'
| '51-200'
| '201-500'
| '501-1000'
| '1001-5000'
| '5001-10000'
| '10000+'
| null
revenue?: string | null
foundedYear?: number | null
hqCountryIsoCode?: string | null
description?: string | null
logo?: string | null
rating?: number | null
reviewsCount?: number | null
}
salary?: {
interval?: 'yearly' | 'monthly' | 'weekly' | 'daily' | 'hourly' | null
minAmount?: number | null
maxAmount?: number | null
currency?: string | null
} | null
postedAt?: Date | null
expiresAt?: Date | null
recruiters?: Array<{
name?: string
email?: string | null
}>
additionalEmails?: string[] | null // Emails found in the posting after recruiter-like addresses are split out.
potentialRecruiterEmails?: string[] | null // Heuristically recruiter-like emails (for example hr/careers aliases).
workMode?: 'remote' | 'hybrid' | 'onSite' | null
badges?: string[] // Normalized listing badges/labels such as premium employer or boosted.
tags?: string[] // Additional non-badge provider labels such as keywords, categories, or source-specific descriptors.
level?: string | null
field?: string | null
skills?: string[] | null
benefits?: string[] | null
experienceRange?: '0-3' | '3-5' | '5-8' | '8-12' | '12+' | null
vacancyCount?: number | null
}
type Location = {
country: string | null
city: string | null
state: string | null
displayLocation(): string
}

Streaming API
Use the streaming APIs when you want to process results incrementally instead of waiting for a full batch.
import { createClient, searchLocations } from 'job-scout/server'
const [unitedStates] = searchLocations('United States', {
types: ['Country'],
limit: 1,
})
if (!unitedStates) {
throw new Error('United States not found in location database')
}
const client = createClient()
for await (const event of client.streamScout(
{
providers: ['indeed'],
query: 'backend engineer',
locationCode: unitedStates.code,
pagination: { limitPerProvider: 10 },
},
{
includeLifecycleEvents: true,
dedupeStrategy: 'crossProvider',
},
)) {
if (event.type === 'job') {
console.log(event.provider, event.providerIndex, event.job.title)
continue
}
console.log(event.type)
}

type JobStreamOptions = {
failFast?: boolean // Stop the stream after the first provider failure. Default: false.
includeLifecycleEvents?: boolean // Emit providerStart, providerDone, providerError, and complete. Default: false.
dedupeStrategy?: 'batchCompatible' | 'crossProvider' // Default: 'crossProvider'.
}
type JobStreamEvent =
| {
type: 'job'
provider: JobProvider
providerIndex: number
globalIndex: number
job: Job
}
| {
type: 'providerStart'
provider: JobProvider
}
| {
type: 'providerError'
provider: JobProvider
error: unknown
}
| {
type: 'providerDone'
provider: JobProvider
emitted: number
skippedByDedupe: number
}
| {
type: 'complete'
emittedTotal: number
skippedByDedupeTotal: number
providerCount: number
}

Streaming order is completion-order within a provider. Batch APIs still collect and normalize the full result set before returning.
Examples
Repository examples:
- examples/locations.ts
- examples/scout-jobs.ts
- examples/stream-jobs.ts
- examples/scout-jobs-with-playwright.ts
Those scripts import from ../src/server and are intended for local repository usage. Contributor workflow, tests, and release steps are documented in CONTRIBUTING.md.
