@vertana/context-web

v0.2.0

Published a month ago

Web context gathering for Vertana - fetch and extract content from linked pages

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: hongminhee

Keywords: LLM, translation, context, web, readability

@vertana/context-web

Web context gathering for Vertana — fetch and extract content from linked pages to provide additional context for translation.

Features

The recommended way to give the translator access to web context is to expose passive sources, which the translator only invokes when it decides it actually needs them:

fetchWebPage: A passive context source that fetches a single URL and extracts the main content using Mozilla's Readability algorithm. The LLM calls it on demand with a specific URL.
searchWeb: A passive context source that performs a web search (DuckDuckGo Lite) and returns a list of results (title, URL, snippet).

A required context source is also provided for short, trusted link sets where you want the links fetched up front:

fetchLinkedPages: A required context source factory that extracts links from the source text and fetches their content before translation begins. By default it fetches up to ten links (configurable via maxLinks). This is a convenience helper; see the warning below before using it on large or untrusted documents.

Plus a low-level utility:

extractLinks: Extracts URLs from text in various formats (plain text, Markdown, HTML).
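To make the idea behind extractLinks concrete, here is a minimal, self-contained sketch of pulling URLs out of plain text, Markdown, and HTML. It is not the package's implementation and does not use its API: the names extractLinksSketch, collectMatches, and MediaTypeSketch are hypothetical, and the regular expressions are deliberately simple assumptions that will miss edge cases such as Markdown reference-style links or URLs containing parentheses.

```typescript
// Hypothetical sketch; NOT the extractLinks implementation from @vertana/context-web.
type MediaTypeSketch = "text/plain" | "text/markdown" | "text/html";

// Run a global regex over the text and collect one capture group per match.
function collectMatches(pattern: RegExp, text: string, group: number): string[] {
  const out: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    out.push(match[group]);
  }
  return out;
}

function extractLinksSketch(text: string, mediaType: MediaTypeSketch): string[] {
  let found: string[] = [];
  if (mediaType === "text/html") {
    // Collect href values from anchor tags.
    found = collectMatches(/<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi, text, 1);
  } else {
    if (mediaType === "text/markdown") {
      // Inline Markdown links of the form [label](url).
      found = collectMatches(/\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g, text, 1);
    }
    // Bare URLs; trailing sentence punctuation is trimmed.
    const bare = collectMatches(/https?:\/\/[^\s<>()\]]+/g, text, 0)
      .map((url) => url.replace(/[.,;:!?]+$/, ""));
    found = found.concat(bare);
  }
  // Keep absolute http(s) URLs only, de-duplicated in first-seen order.
  const absolute = found.filter((url) => /^https?:\/\//i.test(url));
  return absolute.filter((url, index) => absolute.indexOf(url) === index);
}

console.log(extractLinksSketch("Check out https://example.com/article for details.", "text/plain"));
```

The sketch drops relative HTML links because it has no base URL to resolve them against; a real extractor may handle that differently.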

Installation

Deno

deno add jsr:@vertana/context-web

npm

npm add @vertana/context-web

pnpm

pnpm add @vertana/context-web

Usage

The recommended pattern uses passive sources, so the translator decides which URLs (if any) are worth fetching:

import { translate } from "@vertana/facade";
import { fetchWebPage, searchWeb } from "@vertana/context-web";
import { openai } from "@ai-sdk/openai";

const text = `
Check out this article: https://example.com/article
It explains the concept in detail.
`;

const result = await translate(openai("gpt-4o"), "ko", text, {
  contextSources: [
    // The translator may fetch a specific URL when it needs more context.
    fetchWebPage,
    // The translator may run a web search when it needs more context.
    searchWeb,
  ],
});

Eagerly fetching linked pages

If you have a short, trusted set of links and you want them pulled in before translation begins, fetchLinkedPages does that (up to ten links by default; set maxLinks to raise or lower the cap):

import { translate } from "@vertana/facade";
import { fetchLinkedPages } from "@vertana/context-web";
import { openai } from "@ai-sdk/openai";

const text = "Check out https://example.com/article for details.";

const result = await translate(openai("gpt-4o"), "ko", text, {
  contextSources: [
    fetchLinkedPages({ text, mediaType: "text/plain" }),
  ],
});

Use the character budgets maxCharsPerPage and maxTotalChars to keep the fetched reference material small relative to the source text:

fetchLinkedPages({
  text,
  mediaType: "text/plain",
  maxCharsPerPage: 2000,
  maxTotalChars: 6000,
});
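As a rough illustration of how a per-page budget and a total budget can interact, the sketch below truncates each page to the per-page limit and stops once the total is used up. This is one plausible reading, not the package's documented behavior: FetchedPageSketch and applyCharBudgets are hypothetical names, and the real fetchLinkedPages may truncate or prioritize pages differently.

```typescript
// Hypothetical sketch; the actual truncation strategy of fetchLinkedPages may differ.
interface FetchedPageSketch {
  url: string;
  content: string;
}

function applyCharBudgets(
  pages: FetchedPageSketch[],
  maxCharsPerPage: number,
  maxTotalChars: number,
): FetchedPageSketch[] {
  const kept: FetchedPageSketch[] = [];
  let remaining = maxTotalChars;
  for (const page of pages) {
    if (remaining <= 0) break; // Total budget exhausted: drop the rest.
    // A page may use at most its own cap, and never more than what is left overall.
    const limit = Math.min(maxCharsPerPage, remaining);
    const content = page.content.slice(0, limit);
    if (content.length === 0) continue; // Skip empty pages.
    kept.push({ url: page.url, content });
    remaining -= content.length;
  }
  return kept;
}

// Four 3000-character pages under the budgets used in the example above.
const samplePages: FetchedPageSketch[] = [1, 2, 3, 4].map((n) => ({
  url: `https://example.com/page-${n}`,
  content: "x".repeat(3000),
}));
const budgeted = applyCharBudgets(samplePages, 2000, 6000);
console.log(budgeted.map((page) => page.content.length));
```

With these assumed inputs, three pages survive at 2000 characters each and the fourth is dropped, so the total stays within 6000 characters.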

For longer pages, you can summarize each fetched page with an explicit model:

const summarizerModel = openai("gpt-4o");

fetchLinkedPages({
  text,
  mediaType: "text/plain",
  summarize: { model: summarizerModel, maxChars: 800 },
});

[!WARNING] Pulling many large pages into required context can confuse the translator: when the combined reference material is much larger than the source text, and especially when it is in the target language, the model may echo a fetched page back instead of translating the actual input. For large or untrusted link sets, prefer the passive fetchWebPage source above so the translator only fetches what it actually needs.
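The guidance in this warning can be turned into a small pre-flight check in application code. The sketch below is only an assumption about how one might encode it: chooseStrategy, LinkSetInfo, and the default thresholds (ten links, a reference budget of at most twice the source length) are hypothetical and are not part of @vertana/context-web.

```typescript
// Hypothetical helper; names and thresholds are illustrative assumptions.
type Strategy = "eager" | "passive";

interface LinkSetInfo {
  linkCount: number;            // Number of links found in the source text.
  trusted: boolean;             // Whether the caller trusts the link set.
  sourceChars: number;          // Length of the text to translate.
  referenceBudgetChars: number; // Upper bound on fetched reference material.
}

function chooseStrategy(
  info: LinkSetInfo,
  maxEagerLinks = 10,
  maxReferenceRatio = 2,
): Strategy {
  const smallEnough = info.linkCount > 0 && info.linkCount <= maxEagerLinks;
  // Avoid reference material that is much larger than the source text.
  const proportionate =
    info.referenceBudgetChars <= info.sourceChars * maxReferenceRatio;
  return info.trusted && smallEnough && proportionate ? "eager" : "passive";
}

console.log(
  chooseStrategy({ linkCount: 2, trusted: true, sourceChars: 5000, referenceBudgetChars: 6000 }),
);
```

A result of "eager" would suggest fetchLinkedPages with explicit budgets; "passive" would suggest exposing fetchWebPage and letting the translator decide.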

License

MIT License
