@xpert-ai/plugin-markitdown
v0.0.5
`@xpert-ai/plugin-markitdown` bootstraps Microsoft's [MarkItDown](https://github.com/microsoft/markitdown) inside the agent sandbox and teaches the agent to convert files and URLs to Markdown through `sandbox_shell`.
Xpert Plugin: MarkItDown Middleware
What This Middleware Does
- Installs `markitdown` via pip inside the sandbox on demand.
- Writes the embedded `SKILL.md` asset into the sandbox for agent self-guidance.
- Injects a short MarkItDown-specific `<skill>` prompt into model calls when sandbox support is available.
- Re-checks bootstrap state before real `markitdown` shell commands and refreshes the sandbox when the configured version, extras, or skill assets drift.
- Warns when `MarkItDownSkill` is configured without sandbox support or without `SandboxShell` on the same agent.
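The list above distinguishes real `markitdown` shell commands from mere mentions, but the README does not say how. A minimal sketch, assuming a simple heuristic: split the command on the shell operators `|`, `||`, `&&`, and `;`, and treat it as an invocation if any segment's first word is `markitdown` (or a path ending in `/markitdown`). The function name `invokesMarkItDown` is hypothetical, and this is not the plugin's parser; it ignores quoting, subshells, and `VAR=value` prefixes.

```typescript
// Hypothetical heuristic, not the plugin's implementation:
// a command "really" runs markitdown if some list/pipeline segment starts with it.
function invokesMarkItDown(command: string): boolean {
  // Split on common shell control operators: ||, &&, |, ;
  const segments = command.split(/\|\||&&|\||;/);
  return segments.some((segment) => {
    // First whitespace-delimited word of the segment
    const firstWord = segment.trim().split(/\s+/)[0] ?? "";
    // Accept the bare command name or any path ending in /markitdown
    return firstWord === "markitdown" || firstWord.endsWith("/markitdown");
  });
}

console.log(invokesMarkItDown("markitdown report.pdf -o report.md")); // true
console.log(invokesMarkItDown("cat page.html | markitdown -x html")); // true
console.log(invokesMarkItDown("echo markitdown")); // false
```

Checking only the first word of each segment is what lets `echo markitdown` pass through untouched while piped usage is still caught.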
Quick Start
1. Register the plugin:

   ```bash
   PLUGINS=@xpert-ai/plugin-markitdown
   ```

2. Enable the sandbox feature for the target team or agent.
3. Add `SandboxShell` to the same agent.
4. Add a middleware entry with strategy `MarkItDownSkill`.
5. Optionally configure bootstrap behavior:

   ```json
   {
     "type": "MarkItDownSkill",
     "options": {
       "version": "latest",
       "extras": "all",
       "skillsDir": "/workspace/.xpert/skills/markitdown",
       "pipIndexUrl": "https://pypi.tuna.tsinghua.edu.cn/simple",
       "pipExtraIndexUrl": "https://mirrors.aliyun.com/pypi/simple"
     }
   }
   ```
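A minimal sketch of how such an entry could be normalized, assuming it arrives as the plain JSON object from the Quick Start: the sketch checks the `type` and fills any missing option with the defaults listed under Configuration. The names `MarkItDownSkillOptions`, `MiddlewareEntry`, and `normalizeEntry` are hypothetical, not the plugin's exported API.

```typescript
// Hypothetical shape of the options object; field names mirror the README.
interface MarkItDownSkillOptions {
  version: string;
  extras: string;
  skillsDir: string;
  pipIndexUrl?: string;
  pipExtraIndexUrl?: string;
}

interface MiddlewareEntry {
  type: string;
  options?: Partial<MarkItDownSkillOptions>;
}

// Defaults as documented in the Configuration table.
const DEFAULTS: MarkItDownSkillOptions = {
  version: "latest",
  extras: "all",
  skillsDir: "/workspace/.xpert/skills/markitdown",
};

// Returns fully populated options, or throws if the entry targets another strategy.
function normalizeEntry(entry: MiddlewareEntry): MarkItDownSkillOptions {
  if (entry.type !== "MarkItDownSkill") {
    throw new Error(`Unexpected middleware type: ${entry.type}`);
  }
  // Later spreads win, so user-supplied options override the defaults.
  return { ...DEFAULTS, ...entry.options };
}

const opts = normalizeEntry({
  type: "MarkItDownSkill",
  options: { pipIndexUrl: "https://pypi.tuna.tsinghua.edu.cn/simple" },
});
console.log(opts.version, opts.extras, opts.pipIndexUrl);
```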
Configuration
| Field | Type | Description | Default |
| ----- | ---- | ----------- | ------- |
| `version` | string | Version of `markitdown` to install via pip in the sandbox. | `"latest"` |
| `extras` | string | Pip extras to install, for example `all`, `pdf`, `docx`, `pptx`, `xlsx`, `xls`, `outlook`, `az-doc-intel`, `audio-transcription`, or `youtube-transcription`. | `"all"` |
| `skillsDir` | string | Path inside the sandbox where `SKILL.md` is written. | `"/workspace/.xpert/skills/markitdown"` |
| `pipIndexUrl` | string | Custom pip index URL for downloading packages (e.g., `"https://pypi.tuna.tsinghua.edu.cn/simple"`). | `undefined` (use pip default) |
| `pipExtraIndexUrl` | string | Additional pip index URL as fallback. | `undefined` |
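A minimal sketch of how these fields could map onto a pip command, assuming the plugin shells out to `pip install` and that the URL fields map to pip's real `--index-url` and `--extra-index-url` flags (pip consults an extra index in addition to the primary one). Here `version: "latest"` means "no pin" and `extras` becomes the bracketed extras suffix, as in `markitdown[all]`. The function name `buildPipInstallArgs` and the use of `--upgrade` are choices of this sketch, not documented plugin behavior.

```typescript
// Hypothetical mapping from the documented options to pip arguments.
interface PipOptions {
  version: string;
  extras: string;
  pipIndexUrl?: string;
  pipExtraIndexUrl?: string;
}

function buildPipInstallArgs(opts: PipOptions): string[] {
  // "markitdown[all]"-style requirement; empty extras means no brackets.
  const extras = opts.extras.trim();
  let requirement = extras ? `markitdown[${extras}]` : "markitdown";
  // "latest" means "do not pin"; anything else becomes an exact pin.
  if (opts.version !== "latest") {
    requirement += `==${opts.version}`;
  }
  // --upgrade lets an unpinned "latest" install pick up newer releases.
  const args = ["install", "--upgrade", requirement];
  if (opts.pipIndexUrl) args.push("--index-url", opts.pipIndexUrl);
  if (opts.pipExtraIndexUrl) args.push("--extra-index-url", opts.pipExtraIndexUrl);
  return args;
}

console.log(["pip", ...buildPipInstallArgs({ version: "latest", extras: "all" })].join(" "));
// pip install --upgrade markitdown[all]
```

Returning an argument array rather than a single string avoids shell globbing on the `[...]` brackets; if the arguments are ever joined into a shell command, the requirement should be quoted.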
Runtime Behavior
- Bootstrap state is tracked in `/workspace/.xpert/.markitdown-bootstrap.json`.
- The middleware checks both the bootstrap stamp and the real sandbox state:
  - `markitdown` must still exist on `PATH`.
  - The current `skillsDir` must still contain the skill assets.
  - The configured `version` and `extras` must still match the recorded stamp.
- If `markitdown` is already available on `PATH` but no bootstrap stamp exists yet, the middleware rewrites the managed assets and stamp without forcing a reinstall.
- If the tool is already present but the skill assets are missing, the middleware rewrites the assets without reinstalling the package.
- If `version` or `extras` change, the middleware reruns `pip install` and refreshes the stamp.
- Only actual `markitdown` invocations are intercepted in `sandbox_shell`; plain-text mentions such as `echo markitdown` are ignored.
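The checks above amount to a small decision table. A minimal sketch, assuming the stamp records only `version` and `extras` (the actual contents of `.markitdown-bootstrap.json` are not documented here) and that the sandbox probe yields two booleans. All names (`BootstrapStamp`, `ObservedState`, `decideAction`) are hypothetical.

```typescript
// Hypothetical stamp contents; the README only says version and extras are recorded.
interface BootstrapStamp {
  version: string;
  extras: string;
}

interface ObservedState {
  binaryOnPath: boolean;        // is `markitdown` still on PATH?
  assetsPresent: boolean;       // does the current skillsDir contain the skill assets?
  stamp: BootstrapStamp | null; // parsed stamp file, or null if absent
}

type BootstrapAction =
  | "none"        // everything matches
  | "writeAssets" // rewrite skill assets and stamp, no pip install
  | "install";    // (re)run pip install, then write assets and stamp

function decideAction(config: BootstrapStamp, observed: ObservedState): BootstrapAction {
  // Tool missing: nothing to reuse.
  if (!observed.binaryOnPath) return "install";
  // Recorded version/extras drifted from the configuration: reinstall.
  if (
    observed.stamp &&
    (observed.stamp.version !== config.version || observed.stamp.extras !== config.extras)
  ) {
    return "install";
  }
  // Tool present but no stamp yet, or assets missing: rewrite without reinstalling.
  if (!observed.stamp || !observed.assetsPresent) return "writeAssets";
  return "none";
}

console.log(
  decideAction({ version: "latest", extras: "all" }, { binaryOnPath: true, assetsPresent: true, stamp: null }),
); // writeAssets
```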
OCR, Plugins, and Azure
- This middleware no longer documents `ocr` as an official MarkItDown extra.
- In practice, OCR is usually supplied by an installed third-party plugin and enabled with `--use-plugins`.
- Azure Document Intelligence is supported through `-d -e <endpoint>` when the sandbox install includes the `az-doc-intel` or `all` extra.
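A minimal sketch of the guard implied by the Azure point, assuming `extras` is a comma-separated list in the same form pip accepts inside `markitdown[...]`: the Document Intelligence flags are only emitted when the configured extras include `az-doc-intel` or `all`. The helper names `hasDocIntel` and `markitdownArgs` are hypothetical; `-d`, `-e`, and `-o` are the MarkItDown CLI flags used in the examples.

```typescript
// True when the configured extras pull in Azure Document Intelligence support.
function hasDocIntel(extras: string): boolean {
  const names = extras.split(",").map((s) => s.trim());
  return names.includes("all") || names.includes("az-doc-intel");
}

// Hypothetical helper: build markitdown argv, adding -d -e only when supported.
function markitdownArgs(input: string, output: string, extras: string, endpoint?: string): string[] {
  const args = [input, "-o", output];
  if (endpoint) {
    if (!hasDocIntel(extras)) {
      throw new Error("Document Intelligence requires the az-doc-intel or all extra");
    }
    // Put the Document Intelligence flags first, matching the README example.
    args.unshift("-d", "-e", endpoint);
  }
  return args;
}

const endpoint = "https://your-resource.cognitiveservices.azure.com/";
console.log(["markitdown", ...markitdownArgs("form.pdf", "form.md", "all", endpoint)].join(" "));
// markitdown -d -e https://your-resource.cognitiveservices.azure.com/ form.pdf -o form.md
```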
Examples:

```bash
markitdown report.pdf -o report.md
cat page.html | markitdown -x html
markitdown --list-plugins
markitdown --use-plugins scanned.pdf -o scanned.md
markitdown -d -e "https://your-resource.cognitiveservices.azure.com/" form.pdf -o form.md
```

Sandbox Assets
During bootstrap, the plugin writes the following file into the sandbox:

- `SKILL.md`, under the configured `skillsDir`
The injected prompt tells the agent to read that file before first use instead of embedding all CLI details directly in the system prompt.
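The README does not say how `SKILL.md` reaches the sandbox. A minimal sketch, assuming the middleware can only run shell commands there: it builds a `mkdir -p` plus a quoted heredoc that writes the file under `skillsDir`. The helpers `shellQuote` and `writeSkillCommand` and the `XPERT_SKILL_EOF` delimiter are hypothetical.

```typescript
// POSIX single-quote escaping: close the quote, emit \', reopen the quote.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Hypothetical: build a shell command that writes SKILL.md under skillsDir.
function writeSkillCommand(skillsDir: string, content: string): string {
  const delimiter = "XPERT_SKILL_EOF";
  // A quoted heredoc disables expansion, but a line equal to the delimiter would end it early.
  if (content.split("\n").includes(delimiter)) {
    throw new Error("content contains the heredoc delimiter");
  }
  const dir = skillsDir.replace(/\/+$/, ""); // drop trailing slashes
  const file = `${dir}/SKILL.md`;
  return [
    `mkdir -p ${shellQuote(dir)} && cat > ${shellQuote(file)} <<'${delimiter}'`,
    content,
    delimiter,
  ].join("\n");
}

console.log(writeSkillCommand("/workspace/.xpert/skills/markitdown", "# MarkItDown\nUse markitdown <file>."));
```

Quoting the heredoc delimiter keeps `$`, backticks, and `<file>` placeholders in the skill text from being interpreted by the sandbox shell.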
Validation Rules
The plugin contributes warnings when:

- `MarkItDownSkill` is used while sandbox support is disabled.
- `MarkItDownSkill` is used without `SandboxShell` on the same agent.
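A minimal sketch of these two checks, assuming an agent description that carries a sandbox flag and the names of its configured middlewares. The `AgentConfig` shape and `validateAgent` name are hypothetical, and the warning texts are illustrative rather than the plugin's actual messages.

```typescript
// Hypothetical agent description; the real Xpert schema is not documented here.
interface AgentConfig {
  sandboxEnabled: boolean;
  middlewares: string[]; // names configured on the agent, e.g. "MarkItDownSkill", "SandboxShell"
}

function validateAgent(agent: AgentConfig): string[] {
  const warnings: string[] = [];
  // The rules only apply when MarkItDownSkill is configured at all.
  if (!agent.middlewares.includes("MarkItDownSkill")) return warnings;
  if (!agent.sandboxEnabled) {
    warnings.push("MarkItDownSkill is used while sandbox support is disabled");
  }
  if (!agent.middlewares.includes("SandboxShell")) {
    warnings.push("MarkItDownSkill is used without SandboxShell on the same agent");
  }
  return warnings;
}

console.log(validateAgent({ sandboxEnabled: false, middlewares: ["MarkItDownSkill"] }));
```

Returning warnings instead of throwing matches the README's wording: misconfiguration is reported, not rejected.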
Development and Validation
Run these commands from the repository root:

```bash
env NX_DAEMON=false pnpm -C xpertai exec nx build @xpert-ai/plugin-markitdown
env NX_DAEMON=false pnpm -C xpertai exec nx test @xpert-ai/plugin-markitdown --runInBand
pnpm -C plugin-dev-harness build
node plugin-dev-harness/dist/index.js --workspace ./xpertai --plugin ./middlewares/markitdown
```

The build output is written to `xpertai/middlewares/markitdown/dist`.
