udf-cli
v0.3.1
Convert between HTML/Markdown and UYAP UDF file format
Convert between HTML and UYAP UDF file format. Designed for AI agents that need to read and write UDF documents programmatically.
What is UDF?
UDF is the document format used by UYAP (Turkey's National Judiciary Informatics System). A UDF file is a ZIP archive containing a single content.xml file with text content, formatting, images, and tables encoded in a custom XML schema.
Install
npm install udf-cli
Or run directly:
npx udf-cli html2udf input.html output.udf
CLI Usage
HTML to UDF
udf-cli html2udf <input> [output]
Input accepts a file path, a raw HTML string, or - for stdin. If the input contains <, it is treated as raw HTML. Output is optional; omit it to write to stdout.
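The input rule described above can be sketched as a small classifier. This only illustrates the documented behaviour and is not taken from udf-cli's source; the function name classifyInput and its return labels are hypothetical.

```javascript
// Hypothetical helper mirroring the documented rule; not part of udf-cli's API.
function classifyInput(arg) {
  if (arg === '-') return 'stdin';      // "-" means read from stdin
  if (arg.includes('<')) return 'raw';  // any "<" marks the argument as raw HTML
  return 'file';                        // otherwise treat it as a file path
}

console.log(classifyInput('input.html'));   // file
console.log(classifyInput('<p>Test</p>'));  // raw
console.log(classifyInput('-'));            // stdin
```

One consequence of this rule is that a file whose name contains < cannot be passed by path; piping it through stdin avoids the ambiguity.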
# File to file
udf-cli html2udf input.html output.udf
# Raw HTML string to file
udf-cli html2udf '<p><strong>Merhaba</strong> dünya</p>' output.udf
# Raw HTML string to stdout
udf-cli html2udf '<p>Test</p>' > output.udf
# Stdin to file
cat input.html | udf-cli html2udf - output.udf
# Stdin to stdout (pipe-friendly)
echo '<p>Merhaba</p>' | udf-cli html2udf - > output.udf
UDF to HTML
udf-cli udf2html <input> [output]
Input accepts a file path or - for stdin. Output is optional; omit it to print HTML to stdout.
# File to file
udf-cli udf2html input.udf output.html
# File to stdout (agent-friendly — read UDF content directly)
udf-cli udf2html input.udf
# Stdin to stdout
cat input.udf | udf-cli udf2html -
# Full roundtrip via pipes
echo '<p>Test</p>' | udf-cli html2udf - | udf-cli udf2html -
Markdown to UDF
udf-cli md2udf <input> [output]
Input accepts a file path, a raw Markdown string, or - for stdin.
# File to file
udf-cli md2udf input.md output.udf
# Stdin to file
cat input.md | udf-cli md2udf - output.udf
# Raw Markdown via stdin
echo '**Merhaba** dünya' | udf-cli md2udf - output.udf
Supported Markdown syntax for input:
- **bold** and *italic*, ***bold italic***
- # Heading 1 through ###### Heading 6
- 1. Numbered list and - Bulleted list (with nesting via indentation)
- Markdown tables (| col1 | col2 | with separator row)
- Images
UDF to Markdown
udf-cli udf2md <input> [output]
Converts UDF to Markdown, which suits AI agents that work better with Markdown than HTML. Tables become Markdown tables, bold/italic use **/* syntax, and lists become 1./- items.
# Read a UDF file as Markdown (agent-friendly)
udf-cli udf2md input.udf
# Save as Markdown file
udf-cli udf2md input.udf output.md
Example output:
**T.C.**
**ANTALYA 3. ASLİYE HUKUK MAHKEMESİ**
**ESAS NO:** 2024/463
**DAVACI:** Mehmet Yılmaz
| **Sıra No** | **Açıklama** | **Toplam (TL)** |
| --- | --- | --- |
| 1 | Arsa değeri | 1.250.000,00 |
| 2 | Bina değeri | 990.000,00 |
1. Taşınmazın konumu
2. Yapının fiziksel durumu
- Dış cephe boyasında dökülme
- Çatı izolasyonunda bozulma
Library Usage
import fs from 'node:fs';
import { htmlToUdf, udfToHtml, udfToMarkdown, markdownToUdf } from 'udf-cli';
// HTML string → UDF buffer
const udfBuffer = await htmlToUdf('<p><strong>Mahkeme kararı</strong></p>');
fs.writeFileSync('document.udf', udfBuffer);
// UDF buffer → HTML string
const udf = fs.readFileSync('document.udf');
const html = await udfToHtml(udf);
// UDF buffer → Markdown string
const md = await udfToMarkdown(udf);
// Markdown string → UDF buffer
const udfFromMd = await markdownToUdf('**Başlık**\n\nParagraf metni');
AI Authoring Guide
If you are an AI agent generating UDF documents, use HTML with inline CSS as your input format. Always use pt units. The conversions below are exhaustive — anything not listed is not supported.
System prompt snippet
Paste this into your agent's system prompt:
Generate UDF input as HTML with inline CSS. Use pt for all lengths.
- Inline styles: <strong>, <em>, <u>, <span style="font-family:Arial; font-size:12pt; color:#FF0000; background-color:#FFFF00">
- Paragraphs: <p style="text-align:justify; line-height:1.5; margin-top:12pt; margin-bottom:6pt; margin-left:36pt; text-indent:24pt">
- Tab stops: <p style="tab-stops:36pt 72pt 108pt">Item<tab/>Value<tab/>Notes</p>
- Page break: <page-break/>, but only use it when the user explicitly asks for one. The UDF editor renders an inline "sayfa sonudur" marker between pages that readers often mistake for document content. Default to letting the page flow naturally.
- Tables: standard <table><tr><td>...</td></tr></table>; cell styles via inline CSS
- Lists: <ul>, <ol>, <li> (nesting via nested lists)
- Images: <img src="data:image/png;base64,..." width="200" height="100">
Always use pt. Do not escape <tab/> or <page-break/>. Do not use <br> to separate paragraphs; start a new <p>.
Cookbook
Coloured bold heading
<p style="text-align:center"><span style="font-family:Arial; font-size:18pt; color:#003366"><strong>BAŞLIK</strong></span></p>
Justified paragraph with first-line indent
<p style="text-align:justify; text-indent:24pt; line-height:1.5">Lorem ipsum dolor sit amet ...</p>
Three-column tab-stop layout (signature block)
<p style="tab-stops:200pt 400pt"><strong>Davacı</strong><tab/><strong>Davalı</strong><tab/><strong>Hâkim</strong></p>
<p style="tab-stops:200pt 400pt">Mehmet Yılmaz<tab/>Ahmet Demir<tab/>Ayşe Kaya</p>
Two sections separated by a page break
<p>İlk sayfa içeriği.</p>
<page-break/>
<p>İkinci sayfa içeriği.</p>
Yellow-highlighted span
<p>Bu cümlede <span style="background-color:#FFFF00">vurgulanmış kısım</span> var.</p>
1.5-line-spaced paragraph with bottom margin
<p style="line-height:1.5; margin-bottom:12pt">İki yana hizalı, 1.5 satır aralıklı paragraf.</p>
Bordered table with bold header row
<table>
<tr>
<td style="background-color:#EEEEEE"><strong>Sıra</strong></td>
<td style="background-color:#EEEEEE"><strong>Açıklama</strong></td>
</tr>
<tr>
<td>1</td>
<td>Birinci kalem</td>
</tr>
</table>
Signature block under a table using tab stops
<table>...</table>
<p style="margin-top:24pt; tab-stops:280pt"><strong>Yargıtay</strong><tab/><strong>Mahkeme Mührü</strong></p>
Unit rules
- Always use pt. Conversion happens automatically for px, em, rem, cm, mm, in, but agents should not rely on this; sticking to pt makes output predictable.
- Bare numbers (e.g. margin-top:12) are treated as pt.
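To normalize lengths to pt before handing HTML to the converter, a helper along these lines may be useful. It is a minimal sketch that assumes the standard CSS absolute-unit ratios (1in = 72pt = 96px = 2.54cm); the factors udf-cli applies internally are not documented here. The name toPt is hypothetical, and em/rem are left out because they depend on a base font size.

```javascript
// Standard CSS absolute-unit factors; assumed here, not taken from udf-cli.
const PT_PER_UNIT = new Map([
  ['pt', 1],
  ['px', 0.75],
  ['in', 72],
  ['cm', 72 / 2.54],
  ['mm', 72 / 25.4],
]);

// Hypothetical helper: convert a CSS length such as "14px" or "12" to points.
function toPt(value) {
  const match = /^\s*(-?\d*\.?\d+)\s*([a-z]*)\s*$/i.exec(String(value));
  if (!match) throw new Error(`Not a CSS length: ${value}`);
  const unit = match[2].toLowerCase() || 'pt'; // bare numbers count as pt
  if (!PT_PER_UNIT.has(unit)) throw new Error(`Unsupported unit: ${unit}`);
  return Number(match[1]) * PT_PER_UNIT.get(unit);
}

console.log(toPt('14px')); // 10.5
console.log(toPt('1in'));  // 72
console.log(toPt('12'));   // 12
```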
Common mistakes
| ❌ Wrong | ✅ Right | Why |
|---|---|---|
| font-size:14px | font-size:14pt | UDF uses points; px works but agents should not guess |
| &lt;tab/&gt; | <tab/> | Escaped custom elements become text |
| <br><br> for paragraph break | Two separate <p> blocks | UDF is block-based; <br> is intra-paragraph soft break |
| <div> for inline text | <p> for paragraph text | <div> is a block group, <p> is a paragraph |
| Spaces in CSS like color: red | color:red | Both work but consistent compact form is recommended |
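Some of the mistakes in the table can be caught before conversion with a small pre-flight check. The sketch below is a regex-based heuristic and not a feature of udf-cli; the function name lintUdfHtml and the warning texts are hypothetical.

```javascript
// Hypothetical pre-flight check for agent-generated HTML; rules follow the table above.
function lintUdfHtml(html) {
  const warnings = [];
  // Collect the contents of style="..." attributes and look for non-pt units there.
  const styles = [...html.matchAll(/style="([^"]*)"/gi)].map((m) => m[1]);
  if (styles.some((s) => /\d(px|em|rem|cm|mm|in)\b/i.test(s))) {
    warnings.push('use pt for all lengths');
  }
  // Escaped custom elements would end up as literal text.
  if (/&lt;(tab|page-break)\/&gt;/i.test(html)) {
    warnings.push('do not escape <tab/> or <page-break/>');
  }
  // Two consecutive <br> tags usually stand in for a paragraph break.
  if (/<br\s*\/?>\s*<br\s*\/?>/i.test(html)) {
    warnings.push('use separate <p> blocks instead of <br><br>');
  }
  return warnings;
}

console.log(lintUdfHtml('<p style="font-size:14px">Metin</p>'));
// [ 'use pt for all lengths' ]
```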
Supported HTML Elements
| HTML | UDF Mapping |
|------|-------------|
| <p> | Paragraph |
| <h1> - <h6> | Paragraph with bold, sizes 24/20/16/14/12/10 pt |
| <strong>, <b> | Bold text |
| <em>, <i> | Italic text |
| <u> | Underlined text |
| <span style="..."> | Styled text (font, size, color, background) |
| <br> | Paragraph break |
| <img src="data:..."> | Embedded image (base64 data URI) |
| <table>, <tr>, <td>, <th> | Table with borders (nested tables supported) |
| <ul>, <ol>, <li> | Bulleted / numbered lists |
| <div> | Block container |
Custom elements
These are non-standard tags handled by udf-cli:
| Element | Meaning |
|---------|---------|
| <tab/> | Inserts a tab character; aligns with the nearest tab-stops position. Self-closing. |
| <page-break/> | Inserts a page break between blocks. Self-closing. Use sparingly — the UDF editor renders a visible "sayfa sonudur" marker between pages, which readers often mistake for actual document text. Only insert when the user explicitly requests a page break. |
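When building tab-separated rows from data, the cell text should be escaped while the <tab/> elements stay literal. A minimal sketch follows; the helper names escapeHtml and tabRow are hypothetical, and it assumes the converter decodes standard HTML entities in text.

```javascript
// Escape text so user content cannot be parsed as markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: one paragraph whose cells are separated by literal <tab/> elements.
function tabRow(cells, stopsPt) {
  const stops = stopsPt.map((s) => `${s}pt`).join(' ');
  return `<p style="tab-stops:${stops}">${cells.map(escapeHtml).join('<tab/>')}</p>`;
}

console.log(tabRow(['Davacı', 'Davalı', 'Hâkim'], [200, 400]));
// <p style="tab-stops:200pt 400pt">Davacı<tab/>Davalı<tab/>Hâkim</p>
```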
Supported CSS Properties
These inline styles are recognized on <p>, <span>, <td>:
- font-family - mapped to UDF font family
- font-size - supports px (converted to pt) and pt
- font-weight: bold - bold text
- font-style: italic - italic text
- text-decoration: underline - underlined text
- color - text color (hex, rgb, named colors)
- background-color - text/cell background
- text-align - left, center, right, justify
- vertical-align - top, middle, bottom (table cells)
- line-height - paragraph line spacing
- text-indent - first-line indent (positive or negative for hanging indent)
- tab-stops - space- or comma-separated list of tab positions (custom property)
- page-break-before / page-break-after (on <div>) - value always or page inserts a page break
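The tab-stops value accepts either separator, so a parser for it can be sketched as below. This is an illustration under the assumption that every position is given in pt or as a bare number; parseTabStops is a hypothetical name, not a udf-cli export.

```javascript
// Hypothetical helper: split a tab-stops value on spaces and/or commas.
function parseTabStops(value) {
  return value
    .split(/[\s,]+/)                      // accept "36pt 72pt", "36pt,72pt", or a mix
    .filter(Boolean)                      // drop empty tokens from leading/trailing separators
    .map((token) => parseFloat(token));   // "36pt" -> 36 (unit suffix ignored)
}

console.log(parseTabStops('36pt 72pt, 108pt')); // [ 36, 72, 108 ]
```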
Writing HTML for UDF Conversion
When generating HTML that will be converted to UDF, follow these guidelines:
Text Formatting
<p>Normal text</p>
<p><strong>Bold text</strong></p>
<p><em>Italic text</em></p>
<p><strong><em>Bold and italic</em></strong></p>
<p><span style="font-family:Arial; font-size:14pt; color:#FF0000">Custom styled text</span></p>
Images
Images must use base64 data URIs. Specify width and height in points:
<img src="data:image/png;base64,iVBORw0KGgo..." width="200" height="100">
Tables
<table>
<tr>
<td>Cell 1</td>
<td>Cell 2</td>
</tr>
<tr>
<td colspan="2">Merged cell</td>
</tr>
</table>
Nested tables are supported:
<table>
<tr>
<td>
<table>
<tr><td>Inner cell</td></tr>
</table>
</td>
</tr>
</table>
Lists
<ol>
<li>First item</li>
<li>Second item</li>
</ol>
<ul>
<li>Bullet point</li>
<li>Another point</li>
</ul>
Centered / Aligned Text
<p style="text-align:center">Centered paragraph</p>
<p style="text-align:right">Right-aligned paragraph</p>
<p style="text-align:justify">Justified paragraph</p>
UDF Format Reference
UDF files have this internal structure:
document.udf (ZIP archive)
└── content.xml
The XML schema:
<?xml version="1.0" encoding="UTF-8" ?>
<template format_id="1.8">
<content><![CDATA[All document text here]]></content>
<properties>
<pageFormat mediaSizeName="1" leftMargin="42.52" rightMargin="28.35"
topMargin="14.17" bottomMargin="14.17" paperOrientation="1" />
</properties>
<elements resolver="hvl-default">
<paragraph Alignment="0">
<content startOffset="0" length="5" family="Times New Roman"
size="12" bold="true" foreground="-16777216" background="-1" />
</paragraph>
<table tableName="Sabit" columnCount="2" columnSpans="150,150" border="borderCell">
<row rowName="row1" rowType="dataRow" border="borderCell">
<cell colspan="1" align="top" fillColor="16777215"
border="borderCell" borderSpec="15">
<paragraph>...</paragraph>
</cell>
</row>
</table>
</elements>
</template>
Key details:
- Text offsets are rune-based (Unicode codepoints), not byte-based
- Colors are signed 32-bit ARGB integers (black = -16777216, white = -1)
- Alignment: 0 = left, 1 = center, 2 = right, 3 = justify
- Border spec: bitmask (top=1, right=2, bottom=4, left=8; all sides=15)
- Images: base64-encoded in imageData attribute, \uFFFC placeholder in text
- Empty paragraphs: \u200B (zero-width space) placeholder in text
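The encodings listed above can be illustrated with a few small helpers. This is a sketch derived only from the key details in this section, not from udf-cli's source; the names hexToArgb, borderSpec, and runeLength are hypothetical.

```javascript
// Pack "#RRGGBB" into a fully opaque ARGB integer.
// Bitwise OR in JavaScript yields a signed 32-bit result, matching the format.
function hexToArgb(hex) {
  const rgb = parseInt(hex.replace('#', ''), 16);
  return 0xff000000 | rgb;
}

// Border bitmask: top=1, right=2, bottom=4, left=8.
const BORDER = { top: 1, right: 2, bottom: 4, left: 8 };
function borderSpec(sides) {
  return sides.reduce((mask, side) => mask | BORDER[side], 0);
}

// Offsets count Unicode codepoints, so iterate the string by codepoint
// instead of using String.prototype.length (which counts UTF-16 units).
function runeLength(text) {
  return Array.from(text).length;
}

console.log(hexToArgb('#000000'));                            // -16777216
console.log(hexToArgb('#FFFFFF'));                            // -1
console.log(borderSpec(['top', 'right', 'bottom', 'left']));  // 15
console.log(runeLength('a😀b'), 'a😀b'.length);               // 3 4
```

The last line shows why the distinction matters: a character outside the Basic Multilingual Plane occupies two UTF-16 units but only one codepoint, so offsets computed with .length would drift after it.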
Development
npm install
npm test # run tests
npm run build # compile TypeScript