@udx/mq
v1.1.3
Markdown Query - jq for Markdown documents
@udx/mq - Markdown Query
A powerful tool for querying and transforming markdown documents, designed as a companion to @udx/mcurl. Think of it as "jq for markdown" - a tool that lets you treat markdown as structured data.
Key Capabilities
- Clean Content Extraction: Pull narrative content without code blocks for cleaner analysis
- Structured Querying: Filter and transform markdown content like jq does for JSON
- Document Analysis: Generate actionable insights and understand document structure
- Format Conversion: Transform between JSON, markdown, and other formats
- Composability: Combine with other tools in Unix-style pipelines
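To make "markdown as structured data" concrete, here is a minimal sketch of the idea. It is not mq's implementation. It turns ATX headings into `{ level, text }` objects, filters them the way `.headings[] | select(.level == 2)` does, and builds a nested-list table of contents. The function names `extractHeadings` and `buildToc` and the object shape are hypothetical, and mq's actual JSON output may differ.

```javascript
// Minimal sketch (hypothetical helpers, not mq's API or output format):
// parse ATX headings ("# Title" ... "###### Title") into objects.
function extractHeadings(markdown) {
  const headings = [];
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    // Simplified fence tracking so "# comments" inside code are ignored
    if (/^\s*(```|~~~)/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    // Optional closing "##" sequence must be preceded by whitespace
    const match = /^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$/.exec(line);
    if (match) headings.push({ level: match[1].length, text: match[2] });
  }
  return headings;
}

// Indent each entry relative to the shallowest heading level
function buildToc(headings) {
  const minLevel = Math.min(...headings.map((h) => h.level));
  return headings
    .map((h) => `${'  '.repeat(h.level - minLevel)}- ${h.text}`)
    .join('\n');
}

const doc = [
  '# Guide',
  '## Install',
  '```bash',
  '# not a heading',
  '```',
  '## Usage',
  '### Flags',
].join('\n');

const headings = extractHeadings(doc);
const level2 = headings.filter((h) => h.level === 2); // like select(.level == 2)
console.log(level2.map((h) => h.text)); // [ 'Install', 'Usage' ]
console.log(buildToc(headings));
```

The same filter-then-project pattern is what makes jq-style queries composable: each step takes an array of plain objects and returns another.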
Why Clean Content Extraction Matters
Code blocks in technical documents serve a crucial purpose for developers but act as "noise" when analyzing the narrative flow. By separating content from code, mq helps:
- Improve focus on conceptual information
- Extract cleaner summaries without code snippets
- Better identify key points and arguments
- Create more approachable versions of technical content
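The separation described above can be sketched in a few lines. This is a simplified illustration, not mq's parser. It assumes fenced code blocks open and close with the same ``` or ~~~ marker, keeps everything outside the fences as narrative text, and records each block's info-string language, which is roughly the data that `mq --clean-content` and `mq --language` expose. The function name `splitCodeBlocks` and the result shape are hypothetical.

```javascript
// Simplified sketch (hypothetical helper, not mq's extractor):
// separate narrative markdown from fenced code blocks.
function splitCodeBlocks(markdown) {
  const prose = [];
  const codeBlocks = [];
  let fence = null; // opening marker (``` or ~~~) while inside a block
  let current = null;

  for (const line of markdown.split(/\r?\n/)) {
    const open = /^\s*(`{3,}|~{3,})\s*([\w+-]*)/.exec(line);
    if (fence === null && open) {
      // Opening fence: remember its marker and info-string language
      fence = open[1];
      current = { language: open[2] || null, lines: [] };
    } else if (fence !== null && line.trim().startsWith(fence)) {
      // A line starting with the same marker closes the block
      codeBlocks.push({ language: current.language, content: current.lines.join('\n') });
      fence = null;
      current = null;
    } else if (fence !== null) {
      current.lines.push(line);
    } else {
      prose.push(line);
    }
  }
  // Treat an unclosed fence as running to the end of the document
  if (current !== null) {
    codeBlocks.push({ language: current.language, content: current.lines.join('\n') });
  }
  return { prose: prose.join('\n'), codeBlocks };
}

const sample = [
  'Intro paragraph.',
  '```javascript',
  'console.log("hi");',
  '```',
  'Closing thoughts.',
].join('\n');

const { prose, codeBlocks } = splitCodeBlocks(sample);
console.log(prose); // prints "Intro paragraph." and "Closing thoughts." on two lines
const js = codeBlocks.filter((b) => b.language === 'javascript'); // like --language javascript
console.log(js[0].content); // prints: console.log("hi");
```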
Installation
npm install -g @udx/mq
Usage Examples
Extract Clean Content (No Code Blocks)
# Extract clean content without code blocks
mq --clean-content --input test/fixtures/test-code-blocks.md
# Filter content to include only h1 and h2 headings and their content
mq --clean-content=2 --input test/fixtures/complex-test.md
# Get clean content in JSON format
mq --clean-content --format json --input test/fixtures/test-code-blocks.md
Basic Query Operations
# Extract headings from a document (returns JSON structure by default)
mq --input test/fixtures/basic-test.md '.headings[]'
# Analyze document structure (returns formatted Markdown report)
mq --analyze --input test/fixtures/complex-test.md
# Generate a table of contents (returns Markdown TOC)
mq --input test/fixtures/test-document.md '.toc'
# Extract code blocks by language (returns JSON structure)
mq --language javascript --input test/fixtures/test-code-blocks.md
# Extract code content only in raw format
mq --language javascript --input test/fixtures/test-code-blocks.md | jq -r '.[0].content'
# Extract all images (returns JSON structure)
mq --input test/fixtures/test-images.md '.images[]'
# Extract first sentences from sections (returns text content)
mq --first-sentences 2 --input test/fixtures/test-sentences.md
Pipe with mcurl
# Fetch web content and analyze it
mcurl https://udx.io | mq --analyze
# Fetch web content and extract clean narrative content (no code blocks)
mcurl https://udx.io/work | mq --clean-content
# First analyze the overall structure of web content
mcurl https://udx.io/about | mq --analyze
Complex Queries
# Extract level 2 headings
mq --input test/fixtures/complex-test.md '.headings[] | select(.level == 2)'
# Extract links whose URL contains "example" (e.g. links to a specific domain)
mq --input test/fixtures/test-document.md '.links[] | select(.href | contains("example"))'
# Extract code blocks and make them collapsible
mq --input test/fixtures/test-code-blocks.md --transform-code-blocks
Integration with curl and jq
One of the most powerful aspects of mq is its ability to integrate with curl, mcurl, and jq in Unix-style pipelines:
# Fetch a GitHub markdown file and extract headings
curl -s https://raw.githubusercontent.com/WordPress/wordpress-develop/HEAD/README.md | mq '.headings[]'
# Get content from a website and extract clean narrative content
mcurl https://udx.io/about | mq --clean-content
# Process markdown content and pipe to jq for further filtering
curl -s https://raw.githubusercontent.com/WordPress/wordpress-develop/HEAD/README.md | mq --clean-content --format json | \
jq '[.[] | select(.type=="heading" and .level == 1)]'
# Filter expertise facets from the UDX API with plain curl + jq (count above 10)
curl -s 'https://udx.io/wp-json/udx/v2/works/search?query=&page=1' | \
jq '.facets.expertise[] | select(.count > 10) | {name: .name, count: .count}'
Advanced Features
Clean Content Extraction
The clean content extractor is one of mq's most powerful features for document analysis. It removes code blocks while preserving the document's narrative structure:
# Extract clean content without code blocks
mq --clean-content --input test/fixtures/test-code-blocks.md
# Limit extraction to specific heading levels (h1 and h2 only)
mq --clean-content=2 --input test/fixtures/complex-test.md
# Get JSON output for programmatic processing
mq --clean-content --format json --input test/fixtures/test-code-blocks.md | jq length
Benefits of Clean Content Extraction
- Improved Analysis: Focus on the narrative without code noise
- Better Summarization: Generate more coherent summaries from technical content
- Hierarchical Understanding: Preserve document structure while filtering code
- Content Repurposing: Transform code-heavy tutorials into conceptual guides
- Incremental Content Processing: Extract varying amounts of content for different purposes
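Incremental processing is what `--clean-content=2` illustrates: it keeps only h1 and h2 headings and their content. A minimal sketch of such a level filter follows, under the assumption that text below a deeper heading (h3 and beyond) is dropped until the next heading at or above the limit. mq's exact rules may differ, and `filterByHeadingLevel` is a hypothetical name. The sketch also assumes code blocks have already been stripped, since `#` lines inside fences would otherwise be read as headings.

```javascript
// Sketch (hypothetical helper, not mq's implementation): keep headings up to
// maxLevel plus the text beneath them. Assumes code blocks were already
// removed, because a "# comment" inside a fence would look like a heading.
function filterByHeadingLevel(markdown, maxLevel) {
  const kept = [];
  let keeping = true; // text before the first heading is kept
  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^(#{1,6})\s/.exec(line);
    if (heading) {
      // Each new heading decides whether the section that follows is kept
      keeping = heading[1].length <= maxLevel;
    }
    if (keeping) kept.push(line);
  }
  return kept.join('\n');
}

const doc = [
  '# Title', 'Intro.',
  '## Setup', 'Steps.',
  '### Details', 'Fine print.',
  '## Next', 'More.',
].join('\n');

console.log(filterByHeadingLevel(doc, 2)); // drops "### Details" and "Fine print."
```

Raising or lowering `maxLevel` trades detail for brevity, which is the "varying amounts of content" idea in practice.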
Advanced UDX API Examples
# Extract the first five links from a page fetched with mcurl
mcurl https://udx.io/about | mq '.links[0:5]'
# Extract clean content from a WordPress page for easier reading
mcurl https://udx.io/guidance | mq --clean-content
# Analyze page structure first to decide which elements to extract
mcurl https://udx.io/work | mq --analyze
Approach
Best Practices for Working with Markdown and APIs
Native Node.js Functions: Prefer using native Node.js functions for fetching API data rather than dedicated modules. For example:
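// Using native Node.js rather than dedicated modules
const https = require('https');

function fetchContent(url) {
  // Function fetches content from URL using native Node.js modules
  // Input: url - String URL to fetch
  // Output: Promise that resolves to response body (rejects on HTTP 4xx/5xx)
  // Note: redirects (3xx) are not followed
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode >= 400) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Request failed with status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8'); // avoid splitting multi-byte characters across chunks
      let data = '';
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => { resolve(data); });
    }).on('error', reject);
  });
}
Logging and Debugging: Log API request metadata and response data for troubleshooting, gated behind the DEBUG or VERBOSE environment variables: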
// Proper logging for API requests
function logApiRequest(url, options = {}, response = {}) {
  // Log API request details when DEBUG or VERBOSE is set
  // Input: url - request URL, options - request options, response - API response
  // Output: None, logs to console
  if (process.env.DEBUG || process.env.VERBOSE) {
    console.log(`[API Request] ${options.method || 'GET'} ${url}`);
    console.log(`[API Response] Status: ${response.statusCode}`);
    if (process.env.VERBOSE) {
      // JSON.stringify(undefined) returns undefined, so fall back to ''
      const body = JSON.stringify(response.body) ?? '';
      console.log(`[API Response Body] ${body.substring(0, 200)}...`);
    }
  }
}
Use Lodash for Complex Operations: Leverage Lodash for data transformations to improve readability and fault tolerance in your pipeline.
Progressive Enhancement Workflow:
- Start by analyzing content structure with mq --analyze
- Extract relevant sections with targeted selectors
- Process and transform with clean content extraction
- Format output appropriately for your use case
Testing Strategy: Test your pipelines using REST API tools, Mocha for unit tests, or simple curl commands for verification.
Documentation: Add comprehensive function headers that explain purpose, inputs, and outputs for all custom operations.
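As an illustration of such a header, here is a small hypothetical helper that does in Node.js what the jq filter `.links[] | select(.href | contains("example"))` does on mq output. It assumes link objects carry an `href` field, as that jq filter implies; the `text` field and the name `filterLinksByHref` are assumptions for illustration, not a documented mq format.

```javascript
/**
 * Keep only the links whose URL contains a given substring.
 *
 * Purpose: mirror the jq filter `.links[] | select(.href | contains("example"))`.
 * Hypothetical helper; the `text` field is an assumed part of the link shape.
 * @param {{text?: string, href: string}[]} links - link objects to filter
 * @param {string} fragment - substring to look for in each href
 * @returns {{text?: string, href: string}[]} matching links, in original order
 */
function filterLinksByHref(links, fragment) {
  // Skip entries without a string href instead of throwing
  return links.filter((link) => typeof link.href === 'string' && link.href.includes(fragment));
}

const links = [
  { text: 'Example docs', href: 'https://example.com/docs' },
  { text: 'UDX', href: 'https://udx.io' },
];
console.log(filterLinksByHref(links, 'example').map((l) => l.href)); // [ 'https://example.com/docs' ]
```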
Common Pipelines
# Fetch → Clean → Convert to JSON → Count top-level elements
mcurl https://udx.io/about | mq --clean-content | mq --format json | jq 'length'
# Analyze content structure then target specific elements
mcurl https://udx.io/work | mq --analyze && mcurl https://udx.io/work | mq '.headings[0:5]'
# Process multiple sources with consistent transformations
for url in "udx.io/about" "udx.io/work" "udx.io/guidance"; do
echo "Processing $url"
mcurl "https://$url" | mq --clean-content=2 | wc -l
done
UDX API Integration Patterns
mq also works as one stage in a larger data processing pipeline, alongside tools like curl and jq:
# Use mq for HTML content processing
mcurl https://udx.io/work | mq --clean-content | grep "Cloud"
# Use curl+jq for JSON API processing (not mcurl!)
curl -s 'https://udx.io/wp-json/udx/v2/works/search?query=&page=1' | \
jq '.facets.expertise[] | select(.count > 10) | {name: .name, count: .count}'
# List industry facets with a count above 5
curl -s 'https://udx.io/wp-json/udx/v2/works/search?query=&page=1' | \
jq '.facets.industries[] | select(.count > 5) | {name: .name, count: .count}'
# Pipeline: Extract content from UDX pages, clean it, then analyze structure
for page in "about" "work" "guidance"; do
mcurl "https://udx.io/$page" | mq --clean-content | mq --analyze | grep -i "headings"
done
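If the JSON processing needs to live inside a Node.js script rather than a shell pipeline, the jq facet filters above translate directly into plain JavaScript. This sketch assumes the response has the `facets.expertise[]` and `facets.industries[]` arrays with `name` and `count` fields that those jq commands rely on; `selectFacets` is a hypothetical helper, and the `sample` object is made-up data for illustration, not real API output. With a live endpoint, the input could be `JSON.parse(await fetchContent(url))` using the `fetchContent` function from the best practices section.

```javascript
// Sketch (hypothetical helper): Node.js equivalent of
//   jq '.facets.expertise[] | select(.count > 10) | {name: .name, count: .count}'
// Assumes the response shape implied by that jq filter.
function selectFacets(response, facetName, minCount) {
  const facets = (response.facets && response.facets[facetName]) || [];
  return facets
    .filter((facet) => facet.count > minCount) // select(.count > N)
    .map(({ name, count }) => ({ name, count })); // {name: .name, count: .count}
}

// Made-up sample data for illustration only
const sample = {
  facets: {
    expertise: [
      { name: 'Cloud', count: 12, id: 1 },
      { name: 'Security', count: 4, id: 2 },
    ],
    industries: [{ name: 'Healthcare', count: 7 }],
  },
};

console.log(selectFacets(sample, 'expertise', 10)); // [ { name: 'Cloud', count: 12 } ]
console.log(selectFacets(sample, 'industries', 5));
```

The projection step drops any fields other than `name` and `count`, just as the jq object constructor does.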