@1-/md
v0.1.5
Published
Extract code blocks and languages from Markdown / 解析 Markdown 提取代码块位置与语言类型
Downloads
1,186
Maintainers
Readme
@1-/md : Extract code block positions and languages from Markdown
1. Features
Extract code block locations and language identifiers from Markdown text.
- Normalize cross-platform line endings (\r\n, \r) to standard \n
- Support fenced code blocks with any number of consecutive backticks (≥3)
- Accurately extract language identifiers, including empty strings for language-agnostic blocks
- Record start and end line numbers (1-based indexing) for each code block
- Automatically close unclosed code blocks at document end
2. Usage
import li from "@1-/md/li.js";
import code from "@1-/md/code.js";
const markdownContent = `# Title
\`\`\`javascript
const val = 1;
\`\`\``;
// Split Markdown text into normalized line array
const lines = li(markdownContent);
// Extract all code block information
const blocks = code(lines);
console.log(blocks);
// Output format: [ [ Language, Start Line, End Line ] ]
// Example output: [ [ 'javascript', 3, 5 ] ]3. Design
The system consists of two independent modules: line normalization (li.js) and code block extraction (code.js).
li.js normalizes cross-platform line endings and splits text into line arrays while trimming trailing whitespace from each line.
code.js uses a state machine to iterate through the line array:
- Initial state (outside code block): match lines starting with ≥3 backticks, record backtick count, language, and start line number, transition to inside state
- Inside code block state: match ending lines with identical backtick count, record end line number and save block information, transition back to outside state
- Document end: if still inside a code block, use total line count as end line number and save the unclosed block
4. Tech Stack
- Runtime: Bun / Node.js
- Language: JavaScript (ES Modules)
- Linter: Oxlint
- Formatter: Oxfmt
5. Code Structure
.
├── src/
│ ├── code.js # Code block extraction logic
│ └── li.js # Line normalization processing
└── test/
├── _.test.js # Unit test
└── test.md # Markdown test cases6. History
In 2004, John Gruber and Aaron Swartz co-created the Markdown markup language, with early specifications supporting only indented code blocks.
In 2012, GitHub introduced fenced code blocks in GitHub Flavored Markdown (GFM), using backticks (`) to wrap code with optional language identifiers, significantly improving readability and practicality of technical documentation.
In 2014, fenced code blocks were standardized in the CommonMark specification, becoming a core feature of modern Markdown implementations across documentation generators, static site builders, and development tools.
About
This library is developed by WebC.site.
WebC.site: A new paradigm of web development for AI
@1-/md : 解析 Markdown 提取代码块位置与语言类型
1. 功能介绍
提取 Markdown 文本中围栏代码块的位置信息与语言标识。
- 统一处理跨平台换行符(\r\n、\r)为标准 \n
- 支持任意数量连续反引号(≥3)定义的围栏代码块
- 准确提取代码块语言标识,包括无语言标识的空字符串情况
- 记录每个代码块的起始行号与结束行号(1-based indexing)
- 自动闭合文档末尾未配对的代码块
2. 使用演示
import li from "@1-/md/li.js";
import code from "@1-/md/code.js";
const markdownContent = `# Title
\`\`\`javascript
const val = 1;
\`\`\``;
// 将 Markdown 文本拆分为标准化的行数组
const lines = li(markdownContent);
// 提取所有代码块信息
const blocks = code(lines);
console.log(blocks);
// 输出格式: [ [ 语言类型, 起始行号, 结束行号 ] ]
// 示例输出: [ [ 'javascript', 3, 5 ] ]3. 设计思路
系统由两个独立模块组成:行标准化模块(li.js)和代码块提取模块(code.js)。
li.js 模块统一处理不同平台的换行符,将文本分割为行数组,并清除每行末尾空白字符。
code.js 模块使用状态机遍历行数组:
- 初始状态(代码块外):匹配以 ≥3 个反引号开头的行,记录反引号数量、语言类型及起始行号,切换至代码块内状态
- 代码块内状态:匹配相同数量反引号的结束行,记录结束行号并保存代码块信息,切换回代码块外状态
- 文档末尾:若仍处于代码块内状态,则以文档总行数作为结束行号,保存未闭合代码块
4. 技术栈
- 运行时: Bun / Node.js
- 语言: JavaScript (ES Modules)
- 代码检查: Oxlint
- 代码格式化: Oxfmt
5. 代码结构
.
├── src/
│ ├── code.js # 代码块提取逻辑
│ └── li.js # 行标准化处理
└── test/
├── _.test.js # 单元测试
└── test.md # 测试用 Markdown 样例6. 历史故事
2004 年,John Gruber 与 Aaron Swartz 共同创建 Markdown 标记语言,早期规范仅支持缩进方式定义代码块。
2012 年,GitHub 在 GitHub Flavored Markdown (GFM) 中引入围栏代码块语法,使用反引号(`)包裹代码并支持语言标识,大幅提升技术文档可读性与实用性。
2014 年,围栏代码块被纳入 CommonMark 规范,成为现代 Markdown 的标准特性,广泛应用于文档生成、静态网站构建和开发工具中。
关于
本库由 WebC.site 开发。
WebC.site : 面向人工智能的网站开发新范式
