@lingo-reader/fb2-parser

v0.4.5

Published

2 months ago

An fb2 parser which can extract chapter contents from an fb2 file

0High
0Medium
0Low

fangfangluo

fb2 fb2-parser ebook ebook-parser

The FB2 format parser is based on references from MobileRead Wiki - FB2 and Eng:FictionBook description — FictionBook.

Overview

@lingo-reader/fb2-parser is a parser for .fb2 eBook files, designed to work in both browser and Node.js environments.

Installation

pnpm install @lingo-reader/fb2-parser

Fb2File

Usage in the Browser

import { initFb2File } from '@lingo-reader/fb2-parser'
import type { Fb2File, Fb2Spine } from '@lingo-reader/fb2-parser'

async function initFb2(file: File) {
  const fb2: Fb2File = await initFb2File(file)
  // Get the spine (chapter list)
  const spine: Fb2Spine = fb2.getSpine()
  // Load the first chapter
  const firstChapter = fb2.loadChapter(spine[0].id)
}

// See the Fb2File class for more API details

Usage in Node.js

import { initFb2File } from '@lingo-reader/fb2-parser'
import type { Fb2File, Fb2Spine } from '@lingo-reader/fb2-parser'

const fb2: Fb2File = await initFb2File('./example/many-languages.fb2')
// Get the spine (chapter list)
const spine: Fb2Spine = fb2.getSpine()
// Load the first chapter
const firstChapter = fb2.loadChapter(spine[0].id)

// See the Fb2File class for more API details

initFb2File(file: string | File | Uint8Array, resourceSaveDir?: string): Promise

Initializes and parses an FB2 file. It returns an Fb2File object containing metadata, spine info, and other helper APIs.

Parameters:

file: string | File | Uint8Array – The FB2 file to load. Can be a file path (Node), File object (Browser), or Uint8Array (both).
resourceSaveDir?: string – Optional. Used in Node.js to specify where to save resources like images. Defaults to ./images.

Returns:

Promise<Fb2File> – A promise that resolves to the parsed Fb2File instance.

Note: In the browser, file must be of type File or Uint8Array. In Node.js, file must be of type string or Uint8Array. Passing the wrong type will result in an error.

Mobi Class

class Mobi {
  getFileInfo(): FileInfo
  getSpine(): Fb2Spine
  loadChapter(id: string): Fb2ProcessedChapter | undefined
  getToc(): Fb2Toc
  getCoverImage(): string
  getMetadata(): Fb2Metadata
  resolveHref(fb2Href: string): Fb2ResolvedHref | undefined
  destroy(): void
}

getFileInfo(): FileInfo

interface MobiFileInfo {
  // FB2 file name including extension
  fileName: string
}

Returns basic file information. Currently, only the fileName is provided.

getSpine(): Fb2Spine

interface Fb2SpineItem {
  id: string // Chapter ID
}
type Fb2Spine = Fb2SpineItem[]

Returns an ordered list of chapters (the "spine"). Each item includes an id that can be passed to loadChapter to load that chapter.

loadChapter(id: string): Fb2ProcessedChapter | undefined

interface Fb2CssPart {
  id: string
  href: string
}
interface Fb2ProcessedChapter {
  html: string
  css: Fb2CssPart[]
}

Loads and processes a chapter by its ID. Returns a structured object containing html and css. Returns undefined if the chapter doesn't exist.

The stylesheet is extracted from FictionBook.stylesheet and shared across all chapters. When loading a chapter, the styles are stored as an object in Fb2ProcessedChapter.css. The id is fixed, while the href is a Blob URL in browser environments and a real file path in Node environments.

The original chapter files are stored in XML format, so converting them to HTML involves transforming both tags and resource paths. The only resources in the chapters are images, and their URLs are automatically converted in the final returned HTML.

Another element that gets transformed is the <a> tag used for internal navigation. The href attribute of these tags is automatically rewritten to a specific format. You can use fb2.resolveHref to parse the rewritten link and extract the corresponding chapter ID and DOM selector.

getToc(): Fb2Toc

interface Fb2TocItem {
  label: string // TOC item label
  href: string // Internal FB2 href
}
export type Fb2Toc = Fb2TocItem[]

Returns the table of contents. Each item has a label and internal href. Fb2Toc is not nested.

getCoverImage(): string

Returns the book's cover image. In the browser, it's a blob URL. In Node.js, it's a file path. If the FB2 file doesn't contain a CoverImage, an empty string is returned.

getMetadata(): MobiMetadata

interface Author {
  name: string // Full name
  firstName: string
  middleName: string
  lastName: string
  nickname?: string
  homePage?: string
  email?: string
}

interface MobiMetadata {
  // title-info
  title?: string
  type?: string
  author?: Author
  language?: string
  description?: string
  keywords?: string
  date?: string
  srcLang?: string // Source language if translated
  translator?: string

  // document-info
  id?: string
  programUsed?: string // Program used to generate the FB2 file
  srcUrl?: string // Original source URL of the content
  srcOcr?: string // OCR tool or indicator that OCR was used
  version?: string
  history?: string

  // publish-info
  bookName?: string
  publisher?: string
  city?: string
  year?: string
  isbn?: string
}

Returns the book’s metadata.

resolveHref(href: string): Fb2ResolvedHref | undefined

interface Fb2ResolvedHref {
  id: string // Chapter ID
  selector: string // DOM selector (usable with querySelector)
}

Resolves internal FB2 href values (e.g., from <a> tags) into a chapter ID and a DOM selector for the in-document anchor.

destroy(): void

Cleans up any created blob URLs or saved resources, to avoid memory leaks during parsing and rendering.