
© 2024 – Pkg Stats / Ryan Hefner

ppt-commitment-parser

v0.0.2


A tool that converts policy reports or policy outlines into CSV.

Downloads

8

Readme

ppt-commitment-parser


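A tool that converts policy report or policy outline PDFs into CSV. The PDF can only be processed if its headings are numbered in the official-document style. The hierarchy, in order, is 「壹、」, 「一、」, 「(一)」, 「1、」, 「(1)」, 「甲、 or A.」; the system handles differences between half-width and full-width characters.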
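To illustrate the marker hierarchy and the half-width/full-width normalization described above, here is a minimal sketch that classifies a line by its heading marker. It is not the package's implementation: `toHalfWidth`, `HEADING_PATTERNS`, and `detectLevel` are hypothetical names, the character sets in the patterns are assumptions, and the returned index (0 for 「壹、」 through 5 for 「甲、」/「A.」) is this sketch's own numbering, not necessarily the package's internal `level`.

```javascript
// Hypothetical sketch, not the package's code.
// Convert full-width ASCII variants (U+FF01-U+FF5E) such as "１" or "（"
// to their half-width forms, and the ideographic space to a regular space.
function toHalfWidth(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) - 0xfee0))
    .replace(/\u3000/g, ' ');
}

// One anchored pattern per level, in hierarchy order (assumed character sets).
const HEADING_PATTERNS = [
  /^[壹貳參叁肆伍陸柒捌玖拾]+、/,           // 0: 壹、
  /^[一二三四五六七八九十]+、/,             // 1: 一、
  /^\([一二三四五六七八九十]+\)/,           // 2: (一)
  /^\d+、/,                                 // 3: 1、
  /^\(\d+\)/,                               // 4: (1)
  /^(?:[甲乙丙丁戊己庚辛壬癸]、|[A-Z]\.)/,  // 5: 甲、 or A.
];

// Returns the hierarchy index of the line's heading marker, or -1 for body text.
function detectLevel(line) {
  const normalized = toHalfWidth(line.trim());
  return HEADING_PATTERNS.findIndex((re) => re.test(normalized));
}
```

Normalizing first means each pattern only has to be written once in half-width form, so 「（一）」 and 「(一)」 land on the same level.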

Usage

CLI

$ commitment-parser <options> <PDF filename>

Examples and usage notes are available via ppt-parser --help.

Library

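import pdftojson from 'pdftojson';
import parser from 'ppt-commitment-parser';

const pdfPath = 'report.pdf'; // path to the PDF file (placeholder)
const parserOption = {};      // parser options (placeholder)

pdftojson(pdfPath).then(pdfData => {
  const parsedData = parser(pdfData, parserOption);
  // See "Array output" below for the format of parsedData
}).catch(console.error);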

Output Format

Sample input PDF

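Take, as an example, a passage from page 14 of the mayor's policy report delivered at the 1st regular session of the 1st Taoyuan City Council: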

PDF

CSV output (CLI)

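# Columns 1-6 are always the heading levels, column 7 is the page number, columns 8 and 9 are the x and y coordinates (pt) of the top-left point, and column 10 is the body text.
# The coordinates are those of the lowest-level heading, with the origin at the bottom-left corner of the page.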
"政策規劃與執行","捷運城市","發展無縫公共運輸","推動捷運建設","","",14,80,454,"為配合//⋯⋯"
"政策規劃與執行","捷運城市","發展無縫公共運輸","推動捷運建設","航空城捷運線(綠線)","",14,109,556,"航空城捷運線(綠線)總長約//⋯⋯"
# ...
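To make the column layout above concrete, here is a minimal sketch that splits one CSV row into named fields. `parseCsvLine` and `toRecord` are hypothetical helpers, not part of the package; the parser handles double-quoted fields with `""` escapes but is not a full RFC 4180 implementation (it does not support newlines inside fields).

```javascript
// Hypothetical helper: split one CSV line into raw string fields.
function parseCsvLine(line) {
  const fields = [];
  let i = 0;
  while (i <= line.length) {
    let value = '';
    if (line[i] === '"') {
      i++; // skip the opening quote
      while (i < line.length) {
        if (line[i] === '"' && line[i + 1] === '"') { value += '"'; i += 2; } // escaped quote
        else if (line[i] === '"') { i++; break; }                          // closing quote
        else { value += line[i++]; }
      }
    } else {
      while (i < line.length && line[i] !== ',') value += line[i++];
    }
    fields.push(value);
    i++; // step over the comma (or past the end of the line)
  }
  return fields;
}

// Map the 10 columns to named fields: 6 heading levels, page, x, y, text.
function toRecord(line) {
  const f = parseCsvLine(line);
  return {
    headings: f.slice(0, 6).filter((h) => h !== ''), // drop unused levels
    page: Number(f[6]),
    x: Number(f[7]),
    y: Number(f[8]),
    text: f[9],
  };
}
```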

Array output (Library)

[
  {
    number: 2,
    numberCH: '貳',
    text: '政策規劃與執行',
    page: 14,
    coord: [55, 373], // origin at the bottom-left corner of the page, unit: pt
    items: [
      {
        number: 1,
        numberCH: '一',
        text: "捷運城市",
        page: 14,
        coord: [55, 402],
        items: [
          {
            number: 1,
            numberCH: '(一)',
            text: '發展無縫公共運輸',
            page: 14,
            coord: [62, 427],
            items: [
              {
                number: 1,
                numberCH: '1',
                text: '推動捷運建設',
                page: 14,
                coord: [80, 454],
                items: [
                  // text-only
                  {
                    text: '為配合航空城發展/* ⋯⋯ */及優質適居的低碳生態環境。',
                    page: 14,
                    coord: [132, 480]
                  },
                  { // text with number
                    number: 1,
                    numberCH: '(1)',
                    text: '航空城捷運線(綠線)',
                    page: 14,
                    coord: [109, 556],
                    items: [ /* ... */ ]
                  }, // ...
                ]
              }
            ]
          }, //...
        ]
      }, //...
    ]
  }, // ...
]
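The two output formats describe the same tree. As a rough illustration of how they relate, here is a hypothetical sketch that flattens the nested array into rows shaped like the CSV sample. It is not taken from the package source; it assumes that only text-only nodes (no `number`) become rows, that a row's page and coordinates come from its nearest enclosing heading, and that the heading path is padded to six columns.

```javascript
// Hypothetical sketch, not the package's CSV logic.
const MAX_LEVELS = 6;

function flatten(nodes, path = [], heading = null, rows = []) {
  for (const node of nodes) {
    if (node.number === undefined) {
      // Text-only node: pad the heading path and use the enclosing heading's position.
      const padding = Array(Math.max(0, MAX_LEVELS - path.length)).fill('');
      const pos = heading || node;
      rows.push([...path, ...padding, pos.page, pos.coord[0], pos.coord[1], node.text]);
    } else if (node.items) {
      // Heading node: descend with this heading appended to the path.
      flatten(node.items, [...path, node.text], node, rows);
    }
  }
  return rows;
}

// Quote string fields (doubling embedded quotes); leave numbers bare, as in the sample.
function toCsv(rows) {
  return rows
    .map((row) => row
      .map((v) => (typeof v === 'string' ? `"${v.replace(/"/g, '""')}"` : String(v)))
      .join(','))
    .join('\n');
}
```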

Error Handling

pdftojson(pdfPath, {onError: function (errType, errPayload) { /* this is the LineMachine instance */ }}) // returns a promise

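In the onError callback, this is bound to the LineMachine instance, so the callback can access LineMachine's methods (see the LineMachine implementation). Arrow functions do not receive their own this, so write the callback as a regular function if it needs this.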
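Here is a small sketch of why a regular function is needed. `DemoMachine` is a hypothetical stand-in for LineMachine, assumed to invoke the callback with `Function.prototype.call`; the package may bind `this` differently, but the arrow-function behavior is the same either way.

```javascript
// Hypothetical stand-in for LineMachine, for illustration only.
class DemoMachine {
  constructor(onError) {
    this.onError = onError;
    this.currentPage = 14; // illustrative state
  }

  reportError(errType, errPayload) {
    // `call` sets `this` inside a regular-function callback to this instance.
    return this.onError.call(this, errType, errPayload);
  }
}

// A regular function receives the instance as `this`...
const regular = new DemoMachine(function (errType) {
  return `${errType} on page ${this.currentPage}`;
});

// ...while an arrow function keeps the `this` of where it was defined.
const arrow = new DemoMachine(() => this instanceof DemoMachine);
```

Calling `regular.reportError('PARSE_NUM', {})` yields `'PARSE_NUM on page 14'`, whereas `arrow.reportError('PARSE_NUM', {})` yields `false`.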

errType === 'PARSE_NUM'

errPayload === {
  input // the Chinese numeral
}

errPayload.input could not be converted to a number.
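As an illustration of the conversion PARSE_NUM refers to, here is a hypothetical sketch that turns a Chinese numeral from a heading marker into a number and throws when it cannot. `parseChineseNumber` is not the package's function; it only covers values up to 99 written with regular (一…十) or formal (壹…拾) digits, and heavenly-stem markers such as 甲 are outside its scope.

```javascript
// Hypothetical sketch: digit characters mapped to their values.
const DIGITS = new Map([
  ...[...'一二三四五六七八九'].map((c, i) => [c, i + 1]),
  ...[...'壹貳參肆伍陸柒捌玖'].map((c, i) => [c, i + 1]),
  ['叁', 3], // common variant of 參
]);
const TENS = new Set(['十', '拾']);

function parseChineseNumber(input) {
  const chars = [...input];
  const tenIndex = chars.findIndex((c) => TENS.has(c));
  const isDigit = (c) => DIGITS.has(c);

  if (tenIndex === -1) {
    // A single digit such as 「三」 or 「參」.
    if (chars.length === 1 && isDigit(chars[0])) return DIGITS.get(chars[0]);
  } else if (tenIndex <= 1 && chars.length - tenIndex <= 2) {
    // Forms like 「十」, 「十二」, 「二十」, 「二十三」: at most one digit on each side.
    const before = chars.slice(0, tenIndex);
    const after = chars.slice(tenIndex + 1);
    if (before.every(isDigit) && after.every(isDigit)) {
      const tens = before.length ? DIGITS.get(before[0]) : 1; // 「十二」 is 12
      const ones = after.length ? DIGITS.get(after[0]) : 0;   // 「二十」 is 20
      return tens * 10 + ones;
    }
  }
  // The situation PARSE_NUM reports: the input is not a recognizable numeral.
  throw new Error(`Cannot convert Chinese numeral: ${input}`);
}
```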

errType === 'NUMBER_MISMATCH'

errPayload === {
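  text, // text of the offending line
  page, // page number of the offending text
  number, // heading number of the offending text (converted to a number)
  lastSiblingSection, // previous Section instance at the same level (undefined if the offending item is the first at its level)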
}

Headings at each level should start at 1 and be consecutive; if the numbering skips, this error is triggered.
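The consecutive-numbering rule can be sketched as follows. `findNumberMismatches` is a hypothetical helper, not the package's code; it takes already-classified headings (`level` where a larger number is deeper, `number` already converted) and assumes that numbering at deeper levels restarts whenever a shallower heading appears.

```javascript
// Hypothetical sketch of the NUMBER_MISMATCH rule.
function findNumberMismatches(headings) {
  const lastNumberAtLevel = []; // index = level, value = previous sibling's number
  const mismatches = [];
  for (const { level, number, text } of headings) {
    // Truncating forgets deeper levels, so a new parent restarts their numbering.
    lastNumberAtLevel.length = level + 1;
    const last = lastNumberAtLevel[level];
    const expected = last === undefined ? 1 : last + 1;
    if (number !== expected) mismatches.push({ text, number, expected });
    lastNumberAtLevel[level] = number;
  }
  return mismatches;
}
```

For example, 「壹、」「一、」「三、」 reports 「三、」 (expected 2), while 「貳、」 followed by 「一、」 is accepted because the new parent resets the lower level.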

errType === 'LEVEL_MISMATCH'

errPayload === {
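  text, // text of the offending line
  page, // page number of the offending text
  coord, // page coordinates of the offending text (origin at the bottom-left, unit: pt)
  level, // heading level (-1 is the top level, 0 is 「壹、」, 6 is 「甲、」)
  lastLevel, // heading level of the preceding text
  numberCH // heading marker of the offending text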
}

The heading level of errPayload.text does not fit the text that precedes it.

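Heading levels should follow the order 「壹、」, 「一、」, 「(一)」, 「1、」, 「(1)」, 「甲、」, six levels in all. If a line suddenly drops too many levels (for example, the text is under 「一、」 and a line suddenly uses the 「1、」 marker), this error is triggered.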
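A minimal sketch of that rule, under the assumption that a heading may move back up any number of levels but go at most one level deeper than the preceding heading. `findLevelMismatches` is hypothetical; it starts from -1, mirroring the payload's convention that -1 is the top level and 0 is 「壹、」.

```javascript
// Hypothetical sketch of the LEVEL_MISMATCH rule.
// Levels are integers where a larger number means a deeper level.
function findLevelMismatches(headings, startLevel = -1) {
  let lastLevel = startLevel;
  const mismatches = [];
  for (const { level, numberCH } of headings) {
    // Going deeper by more than one level skips an intermediate heading.
    if (level > lastLevel + 1) mismatches.push({ numberCH, level, lastLevel });
    lastLevel = level;
  }
  return mismatches;
}
```

With 「壹、」(0), 「一、」(1), then 「1、」(3), the last heading is reported with `lastLevel` 1, matching the example above; returning from 「(一)」(2) straight to 「壹、」(0) is allowed.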