paperajcli
v0.1.1
Convert MS-Word files to latex
📄 Paperajcli: A Lightweight Bridge from Word to LaTeX
Microsoft Word remains the de facto tool for collaborative academic writing, whether you're drafting manuscripts with co‑authors or assembling a thesis with committee feedback. While Pandoc can convert Word documents into LaTeX, integrating journal or thesis templates and managing citations often becomes cumbersome. Paperajcli fills this gap by offering a simple, structured way to export Word sections directly into LaTeX components using Pandoc.
Paperajcli works by detecting custom delimiters inside your .docx file and exporting each marked section into its own LaTeX file. For example, a Word document containing blocks like the following:
<paperaj-introduction>
Introduction
This is the introduction section content.
</paperaj-introduction>
<paperaj-methods>
Methods
Methods go here...
</paperaj-methods>
will produce clean, modular LaTeX files (e.g., introduction.tex, methods.tex) in an output directory of your choice. Formatting from Word is preserved (e.g., H1 -> \section{} and H2 -> \subsection{}). These files can be included in any LaTeX template using commands such as:
\input{myfolder/methods.tex}
✨ Paperajcli preserves commonly used LaTeX commands, including \cite{}, so you can rely on native LaTeX citation workflows without extra tooling. This makes it ideal for users who prefer Zotero, JabRef, or other BibTeX‑based reference managers. Use this CSL with Zotero to ensure compatibility with Pandoc's citation processing. After exporting your sections, simply add your .bib file to your LaTeX project and compile as usual using any citation package (e.g., natbib, biblatex) and style.
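To make the delimiter idea concrete, here is a minimal Python sketch of how marked sections could be pulled out of plain text and mapped to file names. It is an illustration only, not Paperajcli's actual implementation: the function name `split_sections` and the regular expression are assumptions, and the real tool works on the .docx file through Pandoc rather than on a string.

```python
import re

# Matches <paperaj-NAME> ... </paperaj-NAME>; the backreference \1 forces the
# closing tag to carry the same name as the opening tag.
SECTION_RE = re.compile(r"<paperaj-([A-Za-z0-9_-]+)>(.*?)</paperaj-\1>", re.DOTALL)


def split_sections(text: str) -> dict[str, str]:
    """Map each section name to the text found between its delimiters."""
    return {m.group(1): m.group(2).strip() for m in SECTION_RE.finditer(text)}


sample = """<paperaj-introduction>
Introduction
This is the introduction section content.
</paperaj-introduction>
<paperaj-methods>
Methods
Methods go here...
</paperaj-methods>"""

sections = split_sections(sample)
for name in sections:
    # Each section name would become the base name of one .tex file.
    print(f"{name}.tex")
```

Text outside any delimiter pair is ignored in this sketch; how the tool itself treats such text is not stated here.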
🚀 Recommended Workflow
- Clone a LaTeX project template from Overleaf.
- Run Paperajcli to export your Word sections into a directory inside the template.
- Insert each exported .tex file into the appropriate location using \input{}.
- Manage citations in Zotero and export your references as a .bib file.
- Add the .bib file to your repository and push the project back to Overleaf.
- Compile the document.
This workflow keeps the collaborative convenience of Word while giving you the precision, structure, and template‑compatibility of LaTeX, without the usual friction. 🎉 Please ⭐️ the project if you find it useful!
Prerequisites
- Node.js: version 18 or higher.
- Pandoc: Must be installed and available in your system PATH.
  - macOS: brew install pandoc
  - Windows: choco install pandoc
  - Linux: sudo apt-get install pandoc
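Before running the CLI, it can help to confirm that both executables are reachable. The following Python sketch checks PATH using the standard library's `shutil.which`; the helper name `missing_tools` is hypothetical, and the sketch does not verify that Node.js is version 18 or higher.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


missing = missing_tools(["node", "pandoc"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("node and pandoc are both on PATH")
```

The `which` parameter exists only so the lookup can be swapped out when testing.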
Usage
The primary command is latex.
# Syntax
npx paperajcli latex <input-file> <output-directory> [flags]
# Example
npx paperajcli latex tests/paperaj.docx output/
Arguments
- file: Path to the MS-Word (.docx) file to convert.
- outputDir: Directory where the resulting .tex files and media/ folder will be saved.
Flags
- --dry-run (-d): Preview the actions (converting, splitting) without writing any files to disk. Useful for verifying section detection.
- --extract-media / --no-extract-media: Control media extraction from DOCX (default: true). Use --no-extract-media to skip extracting images and other media files.
- --help: Show CLI help.
Integrating Generated Files
The tool generates modular LaTeX files (e.g., introduction.tex, methods.tex). You can include these in your master LaTeX template using:
\input{output/introduction}
\input{output/methods}
The tool handles figure and table environments automatically based on the input document structure.
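If a document has many sections, the \input lines can be generated rather than typed. The Python sketch below builds one line per section file that actually exists in the output directory. The helper name `input_lines` and the idea of passing an explicit section order are assumptions for illustration; Paperajcli itself does not provide this.

```python
from pathlib import Path


def input_lines(output_dir: str, order: list[str]) -> list[str]:
    """Build one \\input line per requested section that exists in output_dir."""
    directory = Path(output_dir)
    lines = []
    for name in order:
        if (directory / f"{name}.tex").is_file():
            # LaTeX accepts \input without the .tex extension.
            lines.append(f"\\input{{{directory.as_posix()}/{name}}}")
    return lines


# Prints nothing if the directory or the section files do not exist yet.
print("\n".join(input_lines("output", ["introduction", "methods"])))
```

Passing the order explicitly keeps the sections in manuscript order instead of alphabetical order.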
Post-Processing
The tool performs several post-processing operations on the generated LaTeX:
LaTeX Command Preservation
You can use LaTeX commands directly in your MS-Word document, and they will be preserved in the output:
- \cite{reference} - Citations
- \href{url}{text} - Hyperlinks
- \ref{label} - Cross-references
- \label{name} - Labels
- Math commands like \frac{}{}, \begin{equation}, etc.
These commands will be automatically un-escaped during conversion.
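As a rough picture of what un-escaping involves, the Python sketch below restores a small whitelist of commands. It rests on an assumption about the escaped form: that a command typed in Word arrives as \textbackslash followed by either {} or a space, with its braces escaped as \{ and \}. The whitelist, the regular expression, and the name `unescape_commands` are illustrative and not taken from Paperajcli's source; math commands are not handled.

```python
import re

# Commands to restore; this whitelist is illustrative, not the tool's actual list.
COMMANDS = ("cite", "href", "ref", "label")

# Assumed escaped forms: \textbackslash{}cite\{key\} or \textbackslash cite\{key\}
ESCAPED_RE = re.compile(
    r"\\textbackslash(?:\{\}| )(" + "|".join(COMMANDS) + r")((?:\\\{[^{}]*?\\\})+)"
)


def unescape_commands(latex: str) -> str:
    """Turn escaped whitelisted commands back into real LaTeX commands."""

    def restore(match: re.Match) -> str:
        # Strip the backslashes that escape the argument braces.
        args = match.group(2).replace(r"\{", "{").replace(r"\}", "}")
        return "\\" + match.group(1) + args

    return ESCAPED_RE.sub(restore, latex)


escaped = r"See \textbackslash cite\{smith2020\} for details."
print(unescape_commands(escaped))
```

Restricting the substitution to a whitelist leaves any other escaped backslash in the text untouched.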
Figure and Table Handling
- Figure captions: Use format Figure 1: Caption Text in Word
  - Add : TWOCOLUMN for two-column figures (figure* environment)
  - Add : LATEXROTATE for rotated figures (sidewaysfigure environment)
- Table captions: Use format Table 1: Caption Text in Word
- Cross-references: References like Figure_1, Table_2, Appendix_A are automatically converted to \ref{} commands
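The caption and cross-reference conventions above can be sketched in Python as follows. Several details are assumptions made for the example: that the TWOCOLUMN or LATEXROTATE marker is appended at the end of the caption, that unmarked captions map to the plain figure and table environments, and that labels follow a hypothetical fig:/tab:/app: scheme. The labels Paperajcli actually emits may differ.

```python
import re

CAPTION_RE = re.compile(r"^(Figure|Table)\s+(\d+):\s*(.*)$")

# Marker -> environment, as listed above; unmarked figures use plain `figure`.
MARKERS = {"TWOCOLUMN": "figure*", "LATEXROTATE": "sidewaysfigure"}


def parse_caption(line: str):
    """Return (kind, number, caption text, environment), or None if no match."""
    match = CAPTION_RE.match(line.strip())
    if match is None:
        return None
    kind, number, text = match.group(1), int(match.group(2)), match.group(3)
    environment = "figure" if kind == "Figure" else "table"
    if kind == "Figure":
        for marker, env in MARKERS.items():
            suffix = ": " + marker  # assumed position: end of the caption
            if text.endswith(suffix):
                text, environment = text[: -len(suffix)], env
    return kind, number, text.strip(), environment


def convert_refs(text: str) -> str:
    """Replace Figure_1 / Table_2 / Appendix_A with \\ref{...} (hypothetical labels)."""
    prefixes = {"Figure": "fig", "Table": "tab", "Appendix": "app"}
    return re.sub(
        r"\b(Figure|Table|Appendix)_([A-Za-z0-9]+)\b",
        lambda m: f"\\ref{{{prefixes[m.group(1)]}:{m.group(2)}}}",
        text,
    )


print(parse_caption("Figure 1: Overview of the pipeline: TWOCOLUMN"))
print(convert_refs("See Figure_1 and Appendix_A."))
```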
Special Character Handling
- Escaped braces \{ and \} are converted to regular braces
- et al. is automatically removed
- Triple dashes -/-/- are converted to em-dashes ---
Testing
Run unit and integration tests:
npm test
Contributing
Pull requests are welcome! See CONTRIBUTING.md.
