Convert Markdown to Styled PDF
5 February 2026
The origin of this project is a classic software engineering pivot. While building MyScout (my autonomous AI career agent), I ran into a serious formatting bottleneck.
LLMs like Claude and Gemini are incredibly good at generating raw text in Markdown. It's clean, structured, and lightweight. The problem? When you try to convert an LLM-generated Markdown file directly into a PDF to submit as a job application, it looks like a generic, unstyled notepad document. I tried existing conversion tools, but they offered zero control over the margins, fonts, and specific typographic branding I needed.
I needed absolute, programmatic control over the document's design without forcing the LLM to write complex formatting code. The solution was to build PyMD2PDF, a custom Python engine that uses LaTeX as a stylistic middleman.
Today, this tool doesn't just generate my resumes; it is the core rendering engine I use to generate highly polished, enterprise business quotes for my AI automation company, Matavex.
⚙️ The Architecture Pipeline
PyMD2PDF operates on a strict separation of concerns: Content vs. Presentation. You write the content in simple Markdown. The Python engine handles the design.
Here is the two-stage conversion pipeline:
[Raw Markdown]
↓
(Pandoc via PyPandoc) + [LaTeX Style Template]
↓
[.tex Intermediate File]
↓
(pdflatex / xelatex)
↓
[Highly Styled PDF]
Because Pandoc renders the Markdown through a LaTeX template to produce the .tex intermediate file, I can swap in custom templates (resume_modern.latex, report_classic.latex) at that stage. This means I can radically change the entire design of a document via a single CLI flag, without ever touching the source text.
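To make the two stages concrete, here is a minimal sketch of how they could be wired together. It is not the actual PyMD2PDF source: the function names (build_pandoc_args, build_latex_cmd, convert) and the choice of xelatex as the default engine are assumptions made for illustration. The Pandoc and LaTeX command-line flags are standard ones.

```python
from pathlib import Path
import subprocess


def build_pandoc_args(template: Path) -> list[str]:
    """Extra Pandoc arguments for stage 1 (Markdown -> standalone .tex)."""
    return ["--standalone", f"--template={template}"]


def build_latex_cmd(tex_file: Path, engine: str = "xelatex") -> list[str]:
    """Command line for stage 2 (.tex -> PDF)."""
    return [
        engine,
        "-interaction=nonstopmode",  # never pause to prompt on errors
        "-halt-on-error",            # abort at the first fatal error
        f"-output-directory={tex_file.parent}",
        str(tex_file),
    ]


def convert(markdown_file: Path, template: Path, engine: str = "xelatex") -> Path:
    """Run both stages and return the expected path of the generated PDF."""
    import pypandoc  # imported lazily so the helpers above work without it

    # Stage 1: Pandoc fills the style template and writes the .tex file.
    tex_file = markdown_file.with_suffix(".tex")
    pypandoc.convert_file(
        str(markdown_file),
        "latex",
        outputfile=str(tex_file),
        extra_args=build_pandoc_args(template),
    )

    # Stage 2: the LaTeX engine compiles the intermediate file to PDF.
    subprocess.run(build_latex_cmd(tex_file, engine), check=True)
    return tex_file.with_suffix(".pdf")
```

Keeping the .tex file on disk between the two stages is what makes the output easy to inspect when a template misbehaves.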
🛠️ Key Technical Implementations
1. Robust CLI & Subprocess Management
I built the interface using Typer for modern, type-hinted CLI commands. Since LaTeX compilation is notorious for throwing massive, cryptic error logs, I engineered a custom subprocess handler (_extract_latex_errors()). Instead of crashing and dumping 500 lines of raw LaTeX compiler output into the terminal, the Python script parses the .log file, identifies the exact fatal error, and surfaces a clean, human-readable exception to the user.
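The log-parsing idea can be sketched in a few lines. This is an illustrative version, not the real _extract_latex_errors(): it assumes the standard TeX log convention where an error line starts with "!" and the offending source line follows shortly after as "l.<number> ...". The public name extract_latex_errors and the ten-line lookahead window are my own choices for the example.

```python
import re

# TeX reports the source location of an error as e.g. "l.42 \badmacro"
LINE_REF = re.compile(r"^l\.(\d+)\s?(.*)$")


def extract_latex_errors(log_text: str) -> list[str]:
    """Return one readable message per '!' error found in a LaTeX log."""
    errors = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("!"):
            continue
        message = line.lstrip("! ").strip()
        # Look a few lines ahead for the matching "l.<n>" location line.
        for follow in lines[i + 1 : i + 10]:
            if follow.startswith("!"):
                break  # the next error has started; no location found
            match = LINE_REF.match(follow)
            if match:
                message += f" (line {match.group(1)}: {match.group(2).strip()})"
                break
        errors.append(message)
    return errors
```

In practice the text would come from the compiler's .log file, for example Path("quote.log").read_text(errors="replace"), and the first entry of the returned list is usually the one worth showing to the user.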
2. Dependency Checks & Pre-flight Validation
Because this tool relies on heavy external tools (Pandoc and a TeX distribution such as MiKTeX or TeX Live), I implemented a strict check_dependencies() pre-flight sequence. Using shutil.which(), the engine verifies that the required executables are on the system PATH before attempting any file I/O, and provides clear installation instructions if any are missing.
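A pre-flight check along these lines might look like the sketch below. The executable names, the INSTALL_HINTS table, and the find_missing helper are assumptions for the example, and the real check_dependencies() may be structured differently. Only shutil.which() is taken from the description above.

```python
import shutil

# Hypothetical hint table; the real tool's wording may differ.
INSTALL_HINTS = {
    "pandoc": "install Pandoc (see https://pandoc.org/installing.html)",
    "xelatex": "install a TeX distribution such as MiKTeX or TeX Live",
    "pdflatex": "install a TeX distribution such as MiKTeX or TeX Live",
}


def find_missing(binaries, which=shutil.which):
    """Return the names of executables that cannot be found on PATH."""
    return [name for name in binaries if which(name) is None]


def check_dependencies(binaries=("pandoc", "xelatex")):
    """Raise a readable error before any file I/O if a tool is missing."""
    missing = find_missing(binaries)
    if missing:
        hints = "\n".join(
            f"  - {name}: {INSTALL_HINTS.get(name, 'not found on PATH')}"
            for name in missing
        )
        raise RuntimeError("Missing required tools:\n" + hints)
```

Passing the lookup function in as a parameter keeps the check easy to unit test without touching the real PATH.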
3. The Style Template System
The core magic lies in the StyleManager. It scans a dedicated /styles/ directory for .latex files. These files act as master blueprints containing all the complex margin geometry, font packages, and color definitions.
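As a rough idea of what the directory scan involves, here is a small sketch. Only the StyleManager name and the .latex extension come from the project. The method names available() and resolve() are hypothetical, and the real class likely does more.

```python
from pathlib import Path


class StyleManager:
    """Discovers .latex style templates in a single directory."""

    def __init__(self, styles_dir):
        self.styles_dir = Path(styles_dir)

    def available(self) -> list[str]:
        """Style names, i.e. template file names without the extension."""
        return sorted(p.stem for p in self.styles_dir.glob("*.latex"))

    def resolve(self, name: str) -> Path:
        """Map a style name (as given on the CLI) to its template file."""
        path = self.styles_dir / f"{name}.latex"
        if not path.is_file():
            known = ", ".join(self.available()) or "none"
            raise ValueError(f"Unknown style '{name}'. Available: {known}")
        return path
```

With this shape, a value such as resume_modern passed on the command line resolves directly to styles/resume_modern.latex, and a typo produces a list of valid choices instead of a LaTeX failure later on.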
Using Pandoc's variable substitution, the template waits for the Markdown content:
% The LaTeX style template (minimal excerpt)
\documentclass{article}
$if(title)$
\title{$title$}
$endif$
\begin{document}
$if(title)$
\maketitle
$endif$
% Pandoc injects the Markdown content, already converted to LaTeX, here:
$body$
\end{document}
🚀 Why LaTeX? (The Matavex Advantage)
Using LaTeX as the rendering engine might seem like overkill for a resume, but it is the ultimate flex for programmatic document generation.
Because LaTeX is fundamentally a programmable typesetting system, spacing and alignment are computed by the engine rather than nudged by hand, so the layout is precise and reproducible. When I am generating a $15,000 architectural automation quote for a Matavex client, I can pass a raw Markdown file containing the project scope into PyMD2PDF, flag it with --style matavex_standard, and out pops a PDF with consistent corporate branding, custom accent colors, and precisely aligned tables.
It is the perfect bridge between LLM-generated raw text and enterprise-grade document delivery.