Linearization: LuraPDF uses pdf-lib to write a new PDF byte stream in linearized order: a linearization parameter dictionary near the start of the file, then the first page's objects together with hint tables that give progressive-download readers a map to the rest of the content. This doesn't change what's in the PDF. It changes the order in which bytes are laid out in the file, so a reader can render the first page before the full download completes.
Deduplication: The engine computes a content hash of every embedded image XObject. Objects with matching hashes are consolidated: the first instance is kept, and every later reference to a duplicate (held in the page's resource dictionary, which the content stream addresses by name) is rewritten to point to the same shared object. The savings grow with the number of times a given image was embedded separately and with the size of that image.
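The deduplication step can be illustrated with a minimal sketch. It is not LuraPDF's code: the `ImageObject` shape, the `buildDedupMap` helper, and the use of FNV-1a as the content hash are all assumptions made for illustration, and the source does not say which hash the engine uses. Because a 32-bit hash can collide, the sketch confirms each match with a byte-for-byte comparison before treating two images as duplicates.

```typescript
// Hypothetical model of an image XObject: an object number plus its raw stream bytes.
interface ImageObject {
  objNum: number;
  bytes: Uint8Array;
}

// FNV-1a 32-bit hash. A stand-in for whatever content hash the real engine uses.
function fnv1a(bytes: Uint8Array): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Byte-for-byte equality, used to rule out hash collisions.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Maps every object number to the object number that should be kept.
// The first instance of each distinct image maps to itself; later copies map to it.
function buildDedupMap(images: ImageObject[]): Map<number, number> {
  const buckets = new Map<number, ImageObject[]>();
  const remap = new Map<number, number>();
  for (const img of images) {
    const hash = fnv1a(img.bytes);
    let bucket = buckets.get(hash);
    if (bucket === undefined) {
      bucket = [];
      buckets.set(hash, bucket);
    }
    const kept = bucket.find((candidate) => bytesEqual(candidate.bytes, img.bytes));
    if (kept !== undefined) {
      remap.set(img.objNum, kept.objNum);
    } else {
      bucket.push(img);
      remap.set(img.objNum, img.objNum);
    }
  }
  return remap;
}
```

A writer would then consult the returned map whenever it emits a reference to an image, substituting the kept object number and skipping the duplicate objects entirely.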
Font subsetting: For each embedded font, LuraPDF analyzes which characters are actually used in the document's text streams. It then rebuilds the font's glyph table to contain only the glyphs for those characters, discarding the rest of the character set. For CJK (Chinese, Japanese, Korean) fonts, which often embed thousands of glyphs for documents that use only a few hundred, the size reduction can be dramatic.
Unused object stripping: After deduplication and subsetting, the engine follows indirect references outward from the document catalog and marks every object it reaches. Unreachable objects, such as deleted pages, removed form fields, and old revision snapshots, are left out of the new file and its cross-reference table, so they do not appear in the output.
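The marking pass in unused object stripping is essentially a graph traversal. The sketch below shows the idea under simplifying assumptions: the `ObjectGraph` type, `findReachable`, and `findUnreachable` are hypothetical names, and each PDF object is reduced to the list of object numbers it references. A real implementation would extract those references by parsing dictionaries, arrays, and streams.

```typescript
// Hypothetical simplified object graph: object number -> object numbers it references.
type ObjectGraph = Map<number, number[]>;

// Iterative depth-first traversal from the root (for a PDF, the document catalog).
// Using an explicit stack avoids deep recursion on large documents, and the
// visited set makes cycles (for example, a page pointing back at its parent) safe.
function findReachable(graph: ObjectGraph, rootObjNum: number): Set<number> {
  const reachable = new Set<number>();
  const stack: number[] = [rootObjNum];
  while (stack.length > 0) {
    const current = stack.pop() as number;
    if (reachable.has(current)) continue;
    reachable.add(current);
    const refs = graph.get(current);
    if (refs === undefined) continue; // dangling reference: nothing further to follow
    refs.forEach((ref) => {
      if (!reachable.has(ref)) stack.push(ref);
    });
  }
  return reachable;
}

// Objects present in the graph but never reached from the root, in ascending order.
function findUnreachable(graph: ObjectGraph, rootObjNum: number): number[] {
  const reachable = findReachable(graph, rootObjNum);
  const unreachable: number[] = [];
  graph.forEach((_refs, objNum) => {
    if (!reachable.has(objNum)) unreachable.push(objNum);
  });
  return unreachable.sort((a, b) => a - b);
}
```

In this model, the objects returned by `findUnreachable` are the ones a writer would skip when emitting the new file.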