A PDF file is a collection of objects (pages, fonts, images, annotations) indexed by a cross-reference (xref) table that records each object's byte offset. When the xref table is corrupted, whether by a truncated download, a failed save, or storage damage, a reader cannot locate the objects and reports the file as unreadable. LuraPDF's repair engine pairs pdf-lib with a custom low-level parser that ignores the broken xref and instead walks the raw byte stream, identifying object boundaries by their header signatures (the "obj" keyword line that opens each object). From the objects it discovers, it reconstructs a valid xref table and page tree and writes a new, conformant PDF.
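The scan-and-rebuild idea can be illustrated with a minimal TypeScript sketch. It is not LuraPDF's parser and does not use pdf-lib. The names FoundObject, scanObjects and buildXref are hypothetical, and the sketch assumes a classic (non-compressed) xref section is wanted. It does not rebuild the page tree or the trailer, and it may report false positives when binary stream data happens to contain an object-header pattern.

```typescript
// Hypothetical names throughout; this is an illustration, not LuraPDF's code.
interface FoundObject {
  objNum: number;
  gen: number;
  offset: number; // byte offset of the "N G obj" header
}

// Decode bytes one-to-one into chars so string indices equal byte offsets.
function latin1(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Walk the raw bytes and record every "<num> <gen> obj" header found.
function scanObjects(bytes: Uint8Array): FoundObject[] {
  const text = latin1(bytes);
  const header = /(\d+)[ \t\r\n]+(\d+)[ \t\r\n]+obj\b/g;
  const latest: { [objNum: number]: FoundObject } = {};
  let m: RegExpExecArray | null;
  while ((m = header.exec(text)) !== null) {
    const objNum = Number(m[1]);
    // A later definition replaces an earlier one with the same number,
    // which is how incremental updates override older objects.
    latest[objNum] = { objNum: objNum, gen: Number(m[2]), offset: m.index };
  }
  const found: FoundObject[] = [];
  for (const key of Object.keys(latest)) found.push(latest[Number(key)]);
  return found.sort((a, b) => a.objNum - b.objNum);
}

// Build a classic xref section; each entry is exactly 20 bytes including EOL.
function buildXref(objects: FoundObject[]): string {
  const byNum: { [objNum: number]: FoundObject } = {};
  let maxNum = 0;
  for (const o of objects) {
    byNum[o.objNum] = o;
    if (o.objNum > maxNum) maxNum = o.objNum;
  }
  const lines = ["xref", "0 " + (maxNum + 1), "0000000000 65535 f "];
  for (let n = 1; n <= maxNum; n++) {
    const o = byNum[n];
    // Simplification: missing objects get a plain free entry; a strict
    // writer would chain free entries into a linked list.
    lines.push(o ? pad(o.offset, 10) + " " + pad(o.gen, 5) + " n " : "0000000000 00000 f ");
  }
  return lines.join("\n") + "\n";
}
```

Calling scanObjects on the file's bytes and passing the result to buildXref yields the text of a replacement xref section. A real repair would still need to write a trailer that points at the document catalog and to validate each discovered object before trusting it.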
When the objects themselves are too fragmented for structural reconstruction, LuraPDF falls back to PDF.js in lenient mode, which tolerates malformed syntax and attempts to render or extract text from whatever content streams survive. If fonts are missing, text-rescue mode reads the text-showing operators directly and skips glyph rendering. The result is a plain-text file or a text-layer PDF that preserves the words even when layout information is lost. This layered approach (structure repair first, then page salvage, then text rescue) lets each stage recover what the previous one could not, covering a wide range of corruption types.
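As a rough illustration of text rescue, the TypeScript sketch below collects the string operands of the text-showing operators (Tj, TJ, ' and ") from a content stream without consulting any font. It is not PDF.js code and not LuraPDF's implementation. The names readLiteral and rescueText are hypothetical. The sketch assumes the stream has already been decompressed and that the font uses a single-byte, roughly ASCII-compatible encoding. It ignores comments, inline images, and the spacing hints in TJ arrays.

```typescript
// Hypothetical sketch; assumes an already-decoded content stream.
const ESCAPES: { [k: string]: string } = { n: "\n", r: "\r", t: "\t", b: "\b", f: "\f" };

// Read a literal string; `start` points just past the opening "(".
function readLiteral(src: string, start: number): { text: string; end: number } {
  let depth = 1;
  let i = start;
  let text = "";
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === "\\") {
      const oct = /^[0-7]{1,3}/.exec(src.slice(i + 1, i + 4));
      if (oct) {
        // Octal escape such as \101
        text += String.fromCharCode(parseInt(oct[0], 8) & 0xff);
        i += 1 + oct[0].length;
        continue;
      }
      const e = src.charAt(i + 1);
      if (e === "\r" || e === "\n") {
        // Backslash + end-of-line is a line continuation: emit nothing.
        i += e === "\r" && src.charAt(i + 2) === "\n" ? 3 : 2;
        continue;
      }
      text += ESCAPES.hasOwnProperty(e) ? ESCAPES[e] : e;
      i += 2;
      continue;
    }
    if (ch === "(") depth++;
    else if (ch === ")") {
      depth--;
      if (depth === 0) return { text: text, end: i + 1 };
    }
    text += ch;
    i++;
  }
  return { text: text, end: i }; // truncated string: keep what survived
}

// Emit one output line per text-showing operator.
function rescueText(content: string): string {
  const lines: string[] = [];
  let pending: string[] = []; // string operands seen since the last operator
  let i = 0;
  while (i < content.length) {
    const ch = content.charAt(i);
    if (ch === "(") {
      const lit = readLiteral(content, i + 1);
      pending.push(lit.text);
      i = lit.end;
    } else if (ch === "<" && content.charAt(i + 1) === "<") {
      i += 2; // dictionary opener, not a hex string
    } else if (ch === "<") {
      const close = content.indexOf(">", i);
      const end = close === -1 ? content.length : close;
      let hex = content.slice(i + 1, end).replace(/[^0-9A-Fa-f]/g, "");
      if (hex.length % 2 === 1) hex += "0"; // odd digit count: final digit is 0
      let s = "";
      for (let k = 0; k < hex.length; k += 2) s += String.fromCharCode(parseInt(hex.slice(k, k + 2), 16));
      pending.push(s);
      i = end + 1;
    } else if (ch === "/") {
      // Name object such as /F1: skip it so it is not mistaken for an operator.
      i++;
      while (i < content.length && /[^\s()<>\[\]{}\/%]/.test(content.charAt(i))) i++;
    } else if (/[A-Za-z'"]/.test(ch)) {
      let j = i + 1;
      while (j < content.length && /[A-Za-z0-9*]/.test(content.charAt(j))) j++;
      const op = content.slice(i, j);
      const showsText = op === "Tj" || op === "TJ" || op === "'" || op === '"';
      if (showsText && pending.length > 0) lines.push(pending.join(""));
      pending = []; // operands belong to the operator that follows them
      i = j;
    } else {
      i++; // numbers, brackets, whitespace
    }
  }
  return lines.join("\n");
}
```

Because the operands are read as raw bytes, fonts with custom encodings or multi-byte character codes would come out garbled here. Mapping those codes back to Unicode needs the font's encoding data, which may itself be among the damaged objects.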