PDF Compression Explained — Why Your File Is Big and How to Shrink It
A 30-page Word doc exports to a 2 MB PDF. A 30-page scan exports to an 80 MB PDF. Why? And which compression setting do you actually want? Practical breakdown.
Why some PDFs are massive
A PDF is a container. What's inside drives the size:
- Vector text + simple layout (Word export, LaTeX): 50–500 KB per page. Tiny.
- Embedded fonts: add 100 KB–2 MB total per file (one-time, regardless of page count).
- Embedded images: depends entirely on resolution.
- Scanned pages: each page is a single big image. 1–5 MB per page typical at 300 DPI.
A 30-page Word export = 50 KB × 30 + 1 MB fonts ≈ 2.5 MB. A 30-page color scan = 3 MB × 30 ≈ 90 MB. Same number of pages, 36× the size.
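The arithmetic above fits in a few lines of Python. This is a back-of-the-envelope sketch, not a measurement: the function name is made up for illustration, and it assumes 1 MB = 1000 KB and the per-page figures quoted in the list.

```python
def estimate_pdf_mb(pages: int, kb_per_page: float, font_mb: float = 0.0) -> float:
    """Rough size: per-page content plus a one-time font cost (1 MB = 1000 KB)."""
    return pages * kb_per_page / 1000 + font_mb


# Figures taken from the breakdown above.
word_export = estimate_pdf_mb(pages=30, kb_per_page=50, font_mb=1.0)
color_scan = estimate_pdf_mb(pages=30, kb_per_page=3000)

print(f"Word export: {word_export:.1f} MB")
print(f"Color scan:  {color_scan:.1f} MB")
print(f"Ratio:       {color_scan / word_export:.0f}x")
```

Swap in your own page count and a per-page figure from the list to get a ballpark for a file before you open a compressor.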
The four things compression does
PDF compressors do some combination of these:
1. Downsample images. Convert 600 DPI to 150 DPI for screen viewing, which cuts the pixel count 16×. Largest gain on scanned PDFs.
2. Recompress images. JPEG with lower quality, or convert PNG → JPEG where appropriate.
3. Subset fonts. If a font has 1000 glyphs and you use 50, embed only those 50.
4. Strip metadata, comments, unused objects. Marginal gain (5–15%).
That's it. There's no magical "make it smaller" — it's these levers.
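Font subsetting is the easiest lever to picture in code. The sketch below only illustrates the idea by counting distinct characters; the helper names are hypothetical, and a real subsetter works on glyphs inside the font file, where ligatures and alternates mean characters and glyphs don't always map one to one.

```python
def glyphs_needed(text: str) -> set[str]:
    """Distinct non-whitespace characters that appear in the text."""
    return {ch for ch in text if not ch.isspace()}


def subset_fraction(text: str, glyphs_in_font: int) -> float:
    """Share of the font's glyphs that the document actually uses."""
    return len(glyphs_needed(text)) / glyphs_in_font


sample = "Payment is due within 30 days of the invoice date."
used = glyphs_needed(sample)
print(f"{len(used)} distinct glyphs needed")
print(f"{subset_fraction(sample, 1000):.1%} of a 1000-glyph font")
```

Even long documents in one language rarely touch more than a small slice of a large font, which is why subsetting helps most when the fonts are big and the text is short.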
What each compression preset does
Most tools (PDFShed included) offer four levels. Roughly:
- Low: image downsample to 200 DPI, JPEG quality 80. Visual fidelity ~99%. Size reduction ~30–50%.
- Medium: 150 DPI, JPEG 75. Fidelity ~95%. Size reduction ~50–70%.
- High: 100 DPI, JPEG 60. Fidelity ~85%. Size reduction ~70–85%.
- Aggressive: 72 DPI, JPEG 40. Fidelity ~70%. Size reduction ~85–95%. Visible artifacting.
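The presets are just bundles of settings, so they can be written down as data. A minimal sketch, assuming the rough numbers listed above; the `Preset` class and `estimated_size_mb` function are names invented here, not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    dpi: int
    jpeg_quality: int
    reduction_pct: tuple[int, int]  # (low, high) size reduction in percent


# Values copied from the preset list above.
PRESETS = {
    "Low": Preset(200, 80, (30, 50)),
    "Medium": Preset(150, 75, (50, 70)),
    "High": Preset(100, 60, (70, 85)),
    "Aggressive": Preset(72, 40, (85, 95)),
}


def estimated_size_mb(original_mb: float, preset_name: str) -> tuple[float, float]:
    """Return (best case, worst case) output size in MB for a preset."""
    low, high = PRESETS[preset_name].reduction_pct
    return (original_mb * (100 - high) / 100, original_mb * (100 - low) / 100)


for name in PRESETS:
    best, worst = estimated_size_mb(30, name)
    print(f"{name:<10} {best:>4.1f} to {worst:>4.1f} MB")
```

The ranges are wide on purpose. How far a real file shrinks depends on how much of it is images in the first place.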
Which level for which use case
- Email attachment: High. Drops a 30 MB scan to roughly 4.5–9 MB. Still readable.
- USCIS / court filing (under 6 MB): High, then Aggressive only if the file is still over the limit. These uploads tolerate some quality loss as long as the text stays legible.
- Print: Low, or no compression at all. Heavier recompression costs print quality.
- Archive (long-term storage): Medium. Good middle ground.
- WhatsApp (under 2 MB): Aggressive. Quality suffers, but it sends.
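For the size-limited cases, the choice comes down to one question: what is the gentlest preset that still fits? Here is one way to sketch that, assuming the low end of each preset's reduction range as the worst case. The function and table names are hypothetical.

```python
from typing import Optional

# (name, minimum size reduction in percent), mildest first.
# The minimums are the low ends of the ranges in the preset list.
PRESET_MIN_REDUCTION = [("Low", 30), ("Medium", 50), ("High", 70), ("Aggressive", 85)]


def mildest_preset(original_mb: float, limit_mb: float) -> Optional[str]:
    """Pick the gentlest preset whose worst-case output fits under the limit."""
    for name, min_reduction in PRESET_MIN_REDUCTION:
        worst_case_mb = original_mb * (100 - min_reduction) / 100
        if worst_case_mb <= limit_mb:
            return name
    return None  # even Aggressive is not guaranteed to fit


print(mildest_preset(10, 6))  # 10 MB file, 6 MB filing limit
print(mildest_preset(30, 6))  # 30 MB scan, same limit
print(mildest_preset(30, 2))  # 30 MB scan, 2 MB messaging limit
```

A `None` result doesn't mean the file can't be sent. It means no preset is guaranteed to get it under the limit, so try Aggressive and check the result, or split the file.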
Why text-heavy PDFs barely compress
Text in a PDF is usually already compressed (Flate/zlib), so a second pass of compression gains almost nothing. Whatever shrinkage you do see comes from images, fonts, and metadata. A 50-page contract might drop 30%. Don't expect 80%.
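You can see the effect with Python's built-in `zlib` module, which implements the same Deflate compression that Flate streams use. This sketch uses made-up contract-style text, not a real PDF: compress once, then try to compress the result again.

```python
import random
import zlib

# Build some non-repeating "contract-like" text from a small vocabulary.
rng = random.Random(0)
vocabulary = ("party agreement shall notice term payment liability clause "
              "hereby section law written days either other").split()
raw = " ".join(rng.choice(vocabulary) for _ in range(20_000)).encode("ascii")

once = zlib.compress(raw, 9)    # roughly what a Flate-encoded stream already stores
twice = zlib.compress(once, 9)  # what a second pass has to work with

print(f"raw:   {len(raw):>7} bytes")
print(f"once:  {len(once):>7} bytes")
print(f"twice: {len(twice):>7} bytes")
```

The first pass removes most of the redundancy. The second pass is fed data that already looks random, so it has nothing left to remove.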
Why scans compress so well
Scans are images, and most scanner output is higher quality than it needs to be (300+ DPI when 150 is plenty for legibility). Drop the DPI, recompress the JPEG, and you're at roughly 15–30% of the original size. PDFShed's "High" compression typically achieves this in one click.
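To put numbers on the DPI part, here is a small sketch of how many pixels a scanned page holds. It assumes a US Letter page (8.5 × 11 in) in 24-bit color; both are assumptions for illustration, and the function names are invented.

```python
LETTER_INCHES = (8.5, 11.0)  # assumed page size


def page_pixels(dpi: int) -> tuple[int, int]:
    """Pixel dimensions of one scanned page at the given resolution."""
    width_in, height_in = LETTER_INCHES
    return round(width_in * dpi), round(height_in * dpi)


def raw_rgb_bytes(dpi: int) -> int:
    """Uncompressed size of one 24-bit color page (3 bytes per pixel)."""
    width, height = page_pixels(dpi)
    return width * height * 3


for dpi in (300, 150):
    w, h = page_pixels(dpi)
    print(f"{dpi} DPI: {w} x {h} px, {raw_rgb_bytes(dpi) / 1e6:.1f} MB raw")
```

Halving the DPI leaves a quarter of the pixels before JPEG quality is touched at all. The lower quality setting accounts for the rest of the savings.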
OCR before or after compression?
OCR converts image-only scans into searchable PDFs by adding a text layer. Order matters:
- OCR first, then compress: preserves text-layer accuracy. Recommended.
- Compress first, then OCR: lower DPI may hurt OCR accuracy. Avoid.
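If you script this yourself instead of using web tools, the recommended order looks like the sketch below. It assumes the open-source OCRmyPDF and Ghostscript command-line tools are installed; neither is mentioned in this article, and the file names and function names are placeholders. Ghostscript's `/ebook` profile is used because it targets about 150 DPI. Check on your own files that the text layer survives the compression step.

```python
import subprocess


def ocr_command(src: str, dst: str) -> list[str]:
    # OCRmyPDF adds a text layer to an image-only PDF.
    return ["ocrmypdf", src, dst]


def compress_command(src: str, dst: str) -> list[str]:
    # Ghostscript's /ebook profile downsamples images to about 150 DPI.
    return ["gs", "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/ebook",
            "-dNOPAUSE", "-dBATCH", "-dQUIET", f"-sOutputFile={dst}", src]


def ocr_then_compress(scan: str, final: str,
                      searchable: str = "searchable.pdf") -> list[list[str]]:
    """Return the commands in recommended order: OCR at full resolution, then shrink."""
    return [ocr_command(scan, searchable), compress_command(searchable, final)]


def run(commands: list[list[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stops at the first failing step


if __name__ == "__main__":
    # Print the plan only; call run(...) to actually execute it.
    for cmd in ocr_then_compress("scan.pdf", "scan-small.pdf"):
        print(" ".join(cmd))
```

The point is the ordering: the OCR step sees the full-resolution scan, and only its output gets downsampled.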
Tools
- Compress PDF — main tool, four presets
- OCR PDF — make scans searchable before compressing
- Compress for email — guide with email-specific size targets