PDF Compression Explained — Why Your File Is Big and How to Shrink It

A 30-page Word doc exports to a PDF of about 2 MB. A 30-page scan exports to a PDF of 80 MB or more. Why? And which compression setting do you actually want? Here is a practical breakdown.

PDFShed Team · May 7, 2026 · 5 min read

Why some PDFs are massive

A PDF is a container. What's inside drives the size:

  • Vector text + simple layout (Word export, LaTeX): 50–500 KB per page. Tiny.
  • Embedded fonts: add 100 KB–2 MB total per file (one-time, regardless of page count).
  • Embedded images: depends entirely on resolution.
  • Scanned pages: each page is a single big image. 1–5 MB per page typical at 300 DPI.

A 30-page Word export = 50 KB × 30 + 1 MB fonts ≈ 2.5 MB. A 30-page color scan = 3 MB × 30 ≈ 90 MB. Same number of pages, 36× the size.
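
The arithmetic above can be reproduced with a small estimator. This is only a back-of-the-envelope sketch: the function name and parameters are made up for illustration, and the per-page figures are the rough ones quoted in this post, not measurements.

```python
def estimate_pdf_size_mb(pages, per_page_kb, fonts_mb=0.0):
    """Rough size estimate: per-page content plus a one-time font cost."""
    return pages * per_page_kb / 1000 + fonts_mb


# Figures taken from the paragraph above (1 MB = 1000 KB for simplicity).
word_export = estimate_pdf_size_mb(30, per_page_kb=50, fonts_mb=1.0)
color_scan = estimate_pdf_size_mb(30, per_page_kb=3000)
ratio = color_scan / word_export

print(word_export, color_scan, ratio)  # 2.5 90.0 36.0
```

Fonts are a fixed cost and scanned pages are a per-page cost, which is why the gap widens as the page count grows.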

The four things compression does

PDF compressors do some combination of these:

1. Downsample images. Convert 600 DPI to 150 DPI for screen viewing. Largest gain on scanned PDFs.

2. Recompress images. JPEG with lower quality, or convert PNG → JPEG where appropriate.

3. Subset fonts. If a font has 1000 glyphs and you use 50, embed only those 50.

4. Strip metadata, comments, unused objects. Marginal gain (5–15%).

That's it. There's no magical "make it smaller" — it's these levers.
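
To see why downsampling is the biggest lever, it helps to count pixels: the pixel count grows with the square of the DPI. The sketch below assumes a US Letter page (8.5 × 11 inches), and `pixel_count` is a hypothetical helper, not part of any tool.

```python
def pixel_count(width_in, height_in, dpi):
    """Pixels in a page image scanned or rendered at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumption: a US Letter page, 8.5 x 11 inches.
at_600 = pixel_count(8.5, 11, 600)
at_150 = pixel_count(8.5, 11, 150)

print(at_600, at_150, at_600 / at_150)  # 33660000 2103750 16.0
```

Going from 600 DPI to 150 DPI cuts the DPI by 4× and the pixel count by 16×, before any JPEG recompression is applied.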

What each compression preset does

Most tools (PDFShed included) offer four levels. Roughly:

  • Low: image downsample to 200 DPI, JPEG quality 80. Visual fidelity ~99%. Size reduction ~30–50%.
  • Medium: 150 DPI, JPEG 75. Fidelity ~95%. Size reduction ~50–70%.
  • High: 100 DPI, JPEG 60. Fidelity ~85%. Size reduction ~70–85%.
  • Aggressive: 72 DPI, JPEG 40. Fidelity ~70%. Size reduction ~85–95%. Visible artifacting.
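
As a rough illustration, the presets above can be written down as data and used to estimate an output size range. The `Preset` class and `estimated_size_range_mb` function are hypothetical names for this sketch, not PDFShed's actual API, and the reduction figures are the approximate ones from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    dpi: int
    jpeg_quality: int
    min_reduction: float  # fraction of the size removed, low end
    max_reduction: float  # fraction of the size removed, high end


# Approximate values from the preset list above.
PRESETS = {
    "low": Preset(200, 80, 0.30, 0.50),
    "medium": Preset(150, 75, 0.50, 0.70),
    "high": Preset(100, 60, 0.70, 0.85),
    "aggressive": Preset(72, 40, 0.85, 0.95),
}


def estimated_size_range_mb(original_mb, level):
    """Return (smallest, largest) expected output size in MB."""
    preset = PRESETS[level]
    smallest = original_mb * (1 - preset.max_reduction)
    largest = original_mb * (1 - preset.min_reduction)
    return round(smallest, 2), round(largest, 2)


print(estimated_size_range_mb(30, "high"))  # (4.5, 9.0)
```

Real results depend on what is in the file: image-heavy scans land near the small end of the range, text-heavy documents may not reach it at all.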

Which level for which use case

  • Email attachment: High. Drops a 30 MB scan to roughly 4.5–9 MB. Still readable.
  • USCIS / court filing (under 6 MB): start with High and move to Aggressive only if the file is still over the limit. Check that the result is still legible before uploading.
  • Print: Low, or no compression at all. Heavier recompression costs print quality.
  • Archive (long-term storage): Medium. Good middle ground.
  • WhatsApp (under 2 MB): Aggressive. Quality suffers, but it sends.
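
One way to turn this list into a rule of thumb is to pick the mildest preset whose worst-case output still fits under the size limit. The sketch below is an assumption-laden illustration: `mildest_preset_for` is a hypothetical helper, and the fractions are the low-end reductions from the preset list (30%, 50%, 70%, 85%) expressed as the share of the file that remains.

```python
# Worst-case fraction of the original size left after each preset, mildest first.
WORST_CASE_REMAINING = [
    ("low", 0.70),
    ("medium", 0.50),
    ("high", 0.30),
    ("aggressive", 0.15),
]


def mildest_preset_for(original_mb, limit_mb):
    """Pick the gentlest preset expected to fit under limit_mb.

    Returns "original" if the file already fits, or None if even the
    aggressive preset may not be enough (for example, split the file instead).
    """
    if original_mb <= limit_mb:
        return "original"
    for name, remaining in WORST_CASE_REMAINING:
        if original_mb * remaining <= limit_mb:
            return name
    return None


print(mildest_preset_for(10, 6))   # medium
print(mildest_preset_for(30, 6))   # aggressive
print(mildest_preset_for(100, 2))  # None
```

Because it uses the conservative end of each range, this picks a stronger preset than strictly necessary in some cases; trying the next milder level first is often worth it.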

Why text-heavy PDFs barely compress

Text in a PDF is usually already stored compressed (Flate/zlib), so a second pass has very little left to squeeze. A 50-page contract might drop by 30%. Don't expect 80%.
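
This is easy to check with Python's `zlib` module, which implements the same Deflate algorithm used by PDF's Flate filter. The sample text is synthetic (random words from a made-up list), so treat it as a sketch of the effect rather than a benchmark of real PDFs.

```python
import random
import zlib

# Synthetic stand-in for document text: random words from a small vocabulary.
rng = random.Random(0)
words = ["party", "agreement", "shall", "hereby", "notice", "term",
         "payment", "the", "of", "and", "liability", "section",
         "written", "consent", "date", "clause"]
text = " ".join(rng.choice(words) for _ in range(10_000)).encode()

once = zlib.compress(text, 9)   # first pass: large gain
twice = zlib.compress(once, 9)  # second pass: essentially no gain

print(len(text), len(once), len(twice))
```

The first pass shrinks the text substantially. The second pass produces output about the same size as its input, sometimes a few bytes larger, because compressed data looks close to random.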

Why scans compress so well

Scans are images, and most scanner output is over-quality (300+ DPI when 150 is plenty for legibility). Drop the DPI, recompress as JPEG, and you're at roughly 15–30% of the original size. PDFShed's "High" compression typically achieves this in one click.

OCR before or after compression?

OCR converts image-only scans into searchable PDFs by adding a text layer. Order matters:

  • OCR first, then compress: preserves text-layer accuracy. Recommended.
  • Compress first, then OCR: lower DPI may hurt OCR accuracy. Avoid.
