A Python tool that reduces PDF file size by downscaling and recompressing the images inside the PDF. It uses PyMuPDF to access and replace the embedded images and Pillow to resize and re-encode them. It is best suited to large PDFs that contain high-resolution images, making them easier to store, share, or upload.
- Image Downscaling and Compression: Reduces the resolution of images embedded in PDF pages to a user-defined DPI (default 72 DPI), which shrinks the file.
- JPEG Quality Adjustment: Exposes the JPEG quality setting (0-100) so users can trade file size against image quality.
- Flexible Output: Saves the compressed file as a new PDF, preserving the layout and structure while optimizing for size.
- Easy Customization: Parameters for DPI and JPEG quality allow users to tailor the compression level according to their needs.
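The downscaling and JPEG re-encoding described above can be sketched with Pillow alone. This is a minimal illustration, not the tool's actual code: the helper name `recompress_image` and its `scale` parameter are hypothetical, and it assumes the image is available as encoded bytes.

```python
import io

from PIL import Image


def recompress_image(image_bytes, scale, quality=50):
    """Downscale an encoded image by `scale` and re-encode it as JPEG.

    Hypothetical helper for illustration; not part of the project's API.
    """
    image = Image.open(io.BytesIO(image_bytes))
    # JPEG has no alpha channel or palette mode, so normalise to RGB first.
    image = image.convert("RGB")
    if scale < 1.0:  # only ever shrink, never upscale
        new_size = (
            max(1, round(image.width * scale)),
            max(1, round(image.height * scale)),
        )
        image = image.resize(new_size, Image.LANCZOS)
    buffer = io.BytesIO()
    # `quality` is Pillow's JPEG quality setting; lower means smaller output.
    image.save(buffer, format="JPEG", quality=quality, optimize=True)
    return buffer.getvalue()
```

Converting to RGB discards any transparency, which is one reason a real implementation may need to treat images with alpha masks separately.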
Dependencies:
- PyMuPDF (fitz): For accessing and manipulating PDF file structure, extracting and replacing images on each page.
- Pillow (PIL): For image manipulation, resizing, and quality adjustment.
Key Parameters:
- dpi: Target resolution for image downscaling (default is 72 DPI; higher values keep more detail and produce larger files).
- quality: JPEG compression quality (default is 50, with 0 being the lowest quality and 100 the highest).
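To make the effect of dpi concrete, the arithmetic can be sketched as follows. This assumes the tool sizes each image from its displayed size on the page, which the description above does not confirm; the function names are hypothetical. PDF page coordinates are measured in points, with 72 points per inch.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def target_pixel_size(display_width_pt, display_height_pt, dpi=72):
    """Pixel size needed to show an image of the given page size at `dpi`."""
    width_px = max(1, round(display_width_pt / POINTS_PER_INCH * dpi))
    height_px = max(1, round(display_height_pt / POINTS_PER_INCH * dpi))
    return width_px, height_px


def downscale_factor(original_width_px, display_width_pt, dpi=72):
    """Scale factor that brings an image down to `dpi`, capped at 1.0."""
    target_width_px = display_width_pt / POINTS_PER_INCH * dpi
    # Never upscale: images already at or below the target keep their size.
    return min(1.0, target_width_px / original_width_px)
```

For example, an image 2400 pixels wide that is displayed 576 points (8 inches) wide has an effective resolution of 300 DPI. At dpi=72 the target width is 576 pixels, a scale factor of 0.24.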
Usage Example:
- Install dependencies with pip install pymupdf pillow.
- Run the compress_pdf function, specifying the input and output PDF paths, desired DPI, and quality.
- The output file is typically much smaller than the input, especially for image-heavy PDFs; the exact saving depends on the source images and the chosen settings.
To compress a PDF, use the following code:
# Define paths and compression settings
input_pdf = "path/to/large_file.pdf"
output_pdf = "compressed_file.pdf"
desired_dpi = 72  # Lower DPI for higher compression
jpeg_quality = 50  # Lower quality for higher compression
# Compress the PDF
compress_pdf(input_pdf, output_pdf, dpi=desired_dpi, quality=jpeg_quality)
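
To check how much space a run actually saved, the input and output sizes can be compared with the standard library. The helper name `size_reduction` is hypothetical and is not part of the tool.

```python
import os


def size_reduction(original_path, compressed_path):
    """Return (original_bytes, compressed_bytes, percent_saved)."""
    original = os.path.getsize(original_path)
    compressed = os.path.getsize(compressed_path)
    # Guard against an empty input file to avoid dividing by zero.
    saved_percent = 100.0 * (original - compressed) / original if original else 0.0
    return original, compressed, saved_percent
```

Calling `size_reduction(input_pdf, output_pdf)` after compression gives a quick way to compare different dpi and quality settings. A negative percentage means the output grew, which can happen when the source images were already small or heavily compressed.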