Gemini Paper Summarizer

Overview

A command-line tool that uses Gemini API to generate summaries of academic papers.

Supports output in multiple languages: Chinese, English, French, German, Japanese, Korean, and Spanish.

Examples

See the examples directory for sample outputs, including summaries of:

(2017) Attention Is All You Need

Other Summaries in Japanese

(2018) What you can cram into a single vector: Probing sentence embeddings for linguistic properties
(2018) Improving language understanding by generative pre-training
(2018) BERT: Pre-training of deep bidirectional Transformers for language understanding
(2019) BERT rediscovers the classical NLP pipeline
(2019) Linguistic knowledge and transferability of contextual representations
(2019) A Structural Probe for Finding Syntax in Word Representations
(2019) What does BERT look at? An analysis of BERT’s attention
(2022) Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
(2023) A Benchmark for Learning to Translate a New Language from One Grammar Book
(2023) Text Embeddings Reveal (Almost) As Much As Text

Prerequisites

Python 3.10+
Gemini API Key (obtain from Google AI Studio)

Note: If you find it difficult to set up the environment locally, please refer to the following Google Colab Notebook by @shoei05 (explained in Japanese):

論文要約 gemini-paper-summarizer を Google Colab で使用する

Installation

Clone the repository
Install dependencies:
```
uv sync
```
Create a .env file in the project directory with your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```

Usage

uv run gp-summarize path/to/paper.pdf

The tool will generate several markdown files with different types of summaries:

A translation of the paper's title and abstract
A summary of the entire paper
A JSON structure of the paper's chapters and sections
Individual summaries (not translations) for each main section
Summaries of 1-4 above combined into a single file

The output files will be named based on the input PDF filename. Files 1-4 above will be saved in a directory and numbered continuously (e.g., paper/001.md, paper/002.md, etc.). The combined file will be named paper.md.

Note: If the process is interrupted (e.g. by Ctrl+C or by a 429 rate limit error, etc.), the process can be re-run smoothly, because any existing output files will be skipped.

Output Format

For each prompt, the tool generates a markdown file containing:

Title (prompt number)
Statistics information about tokens
Prompt
AI-generated response

The section structure will be displayed in both JSON format and as a hierarchical list.

Command-Line Options

uv run gp_summarize [-h] [-d OUTPUT_DIR] [-o OUTPUT] [-l {de,en,es,fr,ja,ko,zh}] [-m MODEL] [--rpm RPM] [--suffix SUFFIX] [--ccache] [--version] pdf_paths [pdf_paths ...]

pdf_paths: Required Path(s) to one or more PDF files to summarize
- Multiple PDF files can be specified
- Wildcards (*) are supported on Windows
-d, --output-dir: Optional. Specify the output directory for intermediate files
- Recommended when processing multiple PDF files
-o, --output: Optional. Specify the output file for the summary
- Can only be used with a single PDF file
-l, --language: Optional. Specify the output language
- Supported languages: de (German), en (English), es (Spanish), fr (French), ja (Japanese), ko (Korean), zh (Chinese)
- Default: Based on system language settings
-m, --model: Optional. Specify the Gemini model to use
- Default: gemini-2.0-flash
--rpm: Optional. Set maximum requests per minute
- Default: 15
--suffix: Optional. Add custom suffix to the output filename
- Useful for differentiating the output filename from the source PDF
--ccache: Optional. Enable context caching
- Caches source PDF to reduce API usage costs
- Currently disabled by default as it's not supported in Gemini 2.0 Flash
--version: Display version information

Examples

Summarize a single PDF:

uv run gp_summarize paper.pdf

Summarize multiple PDFs:

uv run gp_summarize paper1.pdf paper2.pdf

Specify an output directory:

uv run gp_summarize -d ./outputs paper.pdf

Summarize in a specific language:

uv run gp_summarize -l ja paper.pdf

License

CC0 1.0 Universal

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Gemini Paper Summarizer

Overview

Examples

Other Summaries in Japanese

Prerequisites

Installation

Usage

Output Format

Command-Line Options

Examples

License

Files

README.md

Latest commit

History

README.md

File metadata and controls

Gemini Paper Summarizer

Overview

Examples

Other Summaries in Japanese

Prerequisites

Installation

Usage

Output Format

Command-Line Options

Examples

License