PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
Abstract
Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence. Existing methods primarily focus on improving and evaluating the content quality in isolation, often overlooking visual design and structural coherence, which limits their practical applicability. To address these limitations, we propose PPTAgent, which comprehensively improves presentation generation through a two-stage, edit-based approach inspired by human workflows. PPTAgent first analyzes reference presentations to understand their structural patterns and content schemas, then drafts outlines and generates slides through code actions to ensure consistency and alignment. To comprehensively evaluate the quality of generated presentations, we further introduce PPTEval, an evaluation framework that assesses presentations across three dimensions: Content, Design, and Coherence. Experiments show that PPTAgent significantly outperforms traditional automatic presentation generation methods across all three dimensions. The code and data are available at https://github.com/icip-cas/PPTAgent.
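The two-stage, edit-based workflow described in the abstract can be illustrated with a minimal sketch. This is not the PPTAgent implementation: every name below (SlideTemplate, EditAction, analyze_reference, draft_outline, plan_edits, apply_edits, generate) is hypothetical, and the rule-based outline and edit planning stand in for steps that a language model would presumably perform in the real system. It only shows the general idea of analyzing reference slides first, then producing new slides by editing them rather than building them from scratch.

```python
from dataclasses import dataclass


@dataclass
class SlideTemplate:
    """A reference slide reduced to a layout name and named text slots (hypothetical schema)."""
    layout: str
    slots: dict


@dataclass
class EditAction:
    """One code-like edit: replace the text of a named slot."""
    slot: str
    new_text: str


def analyze_reference(reference):
    """Stage 1: index reference slides by layout so each layout's slot schema can be reused."""
    schemas = {}
    for slide in reference:
        schemas.setdefault(slide.layout, slide)  # keep the first slide seen per layout
    return schemas


def draft_outline(sections, schemas):
    """Stage 2a: pair each document section with a layout (toy rule, assumed for illustration)."""
    outline = []
    for index, (heading, body) in enumerate(sections.items()):
        layout = "title" if index == 0 and "title" in schemas else "content"
        outline.append((layout, heading, body))
    return outline


def plan_edits(heading, body):
    """Stage 2b: decide which slots to rewrite (a model would choose these in practice)."""
    return [EditAction("title", heading), EditAction("body", body)]


def apply_edits(template, actions):
    """Apply edits to a copy of the template; reject slots the template does not define."""
    slots = dict(template.slots)
    for action in actions:
        if action.slot not in slots:
            raise KeyError(f"layout {template.layout!r} has no slot {action.slot!r}")
        slots[action.slot] = action.new_text
    return SlideTemplate(template.layout, slots)


def generate(sections, reference):
    """Run both stages: analyze the reference deck, then edit one template per outline entry."""
    schemas = analyze_reference(reference)
    return [
        apply_edits(schemas[layout], plan_edits(heading, body))
        for layout, heading, body in draft_outline(sections, schemas)
    ]


reference = [
    SlideTemplate("title", {"title": "Old title", "body": "Old subtitle"}),
    SlideTemplate("content", {"title": "Old heading", "body": "Old text"}),
]
sections = {"Intro": "Why slides matter", "Method": "Two stages"}
slides = generate(sections, reference)
```

Because each generated slide is an edited copy of a reference slide, the layout and slot structure of the reference deck carry over unchanged, which is one plausible reading of how an edit-based approach helps with consistency.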
Community
Hi everyone,
We propose PPTAgent, a system for automatically generating presentations from documents. It follows a two-stage process inspired by how people create slides: it first analyzes reference presentations, then drafts an outline and generates the slides by editing. The goal is high-quality content, clear structure, and visually appealing design. To evaluate the generated presentations, we also introduce PPTEval, a framework that scores presentations on three dimensions: Content, Design, and Coherence.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Multi-LLM Collaborative Caption Generation in Scientific Documents (2025)
- From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing (2024)
- AutoPresent: Designing Structured Visuals from Scratch (2025)
- ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges (2024)
- Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation (2024)
- TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization (2024)
- VISA: Retrieval Augmented Generation with Visual Source Attribution (2024)
News:
We've released the code for the workflow and UI at https://github.com/icip-cas/PPTAgent
Dataset: https://huggingface.co/datasets/Forceless/Zenodo10K