The 5th New Frontiers in Summarization Workshop
EMNLP 2025
The Fifth Workshop on “New Frontiers in Summarization” aims to foster cross-fertilization of ideas in automatic summarization and related fields. It will cover novel paradigms, shared tasks, applied research, and future directions while accelerating the development of tools, datasets, and resources to meet the summarization needs of academia, industry, and government. As advances in natural language processing (e.g., pre-trained models and prompt-based learning) improve summarization performance, challenges remain in areas such as trustworthiness, interpretability, evaluation reliability, and the integration of knowledge and modalities for real-world deployment.
To tackle these challenges, we plan to expand the workshop’s scope beyond traditional summarization to include grounded text generation with retrieval, reference- and attribute-based summarization, multi-modal and long-form summarization, query-focused approaches, hallucination reduction, efficiency, and novel evaluation methods. This broader focus, particularly addressing the growing role of large language models (LLMs), is expected to attract wider engagement from the research community and push the boundaries of summarization research.
Keynote Speakers
Alexander R. Fabbri
Scale
Jey Han Lau
University of Melbourne
Schedule
Saturday, November 8, 2025
| Time | Event & Details |
|---|---|
| 08:50 - 09:00 | Opening Remarks |
| 09:00 - 09:45 | Keynote I - Mohit Bansal (UNC Chapel Hill) Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval |
| 09:45 - 10:30 | Keynote II - Greg Durrett (New York University) Specializing LLMs for Reliability |
| 10:30 - 11:00 | Coffee Break |
| 11:00 - 11:45 | Keynote III - Arman Cohan (Yale University) Evaluations of Non-Verifiable Tasks: From Scientific Literature Reviews to Meta-evaluation for General Alignment |
| 11:45 - 12:30 | Keynote IV - Alexander R. Fabbri (Scale) Summarization as an Evaluation Substrate: Grounding, Judges, and Multilinguality |
| 12:30 - 14:00 | Lunch Break |
| 14:00 - 15:30 | Lightning Talks + Poster Session (In-person/Virtual: Gathertown) (Workshop papers, Slides) |
| 15:30 - 16:00 | Coffee Break |
| 16:00 - 16:45 | Keynote V - Jey Han Lau (University of Melbourne) Ten Years of Abstractive Summarisation: A Whirlwind Tour and Future Directions |
| 16:45 - 16:50 | Final Remarks |
Keynote Talks
Mohit Bansal
UNC Chapel Hill
Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval
TBD
Arman Cohan
Yale University
Evaluations of Non-Verifiable Tasks: From Scientific Literature Reviews to Meta-evaluation for General Alignment
In this talk, I will present two of our recent works examining evaluation for non-verifiable tasks, through human preferences on scientific literature tasks and through evaluations of LLMs as judges. First, I will focus on scientific literature review as a form of summarization task, presenting SciArena, an open platform where researchers compare literature-grounded, long-form answers with citations. I will also discuss SciArena-Eval, a benchmark that tests models as evaluators of literature-based responses. Our findings suggest that even the strongest evaluators reach about 65 percent accuracy against expert judgments, which highlights the difficulty of automated assessment for literature-review-style generation. Next, I will discuss AlignEval, which studies when judging ability tracks alignment. The work measures generation–evaluation consistency and reports high rank correlation under strong oracle and curated-instance settings, for example a Spearman correlation of 0.97 on Arena-Hard. Combined with instruction-following checks, our approach correlates with human preference leaderboards at about 0.94 Spearman while reducing evaluation cost. Our results provide practical guidance for evaluating literature review generation as a complex synthesis task and for auditing LLM judges used in alignment studies, including data curation, instance filtering, and the risk of self-preference in evaluators.
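For readers unfamiliar with this style of meta-evaluation, the sketch below shows one way a rank correlation between an automatic judge and human preferences can be computed over a set of systems; the scores and system names are illustrative placeholders, not the SciArena-Eval or AlignEval data.

```python
# Illustrative only: rank-correlation meta-evaluation of an LLM judge
# against human preference scores (placeholder data, not SciArena-Eval).
from scipy.stats import spearmanr

# Hypothetical per-system scores over the same set of systems.
human_scores = {"sys_a": 0.71, "sys_b": 0.64, "sys_c": 0.58, "sys_d": 0.52}
judge_scores = {"sys_a": 0.69, "sys_b": 0.61, "sys_c": 0.60, "sys_d": 0.48}

systems = sorted(human_scores)
rho, p_value = spearmanr(
    [human_scores[s] for s in systems],
    [judge_scores[s] for s in systems],
)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A value near 1.0 means the judge ranks systems almost identically to the human raters; lower values indicate the judge and humans disagree about which systems are better.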
Greg Durrett
New York University
Specializing LLMs for Reliability
Large language models (LLMs) generally possess capabilities long sought from summarization systems: they can synthesize information from multiple sources, derive new conclusions, and explain those conclusions to their users. However, LLMs do not always do this reliably. They hallucinate facts, convincingly state incorrect deductions, and exhibit logical fallacies like confirmation bias. In this talk, I will describe my lab's work on making LLM systems reliable by carefully evaluating their outputs in a fine-grained way. First, I will describe the ingredients of effective automated evaluators and a state-of-the-art factuality evaluation system, MiniCheck, showing that analyzing the nature of hallucinations can help reduce them. Second, I will describe how to evaluate LLM responses according to a broader set of criteria. Our system, EvalAgent, retrieves instructional documents from the web describing how to perform writing tasks, illuminating dimensions of evaluation that we as system developers may not have even been aware of. Together, these approaches provide a method for building more reliable LLM systems for open-ended writing tasks.
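To make the idea of fine-grained factuality evaluation concrete, here is a minimal sketch that checks each summary sentence against the source document with an off-the-shelf NLI model; this is not the MiniCheck system itself, and the model choice and threshold are assumptions made for illustration.

```python
# Illustrative sketch: sentence-level factuality checking of a summary
# against its source with a generic NLI model (NOT the MiniCheck system).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any NLI model works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def sentence_supported(source: str, claim: str, threshold: float = 0.5) -> bool:
    """Return True if the source entails the claim above the threshold."""
    inputs = tokenizer(source, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Look up the entailment index from the model config instead of hard-coding it.
    entail_idx = {k.lower(): v for k, v in model.config.label2id.items()}["entailment"]
    return probs[entail_idx].item() >= threshold

source_doc = "The committee met on Tuesday and approved the new budget."
summary_sentences = [
    "The committee approved the budget.",
    "The budget was rejected by the committee.",
]
for sent in summary_sentences:
    verdict = "supported" if sentence_supported(source_doc, sent) else "unsupported"
    print(f"{sent} -> {verdict}")
```

MiniCheck itself trains dedicated small checkers for this grounding task; the sketch only mirrors the overall claim-versus-source checking loop.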
Alexander R. Fabbri
Scale
Summarization as an Evaluation Substrate: Grounding, Judges, and Multilinguality
I will frame summarization as an evaluation substrate for modern LLMs rather than merely a generation task. I will begin with grounded summarization in citation-required settings and recent RAG grounding benchmarks to show why claims must be auditable and robust. I will then highlight what breaks in multi-turn interactions, motivating trajectory-aware evaluation. Next, I will examine LLM-as-a-judge reliability, synthesizing large-scale evidence on judge variance and offering practical judging protocols. Finally, I will discuss native multilingual robustness through a non-summarization task, showing that translation can mask errors, while native phenomena such as idioms and culturally anchored facts reveal them. I will close with actionable recommendations for robust summarization and evaluation and outline remaining challenges.
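As one illustration of a practical judging protocol of the kind mentioned above, the sketch below averages repeated pairwise judgments while randomly swapping candidate order to cancel position bias; `call_judge` is a hypothetical placeholder for any LLM judge, not a real API.

```python
# Illustrative sketch of a pairwise LLM-as-a-judge protocol that mitigates
# position bias by swapping candidate order and averaging repeated votes.
import random
import statistics
from typing import Callable

def judge_pair(
    call_judge: Callable[[str, str, str], float],
    source: str,
    summary_a: str,
    summary_b: str,
    n_trials: int = 4,
) -> float:
    """Return the mean preference for summary A (1.0 = A always wins)."""
    votes = []
    for _ in range(n_trials):
        if random.random() < 0.5:
            votes.append(call_judge(source, summary_a, summary_b))
        else:
            # Swap presentation order and invert the vote to cancel position bias.
            votes.append(1.0 - call_judge(source, summary_b, summary_a))
    return statistics.mean(votes)

def toy_judge(src: str, first: str, second: str) -> float:
    # Trivial stand-in judge that prefers the shorter of the two summaries.
    return 1.0 if len(first) <= len(second) else 0.0

print(judge_pair(toy_judge, "source text", "a short summary", "a much longer candidate summary"))
```

Recording the spread of the individual votes, rather than only their mean, also gives a rough handle on judge variance.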
Jey Han Lau
University of Melbourne
Ten Years of Abstractive Summarisation: A Whirlwind Tour and Future Directions
In this talk, I’ll discuss the development of models and evaluation metrics for abstractive summarisation over the past decade, highlighting influential papers across different eras and interweaving some of our own contributions along the way. Each era has had a distinct focus - for example, early neural models focused on finding effective architectures; the pretrained model era on optimising training objectives; and the current LLM era on prompt engineering. I’ll also touch on summarisation evaluation, which, unlike modelling, hasn’t evolved as dramatically. I’ll conclude by sharing some reflections on the future of summarisation as a research field.
Call for Papers
Both long papers (up to 8 pages with unlimited references) and short papers (up to 4 pages with unlimited references) are welcome for submission!
A list of topics relevant to this workshop (but not limited to):
- Abstractive, extractive, and hybrid summarization
- Summarization with pre-trained large models
- Zero-shot/few-shot summarization
- Long-context summarization
- Fairness in summarization: faithfulness, bias, toxicity, and privacy-preserving methods
- Interpretability, controllability, and visualization of summarization systems
- Reference- and attribute-based summarization
- Query-focused summarization
- Knowledge-injected summarization with retrieval
- Multilingual summarization
- Multimodal summarization (text, speech, image, video)
- Multi-genre summarization (news, tweets, product reviews, conversations, medical records, etc.)
- Semantic aspects of summarization (representation, inference, validity)
- Cognitive and psycholinguistic aspects (readability, usability)
- Development of new algorithms, datasets, and annotations
- Development of new evaluation metrics
- Hallucination reduction and trustworthiness in summarization
- Efficiency in summarization and large model inference
- Survey papers reviewing summarization methods, benchmarks, or evaluation techniques
- Position papers presenting opinions, critiques, or perspectives on summarization research
Submission Instructions
You are invited to submit your papers via our OpenReview submission portal. All submissions must be anonymized for double-blind review. Papers may not exceed 8 pages for long papers and 4 pages for short papers, strictly following the ACL style templates; the mandatory limitations section does not count towards the page limit. Supplementary materials and appendices (either as separate files or appended after the main submission) are allowed. We encourage including a link to the code in the camera-ready version.
Dual Submission
NewSumm 2025 allows dual submission, provided the authors make a decision before the camera-ready deadline. We will not consider any paper that overlaps significantly in content or results with papers that will be (or have been) published elsewhere. Authors submitting more than one paper to NewSumm 2025 must ensure that their submissions do not overlap significantly (>25%) with each other in content or results. Authors can submit up to 100 MB of supplementary materials separately and are highly encouraged to submit their code for reproducibility purposes.
Fast-Track Submission
ACL Rolling Review (ARR) Submissions: Our workshop also welcomes submissions from ARR. Authors of papers that were submitted to ARR and have their meta-review ready may submit their papers and reviews for consideration until 22 August 2025. Acceptance decisions will be announced by 10 September 2025. Commitments should be made via the workshop commitment submission website: OpenReview submission portal.
Non-archival Option
ACL workshops are traditionally archival. To allow dual submission of work, we are also including a non-archival track. Authors have the flexibility to submit their unpublished research in a non-archival format, where only the abstract will be included in the conference proceedings. These non-archival submissions are expected to meet the same quality criteria as their archival counterparts and will undergo an identical review process. This option is designed to facilitate future publication opportunities in journals or conferences that disallow previously archived material. It also aims to foster engagement and constructive feedback on well-developed but yet-to-be-published work. Like archival submissions, non-archival entries must conform to the established formatting and length guidelines.
Important Dates:
- ~~Aug. 15~~ Aug. 22, 2025: Workshop Submission Due Date
- Aug. 22, 2025: Fast-Track Submission and ARR Commitment Deadline
- ~~Sep. 10~~ Sep. 17, 2025: Notification of Acceptance (Direct, ARR, and Fast-Track Notification)
- ~~Sep. 14~~ Sep. 21, 2025: Camera-ready Papers Due
- Nov. 8, 2025: Workshop Date
Accepted Papers
- AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization
  Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown
- Bridging Multimodal and Video Summarization: A Unified Survey
  Haopeng Zhang
- Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
  Juntai Cao, Xiang Zhang, Raymond Li, Jiaqi Wei, Chuyuan Li, Shafiq Joty, Giuseppe Carenini
- HalluTree: Explainable Multi-Hop Hallucination Detection for Abstractive Summarization
  Daniel Orshansky, Oskar Oomen, Naaisha Agarwal, Ryan Lagasse
- From Keyterms to Context: Exploring Topic Description Generation in Scientific Corpora
  Pierre Achkar, Satiyabooshan Murugaboopathy, Anne Kreuter, Tim Gollub, Martin Potthast, Yuri Campbell
- DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
  Xue-Yong Fu, Elena Khasanova, Md Tahmid Rahman Laskar, Harsh Saini, SHASHI BHUSHAN TN
- REFER: Mitigating Bias in Opinion Summarisation via Frequency Framed Prompting
  Nannan Huang, Haytham M. Fayek, Xiuzhen Zhang
- Improving Aspect-Based Summarization via Contrastive Learning with Anchored Negative Examples
  Elizabeth Palmieri, Yangfeng Ji
- Beyond Paraphrasing: Analyzing Summarization Abstractiveness and Reasoning
  Nathan Zeweniuk, Ori Ernst, Jackie CK Cheung
- CS-Sum: A Benchmark for Code-Switching Dialogue Summarization and the Limits of Large Language Models
  Sathya Krishnan Suresh, Tanmay Surana, Lim Zhi Hao, Eng Siong Chng
- Hierarchical Attention Adapter for Abstractive Dialogue Summarization
  Raymond Li, Chuyuan Li, Gabriel Murray, Giuseppe Carenini
- LLM-as-a-Judge Failures at Automating the Identification of Poor Quality Outputs in Free-Form Texts
  Zongxia Li, Xiyang Wu, Ishani Mondal, Alexa Siu, Jordan Lee Boyd-Graber, Ani Nenkova
- QA-prompting: Improving Summarization with Large Language Models using Question-Answering
  Neelabh Sinha
- NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
  Delip Rao, Weiqiu You, Eric Wong, Chris Callison-Burch
Organizers
Yue Dong
University of California, Riverside, USA
Wen Xiao
Microsoft Azure AI, Canada
Haopeng Zhang
University of Hawaii at Manoa, USA
Rui Zhang
Penn State University, USA
Ori Ernst
McGill University & Mila, Canada
Lu Wang
University of Michigan, USA
Fei Liu
Emory University, USA
Program Committee
- Shmuel Amar (Bar-Ilan University)
- Florian Boudin (JFLI, Nantes Université)
- Avi Caciularu (Google)
- Arie Cattan (Bar-Ilan University)
- Hou Pong Chan (Alibaba DAMO Academy)
- Khaoula Chehbouni (McGill University, Mila)
- Ziling Cheng (McGill University & Mila)
- Jackie Cheung (Mila / McGill)
- Maxime Darrin (Mistral AI)
- Felice Dell'Orletta (Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC))
- Ron Eliav (Bar-Ilan University)
- Tobias Falke (Amazon AGI)
- Lorenzo Flores (MILA Quebec)
- Yu Fu (University of California, Riverside)
- Eran Hirsch (Bar-Ilan University)
- Zhe Hu (The Hong Kong Polytechnic University)
- Xinyu Hua (Bloomberg)
- Patrick Huber (Meta)
- Hayate Iso (Megagon Labs)
- Ayal Klein (Bar Ilan University)
- Wojciech Kryscinski (Cohere)
- Elena Lloret (University of Alicante)
- Margot Mieskes (University of Applied Sciences, Darmstadt)
- Manabu Okumura (Tokyo Institute of Technology)
- Jessica Ouyang (UT Dallas)
- G M Shahariar (University of California, Riverside)
- Haz Sameen Shahgir (University of California Riverside)
- Ori Shapira (OriginAI)
- Aviv Slobodkin (Bar-Ilan University)
- Cesare Spinoso (McGill)
- Esaú Villatoro Tello (Idiap Research Institute, CH)
- David Wan (UNC Chapel Hill)
- Haohan Yuan (ALOHA Lab, University of Hawaii at Manoa)
- Yusen Zhang (Penn State University)
- Nan Zhang (The Pennsylvania State University)
- Shiyue Zhang (Bloomberg)
- Ming Zhong (UIUC)
- Xiyuan Zou (McGill / MILA)