The 5th New Frontiers in Summarization Workshop
EMNLP 2025
The Fifth Workshop on “New Frontiers in Summarization” aims to foster cross-fertilization of ideas in automatic summarization and related fields. It will cover novel paradigms, shared tasks, applied research, and future directions while accelerating the development of tools, datasets, and resources to meet the summarization needs of academia, industry, and government. As advances in natural language processing (e.g., pre-trained models and prompt-based learning) improve summarization performance, challenges remain in areas such as trustworthiness, interpretability, evaluation reliability, and the integration of knowledge and modalities for real-world deployment.
To tackle these challenges, we plan to expand the workshop’s scope beyond traditional summarization to include grounded text generation with retrieval, reference- and attribute-based summarization, multi-modal and long-form summarization, query-focused approaches, hallucination reduction, efficiency, and novel evaluation methods. This broader focus, particularly addressing the growing role of large language models (LLMs), is expected to attract wider engagement from the research community and push the boundaries of summarization research.
Keynote Speakers
Alexander R. Fabbri
Scale
Jey Han Lau
University of Melbourne
Schedule
Saturday, November 8, 2025
| Time | Event & Details |
|---|---|
| 08:50 - 09:00 | Opening Remarks |
| 09:00 - 09:45 | Keynote I - Mohit Bansal (UNC Chapel Hill) Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval |
| 09:45 - 10:30 | Keynote II - Greg Durrett (New York University) Specializing LLMs for Reliability |
| 10:30 - 11:00 | Coffee Break |
| 11:00 - 11:45 | Keynote III - Arman Cohan (Yale University) Evaluations of Non-Verifiable Tasks: From Scientific Literature Reviews to Meta-evaluation for General Alignment |
| 11:45 - 12:30 | Keynote IV - Alexander R. Fabbri (Scale) Summarization as an Evaluation Substrate: Grounding, Judges, and Multilinguality |
| 12:30 - 14:00 | Lunch Break |
| 14:00 - 15:30 | Lightning Talks + Poster Session (In-person/Virtual: Gathertown) (Workshop papers, Slides) |
| 15:30 - 16:00 | Coffee Break |
| 16:00 - 16:45 | Keynote V - Jey Han Lau (University of Melbourne) Ten Years of Abstractive Summarisation: A Whirlwind Tour and Future Directions |
| 16:45 - 16:50 | Final Remarks |
Keynote Talks
Mohit Bansal
UNC Chapel Hill
Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval
TBD
Arman Cohan
Yale University
Evaluations of Non-Verifiable Tasks: From Scientific Literature Reviews to Meta-evaluation for General Alignment
In this talk, I will present two of our recent works examining evaluation for non-verifiable tasks, through human preferences on scientific literature tasks and through evaluations of LLMs as judges. First, I will focus on scientific literature review as a form of summarization task, presenting SciArena, an open platform where researchers compare literature-grounded, long-form answers with citations. I will also discuss SciArena-Eval, a benchmark that tests models as evaluators of literature-based responses. Our findings suggest that even the strongest evaluators reach about 65 percent accuracy against expert judgments, which highlights the difficulty of automated assessment for literature-review-style generation. Next, I will discuss AlignEval, which studies when judging ability tracks alignment. The work measures generation–evaluation consistency and reports high rank correlation under strong oracle and curated-instance settings, for example a Spearman correlation of 0.97 on Arena-Hard. Combined with instruction-following checks, our approach correlates with human preference leaderboards at about 0.94 Spearman while reducing evaluation cost. Our results provide practical guidance for evaluating literature review generation as a complex synthesis task and for auditing LLM judges used in alignment studies, including data curation, instance filtering, and the risk of self-preference in evaluators.
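For readers unfamiliar with this style of meta-evaluation, the sketch below shows one way a rank correlation between an automatic judge and human preferences can be computed over a set of systems; the scores and system names are illustrative placeholders, not the SciArena-Eval or AlignEval data.

```python
# Illustrative only: rank-correlation meta-evaluation of an LLM judge
# against human preference scores (placeholder data, not SciArena-Eval).
from scipy.stats import spearmanr

# Hypothetical per-system scores over the same set of systems.
human_scores = {"sys_a": 0.71, "sys_b": 0.64, "sys_c": 0.58, "sys_d": 0.52}
judge_scores = {"sys_a": 0.69, "sys_b": 0.61, "sys_c": 0.60, "sys_d": 0.48}

systems = sorted(human_scores)
rho, p_value = spearmanr(
    [human_scores[s] for s in systems],
    [judge_scores[s] for s in systems],
)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A value near 1.0 means the judge ranks systems almost identically to the human raters; lower values indicate the judge and humans disagree about which systems are better.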
Greg Durrett
New York University
Specializing LLMs for Reliability
Large language models (LLMs) generally possess capabilities long sought from summarization systems: they can synthesize information from multiple sources, derive new conclusions, and explain those conclusions to their users. However, LLMs do not always do this reliably. They hallucinate facts, convincingly state incorrect deductions, and exhibit logical fallacies like confirmation bias. In this talk, I will describe my lab's work on making LLM systems reliable by carefully evaluating their outputs in a fine-grained way. First, I will describe the ingredients of effective automated evaluators and a state-of-the-art factuality evaluation system, MiniCheck, showing that analyzing the nature of hallucinations can help reduce them. Second, I will describe how to evaluate LLM responses according to a broader set of criteria. Our system, EvalAgent, retrieves instructional documents from the web describing how to perform writing tasks, illuminating dimensions of evaluation that we as system developers may not have even been aware of. Together, these approaches provide a method for building more reliable LLM systems for open-ended writing tasks.
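To make the idea of fine-grained factuality evaluation concrete, here is a minimal sketch that checks each summary sentence against the source document with an off-the-shelf NLI model; this is not the MiniCheck system itself, and the model choice and threshold are assumptions made for illustration.

```python
# Illustrative sketch: sentence-level factuality checking of a summary
# against its source with a generic NLI model (NOT the MiniCheck system).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any NLI model works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def sentence_supported(source: str, claim: str, threshold: float = 0.5) -> bool:
    """Return True if the source entails the claim above the threshold."""
    inputs = tokenizer(source, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Look up the entailment index from the model config instead of hard-coding it.
    entail_idx = {k.lower(): v for k, v in model.config.label2id.items()}["entailment"]
    return probs[entail_idx].item() >= threshold

source_doc = "The committee met on Tuesday and approved the new budget."
summary_sentences = [
    "The committee approved the budget.",
    "The budget was rejected by the committee.",
]
for sent in summary_sentences:
    verdict = "supported" if sentence_supported(source_doc, sent) else "unsupported"
    print(f"{sent} -> {verdict}")
```

MiniCheck itself trains dedicated small checkers for this grounding task; the sketch only mirrors the overall claim-versus-source checking loop.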
Alexander R. Fabbri
Scale
Summarization as an Evaluation Substrate: Grounding, Judges, and Multilinguality
I will frame summarization as an evaluation substrate for modern LLMs rather than merely a generation task. I will begin with grounded summarization in citation-required settings and recent RAG grounding benchmarks to show why claims must be auditable and robust. I will then highlight what breaks in multi-turn interactions, motivating trajectory-aware evaluation. Next, I will examine LLM-as-a-judge reliability, synthesizing large-scale evidence on judge variance and offering practical judging protocols. Finally, I will discuss native multilingual robustness through a non-summarization task, showing that translation can mask errors, while native phenomena such as idioms and culturally anchored facts reveal them. I will close with actionable recommendations for robust summarization and evaluation and outline remaining challenges.
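As one illustration of a practical judging protocol of the kind mentioned above, the sketch below averages repeated pairwise judgments while randomly swapping candidate order to cancel position bias; `call_judge` is a hypothetical placeholder for any LLM judge, not a real API.

```python
# Illustrative sketch of a pairwise LLM-as-a-judge protocol that mitigates
# position bias by swapping candidate order and averaging repeated votes.
import random
import statistics
from typing import Callable

def judge_pair(
    call_judge: Callable[[str, str, str], float],
    source: str,
    summary_a: str,
    summary_b: str,
    n_trials: int = 4,
) -> float:
    """Return the mean preference for summary A (1.0 = A always wins)."""
    votes = []
    for _ in range(n_trials):
        if random.random() < 0.5:
            votes.append(call_judge(source, summary_a, summary_b))
        else:
            # Swap presentation order and invert the vote to cancel position bias.
            votes.append(1.0 - call_judge(source, summary_b, summary_a))
    return statistics.mean(votes)

def toy_judge(src: str, first: str, second: str) -> float:
    # Trivial stand-in judge that prefers the shorter of the two summaries.
    return 1.0 if len(first) <= len(second) else 0.0

print(judge_pair(toy_judge, "source text", "a short summary", "a much longer candidate summary"))
```

Recording the spread of the individual votes, rather than only their mean, also gives a rough handle on judge variance.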
Jey Han Lau
University of Melbourne
Ten Years of Abstractive Summarisation: A Whirlwind Tour and Future Directions
In this talk, I’ll discuss the development of models and evaluation metrics for abstractive summarisation over the past decade, highlighting influential papers across different eras and interweaving some of our own contributions along the way. Each era has had a distinct focus - for example, early neural models focused on finding effective architectures; the pretrained model era on optimising training objectives; and the current LLM era on prompt engineering. I’ll also touch on summarisation evaluation, which, unlike modelling, hasn’t evolved as dramatically. I’ll conclude by sharing some reflections on the future of summarisation as a research field.
Call for Papers
Both long papers (up to 8 pages with unlimited references) and short papers (up to 4 pages with unlimited references) are welcome for submission!
A list of topics relevant to this workshop (but not limited to):
- Abstractive, extractive, and hybrid summarization
- Summarization with pre-trained large models
- Zero-shot/few-shot summarization
- Long-context summarization
- Fairness in summarization: faithfulness, bias, toxicity, and privacy-preserving methods
- Interpretability, controllability, and visualization of summarization systems
- Reference- and attribute-based summarization
- Query-focused summarization
- Knowledge-injected summarization with retrieval
- Multilingual summarization
- Multimodal summarization (text, speech, image, video)
- Multi-genre summarization (news, tweets, product reviews, conversations, medical records, etc.)
- Semantic aspects of summarization (representation, inference, validity)
- Cognitive and psycholinguistic aspects (readability, usability)
- Development of new algorithms, datasets, and annotations
- Development of new evaluation metrics
- Hallucination reduction and trustworthiness in summarization
- Efficiency in summarization and large model inference
- Survey papers reviewing summarization methods, benchmarks, or evaluation techniques
- Position papers presenting opinions, critiques, or perspectives on summarization research
Submission Instructions
You are invited to submit your papers via our OpenReview submission portal. All submissions must be anonymized for double-blind review. Papers may not exceed 8 pages for long papers and 4 pages for short papers, strictly following the ACL style templates; the mandatory limitations section does not count towards the page limit. Supplementary materials and appendices (either as separate files or appended after the main submission) are allowed. We encourage including a link to the code in the camera-ready version.
Dual Submission
NewSumm 2025 allows dual submission, provided the authors make a decision before the camera-ready deadline. We will not consider any paper that overlaps significantly in content or results with papers that will be (or have been) published elsewhere. Authors submitting more than one paper to NewSumm 2025 must ensure that their submissions do not overlap significantly (>25%) with each other in content or results. Authors can submit up to 100 MB of supplementary materials separately and are highly encouraged to submit their code for reproducibility purposes.
Fast-Track Submission
ACL Rolling Review (ARR) Submissions: Our workshop also welcomes submissions from ARR. Authors of papers that were submitted to ARR and have their meta-review ready may submit their papers and reviews for consideration until 22 August 2025. Acceptance decisions will be announced by 10 September 2025. Commitments should be made via the workshop commitment submission website: OpenReview submission portal.
Non-archival Option
ACL workshops are traditionally archival. To allow dual submission of work, we are also including a non-archival track. Authors have the flexibility to submit their unpublished research in a non-archival format, where only the abstract will be included in the conference proceedings. These non-archival submissions are expected to meet the same quality criteria as their archival counterparts and will undergo an identical review process. This option is designed to facilitate future publication opportunities in journals or conferences that disallow previously archived material. It also aims to foster engagement and constructive feedback on well-developed but yet-to-be-published work. Like archival submissions, non-archival entries must conform to the established formatting and length guidelines.
Important Dates:
- ~~Aug. 15~~ Aug. 22, 2025: Workshop Submission Due Date
- Aug. 22, 2025: Fast-Track Submission and ARR Commitment Deadline
- ~~Sep. 10~~ Sep. 17, 2025: Notification of Acceptance (Direct, ARR, and Fast-Track Notification)
- ~~Sep. 14~~ Sep. 21, 2025: Camera-ready Papers Due
- Nov. 8, 2025: Workshop Date
Accepted Papers
- AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization
  Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown
- Bridging Multimodal and Video Summarization: A Unified Survey
  Haopeng Zhang
- Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
  Juntai Cao, Xiang Zhang, Raymond Li, Jiaqi Wei, Chuyuan Li, Shafiq Joty, Giuseppe Carenini
- HalluTree: Explainable Multi-Hop Hallucination Detection for Abstractive Summarization
  Daniel Orshansky, Oskar Oomen, Naaisha Agarwal, Ryan Lagasse
- From Keyterms to Context: Exploring Topic Description Generation in Scientific Corpora
  Pierre Achkar, Satiyabooshan Murugaboopathy, Anne Kreuter, Tim Gollub, Martin Potthast, Yuri Campbell
- DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
  Xue-Yong Fu, Elena Khasanova, Md Tahmid Rahman Laskar, Harsh Saini, SHASHI BHUSHAN TN
- REFER: Mitigating Bias in Opinion Summarisation via Frequency Framed Prompting
  Nannan Huang, Haytham M. Fayek, Xiuzhen Zhang
- Improving Aspect-Based Summarization via Contrastive Learning with Anchored Negative Examples
  Elizabeth Palmieri, Yangfeng Ji
- Beyond Paraphrasing: Analyzing Summarization Abstractiveness and Reasoning
  Nathan Zeweniuk, Ori Ernst, Jackie CK Cheung
- CS-Sum: A Benchmark for Code-Switching Dialogue Summarization and the Limits of Large Language Models
  Sathya Krishnan Suresh, Tanmay Surana, Lim Zhi Hao, Eng Siong Chng
- Hierarchical Attention Adapter for Abstractive Dialogue Summarization
  Raymond Li, Chuyuan Li, Gabriel Murray, Giuseppe Carenini
- LLM-as-a-Judge Failures at Automating the Identification of Poor Quality Outputs in Free-Form Texts
  Zongxia Li, Xiyang Wu, Ishani Mondal, Alexa Siu, Jordan Lee Boyd-Graber, Ani Nenkova
- QA-prompting: Improving Summarization with Large Language Models using Question-Answering
  Neelabh Sinha
- NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
  Delip Rao, Weiqiu You, Eric Wong, Chris Callison-Burch
Organizers
Yue Dong
University of California, Riverside, USA
Wen Xiao
Microsoft Azure AI, Canada
Haopeng Zhang
University of Hawaii at Manoa, USA
Rui Zhang
Penn State University, USA
Ori Ernst
McGill University & Mila, Canada
Lu Wang
University of Michigan, USA
Fei Liu
Emory University, USA
Program Committee
- Shmuel Amar (Bar-Ilan University)
- Florian Boudin (JFLI, Nantes Université)
- Avi Caciularu (Google)
- Arie Cattan (Bar-Ilan University)
- Hou Pong Chan (Alibaba DAMO Academy)
- Khaoula Chehbouni (McGill University, Mila)
- Ziling Cheng (McGill University & Mila)
- Jackie Cheung (Mila / McGill)
- Maxime Darrin (Mistral AI)
- Felice Dell'Orletta (Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC))
- Ron Eliav (Bar-Ilan University)
- Tobias Falke (Amazon AGI)
- Lorenzo Flores (MILA Quebec)
- Yu Fu (University of California, Riverside)
- Eran Hirsch (Bar-Ilan University)
- Zhe Hu (The Hong Kong Polytechnic University)
- Xinyu Hua (Bloomberg)
- Patrick Huber (Meta)
- Hayate Iso (Megagon Labs)
- Ayal Klein (Bar Ilan University)
- Wojciech Kryscinski (Cohere)
- Elena Lloret (University of Alicante)
- Margot Mieskes (University of Applied Sciences, Darmstadt)
- Manabu Okumura (Tokyo Institute of Technology)
- Jessica Ouyang (UT Dallas)
- G M Shahariar (University of California, Riverside)
- Haz Sameen Shahgir (University of California Riverside)
- Ori Shapira (OriginAI)
- Aviv Slobodkin (Bar-Ilan University)
- Cesare Spinoso (McGill)
- Esaú Villatoro Tello (Idiap Research Institute, CH)
- David Wan (UNC Chapel Hill)
- Haohan Yuan (ALOHA Lab, University of Hawaii at Manoa)
- Yusen Zhang (Penn State University)
- Nan Zhang (The Pennsylvania State University)
- Shiyue Zhang (Bloomberg)
- Ming Zhong (UIUC)
- Xiyuan Zou (McGill / MILA)