NewSumm EMNLP Workshop 2023

The 4th New Frontiers in Summarization (with LLMs) Workshop

EMNLP 2023

The Fourth Workshop on “New Frontiers in Summarization” aims to promote the cross-fertilization of ideas in automatic summarization and related fields. This includes discussion on novel paradigms, shared tasks of interest, applied research and applications, and possible future research directions. In addition to building a cohesive research community, the workshop will accelerate knowledge diffusion by developing new tools, datasets, and resources that are in line with the summarization needs of academia, industry, and government.

New advances in natural language processing (e.g., pre-trained models and prompt-based learning) have resulted in state-of-the-art performance according to existing standards of summarization evaluation. A number of new challenges have emerged, and moving forward with large-scale models we don’t fully understand calls for caution. Challenges are posed from multiple directions, including but not limited to the trustworthiness of the generation, the interpretability and controllability of the models, the reliability of evaluation, and the integration of additional sources such as knowledge and other modalities. Considering these challenges will be crucial for realistic, ecologically valid deployment of summarization research.

Schedule

Wednesday, December 6, 2023

Live Recording of All Oral Sessions

We are sorry that only part of Chenguang’s talk was recorded due to technical problems.

Time	Session Chair	Event & Details
08:50 - 09:00	Wen Xiao	Opening Remarks Slides
09:00 - 09:45	Lei Yu	Keynote I - Kathleen McKeown (Columbia University) Addressing Large Language Models that Lie: Case Studies in Summarization
09:45 - 10:30	Lei Yu	Keynote II - Jackie Cheung (McGill University) Open Problems in Automatic Summarization
10:30 - 11:00	-	Coffee Break
11:00 - 11:45	Wen Xiao	Keynote III - Rui Zhang (Penn State University) Are Large Language Models Fair Summarizers?
11:45 - 12:30	Wen Xiao	Keynote IV - Iz Beltagy (Allen Institute for AI) The Quest for Open Language Models
12:30 - 14:00	-	Lunch Break
14:00 - 14:45	Wen Xiao	Keynote V - Chenguang Zhu (Zoom) Facing the Challenges and Opportunities of LLMs
14:45 - 15:30	Wen Xiao	Lightning Talks (Workshop papers and Findings papers, Slides)
15:30 - 16:00	-	Coffee Break
16:00 - 17:30	Wen Xiao	Poster Session (In-person/Virtual: Gathertown) (Workshop papers and Findings papers)

Keynote Speakers

We are deeply saddened that our invited speaker, Dragomir Radev, passed away in 2023. It is a profound loss that we cannot have him grace our stage. However, we are grateful that Drago’s former PhD student, Rui Zhang, who is now an assistant professor at PSU, kindly accepted our invitation to give a keynote talk on summarization in memory of Drago.

Kathleen McKeown

Columbia University

Addressing Large Language Models that Lie: Case Studies in Summarization

The advent of large language models promises a new level of performance in the generation of text of all kinds, enabling the generation of text that is far more fluent, coherent, and relevant than was previously possible. However, they also introduce a major new problem: they hallucinate facts out of thin air. When summarizing an input document, they may incorrectly intermingle facts from the input, they may introduce facts that were not mentioned at all, and worse yet, they may even make up things that are not true in the real world. In this talk, I will discuss our work in characterizing the kinds of errors that can occur and methods that we have developed to help mitigate hallucination in language modeling approaches to text summarization for a variety of genres.

Jackie Cheung

McGill University

Open Problems in Automatic Summarization

Pre-trained language models have met and exceeded human-level performance on summarization benchmarks, often with the help of adaptation towards task-specific or human-elicited rewards. What does this mean for the field of automatic summarization? I argue that these results represent a milestone to be celebrated, but that they barely scratch the surface of the work ahead for summarization researchers. I discuss challenges that still remain largely unsolved and under-researched: how do we develop summarization systems that can perform more complex reasoning? How do we use this reasoning capability to aggregate and analyze information across vast amounts of text? What responsible AI issues matter in the deployment of summarization systems? And how do we evaluate for all of these desiderata? Progress on these open questions gives us the exciting prospect of summarization systems that are useful and beneficial in practice.

Rui Zhang

Penn State University

Are Large Language Models Fair Summarizers?

We live in a world of value pluralism where several values can be equally correct and fundamental, and yet in conflict with each other. Traditional summarization models are optimized for the prestige and centrality of important words and concepts but not for diversity and novelty, and thus when presented with diverse perspectives and conflicting opinions, they can display bias by ignoring certain parts of inputs. In this talk, I will present our initial efforts in investigating fair abstractive summarization by large language models that aims to generate a fair summary for user-generated data by providing an accurate and comprehensive view of various perspectives from different groups.

Iz Beltagy

Allen Institute for AI

The Quest for Open Language Models

With the rapid progress of proprietary language models, the open research community is putting more effort into advancing the state of the art of open models. To that end, AI2 started a project called OLMo, which aims to build and release an entirely in-house, truly open LLM. In this talk, I will tell you more about what AI2 is building for OLMo. I will discuss the DOLMA dataset and its toolkit, as well as our pretraining and modeling experience. Finally, I will talk about our efforts in language model adaptation, focusing on the instruction tuning and RLHF efforts to train and evaluate the TULU models.

Chenguang Zhu

Zoom

Facing the Challenges and Opportunities of LLMs

Recent progress in LLMs has dramatically changed the research and business communities, and posed challenges to many NLP research areas such as summarization. However, the ultimate goal of any model is to perfectly solve the specialized domain problems users care about, which existing LLMs, as generalists, do not achieve. In this talk, I will introduce how to face these challenges and adapt NLP research in the new LLM era. This includes: 1) Adapt and Improve, i.e., integrate traditional NLP and ML techniques with LLMs to achieve better performance in various domains; 2) Leverage and Empower, i.e., stand on the shoulders of LLMs to tackle complex problems. These measures can effectively guide a generalist LLM towards becoming a specialist in domains of interest.

Call for Papers

Both long papers (up to 8 pages, with unlimited references) and short papers (up to 4 pages, with unlimited references) are welcome!

Topics relevant to this workshop include, but are not limited to:

Abstractive summarization, extractive summarization and their integration
Summarization with pre-trained large models
Zero-shot/few-shot summarization
Fairness in summarization: faithfulness, bias, toxicity, and privacy preservation
Interpretability and visualization of summarization systems
Controlled and tailored text generation
Knowledge/common sense injected summarization
Multiple text genres (News, tweets, product reviews, conversations, medical records, books, research articles, etc.)
Multimodal learning: information integration and aggregation across multiple modalities (text, speech, image, video)
Multilingual summarization
Semantic aspects of summarization (e.g., semantic representation, inference, validity)
Cognitive or psycholinguistic aspects of summarization (e.g., perceived readability, usability, etc.)
Development of novel algorithms (e.g., integrating neural and non-neural, distant supervision)
Development of new datasets and annotations
Development of new evaluation metrics

Submission Instructions

You are invited to submit your papers through our START/SoftConf submission portal. All submitted papers must be anonymized for double-blind review. The content of the paper should not exceed 8 pages for long papers and 4 pages for short papers, and must strictly follow the ACL 2023 style templates; the mandatory Limitations section does not count towards the page limit. Supplementary materials and appendices (either as separate files or appended after the main submission) are allowed. We encourage authors to include a link to their code in the camera-ready version.

Dual Submission

NewSumm 2023 allows dual submission as long as the authors make a decision before the camera-ready deadline. We will not consider any paper that overlaps significantly in content or results with papers that will be (or have been) published elsewhere. Authors submitting more than one paper to NewSumm 2023 must ensure that their submissions do not overlap significantly (>25%) with each other in content or results. Authors can submit up to 100 MB of supplementary materials separately. Authors are highly encouraged to submit their code for reproducibility purposes.

Fast-Track Submission

If your paper has been reviewed by ACL, EMNLP, EACL, or ARR and its average rating (either the average soundness score or the average excitement score) is higher than 2.5, it qualifies for fast-track submission. In the appendix, please include the reviews and a short statement describing which parts of the paper have been revised.

ACL Rolling Review (ARR) Submissions: Our workshop also welcomes submissions from ARR. Authors of papers that have been submitted to ARR and have their meta-review ready may submit their papers and reviews for consideration for the workshop until 10 October 2023. This includes submissions to ARR for the 15 August deadline. Publication decisions will be announced by 17 October 2023. Commitments should be made through the workshop submission website: START/SoftConf submission portal (“ACL Rolling Review Commitment” submission type).

Non-archival Option

ACL workshops are traditionally archival. To allow dual submission of work, we are also including a non-archival track. Authors have the flexibility to submit their unpublished research in a non-archival format, where only the abstract will be included in the conference proceedings. These non-archival submissions are expected to meet the same quality criteria as their archival counterparts and will undergo an identical review process. This option is designed to facilitate future publication opportunities in journals or conferences that disallow previously archived material. It also aims to foster engagement and constructive feedback on well-developed but yet-to-be-published work. Like archival submissions, non-archival entries must conform to the established formatting and length guidelines.

Important Dates:

Sep. 8, 2023: Workshop Submission Due Date (extended from Sep. 1)
Oct. 10, 2023: Fast-Track Submission and ARR Commitment Deadline
Oct. 17, 2023: Notification of Acceptance (Direct, ARR, and Fast-Track Notification)
Oct. 24, 2023: Camera-ready Papers Due
Dec. 6, 2023: Workshop Date

Organizers

Yue Dong
University of California, Riverside, USA

Wen Xiao
Microsoft Azure AI, Canada

Lu Wang
University of Michigan, USA

Fei Liu
Emory University, USA

Giuseppe Carenini
University of British Columbia, Canada

Program Committee

Manabu Okumura (Tokyo Institute of Technology)
Ido Dagan (Bar-Ilan University)
Ming Zhong (UIUC)
Kristjan Arumae (Qualtrics)
Pengcheng He (Microsoft Research)
Naoaki Okazaki (Tokyo Institute of Technology)
Zhe Hu (Baidu Inc)
Wojciech Kryscinski (Salesforce Research)
Haopeng Zhang (University of California Davis)
Hou Pong Chan (University of Macau)
Yang Liu (Microsoft)
Kaiqiang Song (Tencent AI Lab)
Juan-Manuel Torres-Moreno (LIA Avignon Université)
Jing Jiang (Singapore Management University)
Ziqiang Cao (Soochow University)
Margot Mieskes (University of Applied Sciences, Darmstadt)
Felice Dell'Orletta (Istituto di Linguistica Computazionale «A. Zampolli», CNR, Pisa, Italy)
Xinnuo Xu (University of Edinburgh)
Richard Evans (University of Wolverhampton)
Esau Villatoro-Tello (Idiap Research Institute)
Susana Bautista (Universidad Francisco de Vitoria)
Tobias Falke (Amazon Alexa)
Kellie Webster (Google)
Giulia Venturi (Institute for Computational Linguistics "A. Zampolli", ILC-CNR)
Jessica Ouyang (University of Texas at Dallas)
Wencan Luo (Google)
Rui Zhang (Penn State University)
Linzi Xing (University of British Columbia)
Jiacheng Xu (Salesforce AI Research)
Tadashi Nomoto (National Institute of Japanese Literature)
Chao Zhao (UNC Chapel Hill)
Ori Shapira (Amazon)
Patrick Huber (UBC)
Florian Boudin (Nantes Université)
Xinyu Hua (Bloomberg)
Elena Lloret (University of Alicante, Spain)
Alexander Fabbri (Salesforce AI Research)
Tanya Goyal (UT Austin)
Yuntian Deng (Harvard University)
Maxime Peyrard (EPFL)
Arpit Sood (Meta)
Niyathi Allu (University of California, Riverside)
Priyanshu Sharma (University of California, Riverside)