April 10th, 2022 - Stavanger, Norway

Text2Story 2022

Fifth International Workshop on Narrative Extraction from Texts
held in conjunction with the 44th European Conference on Information Retrieval

Call for papers


Although Information Retrieval (IR) and Natural Language Processing (NLP) have made significant progress towards the automatic interpretation of texts, the problem of constructing consistent narrative structures is yet to be solved. In the fifth edition of the Text2Story workshop, we aim to foster the discussion of recent advances in the link between IR and the formal understanding and representation of narratives in texts. Specifically, we aim to provide a common forum to consolidate multi-disciplinary efforts and to identify the wide-ranging issues related to the narrative extraction task. Topics of interest include, but are not limited to:

  • Narrative Representation Language
  • Story Evolution and Shift Detection
  • Temporal Relation Identification
  • Temporal Reasoning and Ordering of Events
  • Causal Relation Extraction and Arrangement
  • Narrative Summarization
  • Multi-modal Summarization
  • Automatic Timeline Generation
  • Storyline Visualization
  • Comprehension of Generated Narratives and Timelines
  • Big Data Applied to Narrative Extraction
  • Personalization and Recommendation of Narratives
  • User Profiling and User Behavior Modeling
  • Sentiment and Opinion Detection in Texts
  • Argumentation Analysis
  • Bias Detection and Removal in Generated Stories
  • Ethical and Fair Narrative Generation
  • Misinformation and Fact Checking
  • Bot Influence
  • Information Retrieval Models based on Story Evolution
  • Narrative-focused Search in Text Collections
  • Event and Entity Importance Estimation in Narratives
  • Multilinguality: Multilingual and Cross-lingual Narrative Analysis
  • Evaluation Methodologies for Narrative Extraction
  • Resources and Dataset Showcase
  • Dataset Annotation for Narrative Generation/Analysis
  • Applications in Social Media (e.g. narrative generation during a natural disaster)

tls-covid19 Dataset

We challenge interested researchers to consider submitting a paper that makes use of the tls-covid19 dataset - published at ECIR'21 - within the scope and purposes of the Text2Story workshop. tls-covid19 consists of a number of curated topics related to the COVID-19 outbreak, with associated news articles from Portuguese and English news outlets and their respective reference timelines as gold standards. While it was designed to support timeline summarization research, it can also be used for other tasks, including the study of news coverage of the COVID-19 pandemic.

Important Dates

Submission Guidelines

We invite five kinds of submissions:

  • Research papers (max 7 pages + references)
  • Demos and position papers (max 5 pages + references)
  • Work in progress and project description papers (max 4 pages + references)
  • Nectar papers summarizing the authors' own work published in other conferences or journals that is worth sharing with the Text2Story community, emphasizing how it can be applied to narrative extraction, processing, or storytelling, and adding further insights, discussions, novel aspects, results, or case studies (max 3 pages + references)
  • Negative result papers highlighting tested hypotheses that did not yield the expected outcome (max 7 pages + references)

Papers must be submitted electronically in PDF format through EasyChair. All submissions must be in English and formatted according to the one-column CEUR-ART style with no page numbers. Templates, in both Word and LaTeX, can be found in the following zip folder. There is also an Overleaf page for LaTeX users.

Submissions will be peer-reviewed by at least two members of the programme committee. Accepted papers will appear in the proceedings published as CEUR Workshop Proceedings (usually indexed on DBLP).

Workshop Format

Authors of accepted papers will be given 15 minutes for an oral presentation.


Organizing Committee

Program Committee

  • Álvaro Figueira (INESC TEC & University of Porto)
  • Andreas Spitz (University of Konstanz)
  • António Horta Branco (University of Lisbon)
  • Arian Pasquali (CitizenLab)
  • Brenda Santana (Federal University of Rio Grande do Sul)
  • Bruno Martins (IST and INESC-ID - Instituto Superior Técnico, University of Lisbon)
  • Demian Gholipour (University College Dublin)
  • Daniel Gomes (FCT/Arquivo.pt)
  • Daniel Loureiro (University of Porto)
  • Deya Banisakher (Defense Threat Reduction Agency (DTRA), Ft. Belvoir, VA, USA)
  • Dhruv Gupta (Norwegian University of Science and Technology (NTNU), Trondheim, Norway)
  • Dwaipayan Roy (ISI Kolkata, India)
  • Dyaa Albakour (Signal)
  • Evelin Amorim (INESC TEC)
  • Florian Boudin (Université de Nantes)
  • Grigorios Tsoumakas (Aristotle University of Thessaloniki)
  • Henrique Lopes Cardoso (University of Porto)
  • Hugo Sousa (INESC TEC)
  • Ismail Sengor Altingovde (Middle East Technical University)
  • Jeffery Ansah (BHP)
  • João Paulo Cordeiro (INESC TEC & University of Beira Interior)
  • Kiran Kumar Bandeli (Walmart Inc.)
  • Ludovic Moncla (INSA Lyon)
  • Marc Spaniol (Université de Caen Normandie)
  • Mark Finlayson (Florida International University)
  • Nina Tahmasebi (University of Gothenburg)
  • Pablo Gamallo (University of Santiago de Compostela)
  • Paulo Quaresma (Universidade de Évora)
  • Pablo Gervás (Universidad Complutense de Madrid)
  • Paul Rayson (Lancaster University)
  • Satya Almasian (Heidelberg University)
  • Sérgio Nunes (INESC TEC & University of Porto)
  • Udo Kruschwitz (University of Regensburg)
  • Yihong Zhang (Kyoto University)

Proceedings Chair

  • João Paulo Cordeiro (INESC TEC; Universidade da Beira Interior, Covilhã, Portugal)
  • Conceição Rocha (INESC TEC)

Web and Dissemination Chair

  • Hugo Sousa (INESC TEC)
  • Behrooz Mansouri (Rochester Institute of Technology)

Invited Speakers

We Have the Best Words: From the Web-scale Extraction and Attribution of Quotes to Analyzing Negativity in U.S. Political Language

Speaker: Andreas Spitz, University of Konstanz

Abstract: A substantial majority of Americans share the belief that the political discourse in the U.S. has recently become more negative, and more than half of them blame this change on Donald Trump. However, as is often the case in politics, talk is cheap and hard data is difficult to come by. To provide quantitative answers (and distribute blame deservedly) we consider the large-scale extraction and attribution of quotes by politicians for the analysis of political discourse. In the first part of this talk, I introduce Quobert, a transformer-based model that exploits the parallelism in news reporting for the extraction and attribution of quotes from news. Using Quotebank, a comprehensive corpus of 235 million unique quotations that we extracted with Quobert from a decade of news, I then demonstrate how this data can be used to quantify trends in the use of political language. In particular, I will focus on the uptick in negativity in U.S. politicians' language after the end of Obama's tenure, quantify the shifts in language tone, and unravel to whom these shifts could feasibly be attributed.

Bio: Andreas Spitz is an assistant professor and head of the Data and Information Mining lab at the University of Konstanz. He holds a PhD in computer science from Heidelberg University and visited the EPFL Data Science lab as a postdoctoral researcher. Andreas' research interests lie at the intersection of information retrieval, natural language processing, computational social science, and complex network analysis. He is particularly interested in graph representations of natural language and how they can be used to efficiently query, visualize, and explore large corpora.

Robust and multilingual analysis of historical documents

Speaker: Antoine Doucet, University of La Rochelle

Abstract: Many documents can only be accessed through digitization. This is notably the case of historical and handwritten documents, but also that of many digitally-born documents, turned into images for various reasons (e.g., a file conversion or the intermediary use of an analog form in order to manually sign a document, fill out a form, send it by post, etc.). Being able to analyze the textual content of such digitized documents requires a phase of conversion from the captured image to a textual representation, key parts of which are optical character recognition (OCR) and layout analysis. The resulting text and structure are often imperfect, to an extent which is notably correlated with the quality of the initial medium (which may be stained, folded, aged, etc.) and with the quality of the image taken from it. In this talk, I will present recent advances in AI and natural language understanding that enable this type of corpus to be analyzed in a way that is robust to digitization. For example, I will show how, in the H2020 NewsEye project, we were able to achieve state-of-the-art results for the cross-lingual recognition and disambiguation of named entities (names of people, places, and organizations) in large corpora of historical newspapers written in four languages between 1850 and 1950. This type of result paves the way to a large-scale analysis of digitized documents, one notably able to cross linguistic borders.

Bio: Antoine Doucet has been a tenured full professor in computer science at the L3i laboratory of the University of La Rochelle since 2014. He leads the research group on document analysis, digital contents and images at La Rochelle Université (about 50 people), and also directs the ICT department of the Vietnamese-French University of Science and Technology of Hanoi (USTH). Until January 2022, he was the coordinator of the H2020 project NewsEye, which focused on augmenting access to historical newspapers across domains and languages. He further leads the effort on semantic enrichment for low-resourced languages in the context of the H2020 project Embeddia. His main research interests lie in the fields of information retrieval, natural language processing, (text) data mining and artificial intelligence. The central focus of his work is the development of methods that scale to very large document collections and use as few external resources as possible, so as to be applicable to documents of any type written in any language, from news articles to social networks, and from digitized manuscripts to digitally-born documents. Antoine Doucet received his PhD in computer science from the University of Helsinki (Finland) in 2005 and his French research supervision habilitation (HDR) in 2012.


Displaying agenda in event timezone (Norway local time).

09h30 – 09h40 Introduction
(Ricardo Campos)
in-person | slides

Session Chair: Ricardo Campos
09h40 – 10h20 Keynote 1: Robust and multilingual analysis of historical documents
(Antoine Doucet, University of La Rochelle)
in-person | slides
10h20 – 10h40 Time for some German? Pre-Training a Transformer-based Temporal Tagger for German
(Satya Almasian, Dennis Aumiller and Michael Gertz)
in-person | video | slides
10h40 – 11h00 Understanding COVID-19 News Coverage using Medical NLP
(Ali Emre, Veysel Kocaman, Hasham Ul Haq and David Talby)
in-person | slides

11h00 – 11h30 Coffee Break

Session Chair: Sumit Bhatia
11h30 – 11h50 Changing the Narrative Perspective: From Ranking to Prompt-Based Generation of Entity Mentions
(Mike Chen and Razvan Bunescu)
online | slides
11h50 – 12h10 EnDSUM: Entropy and Diversity based Disaster Tweet Summarization
(Piyush Kumar Garg, Roshni Chakraborty and Sourav Kumar Dandapat)
online | video | slides
Session Chair: Marina Litvak
12h10 – 12h30 Simplifying News Clustering Through Projection From a Shared Multilingual Space
(João Santos, Afonso Mendes and Sebastião Miranda)
in-person | video | slides
12h30 – 12h50 Exploring Data Augmentation for Classification of Climate Change Denial: Preliminary Study
(Jakub Piskorski, Nikolaos Nikolaidis, Nicolas Stefanovitch, Bonka Kotseva, Irene Vianini, Sopho Kharazi and Jens Linge)
online | slides
12h50 – 13h10 Dynamic change detection in topics based on rolling LDAs
(Jonas Rieger, Kai-Robin Lange, Jonathan Flossdorf and Carsten Jentsch)
in-person | video | slides

13h10 – 14h00 Lunch Break

Session Chair: Adam Jatowt
14h00 – 14h50 Keynote 2: We Have the Best Words: From the Web-scale Extraction and Attribution of Quotes to Analyzing Negativity in U.S. Political Language
(Andreas Spitz, University of Konstanz)
in-person | slides
14h50 – 15h10 Text2Icons: representing narratives with icon strips
(Joana Valente, Alípio Jorge and Sérgio Nunes)
in-person | slides
15h10 – 15h30 Comprehensive contextual visualization of a news archive
(Ishrat Sami, Tony Russell-Rose and Larisa Soldatova)
online | video | slides

15h30 – 16h00 Coffee Break

Session Chair: Alípio Jorge
16h00 – 16h20 Causality Mining in Fiction
(Margaret Meehan, Andrew Piper and Dane Malenfant)
online | video | slides
16h20 – 16h40 Extracting Impact Model Narratives from Social Services’ Text
(Bart Gajderowicz and Mark Fox)
online | slides
Session Chair: Vasco Campos
16h40 – 17h00 MARCUS: An Event-Centric NLP Pipeline that generates Character Arcs from Narratives
(Sriharsh Bhyravajjula, Ujwal Narayan and Manish Shrivastava)
in-person | video | slides

17h00 – 17h30 Best Paper Award and Reviewers Award
(Ricardo Campos, Alípio Jorge, Adam Jatowt, Marina Litvak)


Text2Story 2022 will be held at the 44th European Conference on Information Retrieval (ECIR'22) in Stavanger, Norway.

Registration at ECIR 2022 is required to attend the workshop (don't forget to select the Text2Story workshop).


This project is financed by the ERDF – European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT – Fundação para a Ciência e a Tecnologia, within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185).