Invited Speakers

  • Angela Fan, Facebook AI Research and LORIA, France
  • Wilker Aziz, ILLC, University of Amsterdam, Netherlands
  • Sebastian Riedel, Meta AI and University College London, United Kingdom
  • Nicola De Cao, Institute for Logic, Language and Computation, University of Amsterdam
  • Siva Reddy, Facebook CIFAR AI and Mila, McGill University, Canada
  • Albert Gu, Stanford University

Schedule: May 27th, 2022 [Timezone: GMT]

Time Event
9:00 AM Opening Remarks
9:10 AM Invited Talk
Can we learn more explicit relationships between languages in multilingual machine translation?
Angela Fan (Facebook AI Research, France)
Abstract: Exciting progress in improving natural language understanding and generation of English language text has naturally raised questions about how these improvements could extend to other languages. Some languages, such as French or Chinese, might contain sufficient text on the web to apply English-focused techniques. However, most languages of the world fall into a mid to low resource categorization, leading to multilingual approaches. Multilingual models are capable of modeling several languages at once, enabling related languages to learn from each other. In this talk, we discuss several approaches to this problem and how we might explicitly model relationships between languages in a more structured fashion.
Bio: Angela is a research scientist at Meta AI in New York, currently focusing on low-resource machine translation. Previously, Angela did her PhD at LORIA in Nancy, France on text generation, advised by Claire Gardent, Chloe Braud, and Antoine Bordes. Before that, Angela was a research engineer at Meta AI.
9:50 AM Invited Talk
Decoding is deciding under uncertainty — the case of NMT.
Wilker Aziz (ILLC, University of Amsterdam, Netherlands)
Abstract: In neural machine translation (NMT), we search for the mode of the model distribution to form predictions. We do so mostly following the intuition that the most probable outcome ought to be an important summary of the distribution. Despite our intuition, there's plenty of evidence against the adequacy of the most probable translations in NMT. In this talk, I make a case to move away from mode-seeking search as a tool for decision making as well as for model criticism. I will highlight reasons concerning MT as a task, NMT as a probabilistic model, and MLE as a training algorithm. Finally, I'll turn to statistical decision theory and motivate a different rule for making decisions, one which is familiar to statistical MT folks like those of my generation and earlier, as well as a modern approximation of it. I'll close the talk with a discussion of the merits and limitations of this decision rule, and with comments on opportunities moving forward with or without mode-seeking search.
Bio: Wilker Aziz is an assistant professor (UD) in natural language processing at the Institute for Logic, Language and Computation, where he leads the Probabilistic Language Learning group. His work concerns the design of models and algorithms that learn to represent, understand, and generate language data. Examples of specific problems he is interested in include language modelling, machine translation, syntactic parsing, textual entailment, text classification, and question answering. He also develops techniques for general machine learning problems such as probabilistic inference, gradient and density estimation. His interests sit at the intersection of statistics, machine learning, approximate inference, global optimisation, formal languages, and computational linguistics.
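The decision rule alluded to above is not named in the abstract; a natural reading, flagged here as an assumption, is minimum Bayes risk (MBR) decoding with a Monte Carlo approximation. The sketch below illustrates that idea only: `candidates`, `samples`, and `utility` are placeholder names, and `utility` stands for any sentence-level similarity metric between translations.

```python
# Hedged sketch of sampling-based minimum Bayes risk (MBR) decoding: choose the
# candidate translation with the highest average utility against translations
# sampled from the model. The talk's exact rule and approximation may differ.

def mbr_decode(candidates, samples, utility):
    """candidates, samples: lists of translation strings; utility(hyp, ref) -> float."""
    best_hyp, best_score = None, float("-inf")
    for hyp in candidates:
        # Monte Carlo estimate of the expected utility of `hyp` under the model.
        score = sum(utility(hyp, ref) for ref in samples) / len(samples)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp
```

In the simplest setup the candidate list and the sample list are the same set of model samples, which reduces the decision to filling in a pairwise utility matrix.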
10:30 AM Break
11:00 AM Contributed Talk
Neural String Edit Distance
Jindřich Libovický and Alexander Fraser
Abstract: We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance. We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function, allowing us to integrate it into a neural network providing a contextual representation of the input. We evaluate on cognate detection, transliteration, and grapheme-to-phoneme conversion, and show that we can trade off between performance and interpretability in a single framework. Using contextual representations, which are difficult to interpret, we match the performance of state-of-the-art string-pair matching models. Using static embeddings and a slightly different loss function, we force interpretability, at the expense of an accuracy drop.
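As a rough illustration of the idea in the abstract (not the authors' implementation), the hard minimum in the classic edit-distance dynamic program can be replaced by a temperature-controlled softmin so that gradients flow into learned operation costs; the cost callables and the `tau` temperature below are illustrative assumptions.

```python
# Minimal sketch of a differentiable edit-distance DP. The cost functions are
# assumed to return scalar tensors (e.g., produced by small neural networks),
# so the whole table, and hence the loss, is differentiable in their parameters.
import torch

def soft_edit_distance(x, y, sub_cost, ins_cost, del_cost, tau=1.0):
    """x, y: lists of symbol representations; *_cost: callables -> scalar tensors."""
    n, m = len(x), len(y)
    D = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = torch.zeros(())
    for i in range(1, n + 1):                      # first column: deletions only
        D[i][0] = D[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, m + 1):                      # first row: insertions only
        D[0][j] = D[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = torch.stack([
                D[i - 1][j] + del_cost(x[i - 1]),                 # delete x_i
                D[i][j - 1] + ins_cost(y[j - 1]),                 # insert y_j
                D[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),   # substitute/match
            ])
            # softmin: recovers the hard minimum as tau -> 0
            D[i][j] = -tau * torch.logsumexp(-options / tau, dim=0)
    return D[n][m]
```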
11:15 AM Contributed Talk
A Joint Learning Approach for Semi-supervised Neural Topic Modeling
Jeffrey Chiu, Rajat Mittal, Neehal Tumma, Abhishek Sharma and Finale Doshi-Velez
Abstract: Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for data-sets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
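Below is a minimal sketch of the kind of joint objective the abstract describes, combining an AEVB-style reconstruction term with a classification term so that labeled and unlabeled documents are trained together; the function names, the `-1` convention for unlabeled documents, and the `alpha` weighting are assumptions, not details from the paper.

```python
# Illustrative joint loss: ELBO-style reconstruction + KL for every document,
# plus a cross-entropy classification term for documents that carry a label.
import torch
import torch.nn.functional as F

def joint_loss(recon_logits, bow, kl, class_logits, labels, alpha=1.0):
    """bow: bag-of-words counts; labels: class ids, with -1 marking unlabeled docs."""
    recon_nll = -(bow * F.log_softmax(recon_logits, dim=-1)).sum(-1)
    elbo_term = (recon_nll + kl).mean()
    labeled = labels >= 0
    clf_term = (F.cross_entropy(class_logits[labeled], labels[labeled])
                if labeled.any() else torch.zeros(()))
    return elbo_term + alpha * clf_term
```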
11:30 AM Contributed Talk
Predicting Attention Sparsity in Transformers
Marcos Vinicius Treviso, António Góis, Patrick Fernandes, Erick Rocha Fonseca, and Andre Martins
Abstract: Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax. An alternative path, used by entmax transformers, consists of having built-in exact sparse attention; however, this approach still requires quadratic computation. In this paper, we propose Sparsefinder, a simple model trained to identify the sparsity pattern of entmax attention before computing it. We experiment with three variants of our method, based on distances, quantization, and clustering, on two tasks: machine translation (attention in the decoder) and masked language modeling (encoder-only). Our work provides a new angle to study model efficiency by doing extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph. This allows for detailed comparison between different models along their Pareto curves, important to guide future benchmarks for sparse attention models.
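As a rough sketch of the clustering variant mentioned in the abstract (the details and names below are assumptions, not the authors' code), queries and keys can be projected to a low-dimensional space and bucketed by nearest centroid, with attention then evaluated only for query-key pairs that land in the same bucket:

```python
# Predicted sparsity pattern via clustering: only pairs sharing a centroid are
# kept, so full attention never has to be computed to know where it is nonzero.
import numpy as np

def predicted_attention_graph(Q, K, proj, centroids):
    """Q: (n, d) queries, K: (m, d) keys, proj: (d, r) learned projection,
    centroids: (c, r) cluster centers fit offline (e.g., with k-means)."""
    q_lo, k_lo = Q @ proj, K @ proj
    q_bucket = np.argmin(((q_lo[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    k_bucket = np.argmin(((k_lo[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    # Boolean (n, m) mask: True where attention will be evaluated.
    return q_bucket[:, None] == k_bucket[None, :]
```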
11:45 AM Online Poster Session
Included papers:
Joint Entity and Relation Extraction Based on Table Labeling Using Convolutional Neural Networks.
Youmi Ma, Tatsuya Hiraoka and Naoaki Okazaki

Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion
Shunsuke Kando, Hiroshi Noji and Yusuke Miyao

DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning.
Hossein Rajaby Faghihi, Quan Guo, Andrzej Uszok, Aliakbar Nafar and Parisa Kordjamshidi

Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors.
Wanyu Du, Jianqiao Zhao, Liwei Wang and Yangfeng Ji

Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding.
Sijia Wang, Mo Yu, Shiyu Chang, Lichao Sun and Lifu Huang

Extracting Temporal Event Relation with Syntax-guided Graph Transformer.
Shuaicheng Zhang, Qiang Ning and Lifu Huang
12:30 PM Lunch break
2:00 PM Invited Talk
Autoregressive Retrieval
Sebastian Riedel (Meta AI and University College London, United Kingdom)
Nicola De Cao (Institute for Logic, Language and Computation, University of Amsterdam)
Abstract: Today many NLP problems are addressed by fine-tuning auto-regressive language models to generate specific natural language outputs. Among various advantages, this approach aligns the pre-training loss with task objectives and simplifies engineering: the same tools, algorithms, and input/output signature can be used to tackle a wide range of tasks. A notable exception is retrieval problems, where elements from a large set of candidates are selected to match a natural language query. Passage retrieval is an obvious instance, but tasks such as entity linking or topic tagging can also be seen in this light. In this talk I will show how auto-regressive language models can be deployed for such tasks too, by 1) mapping retrieval elements to natural strings that identify them and 2) using constrained decoding and various index structures to generate such strings. We show that this leads to strong results, compact models and the ability to generate "new elements" on the fly when needed.
Bio: Nicola De Cao is a 4th-year PhD student at the University of Amsterdam and the University of Edinburgh, and currently a part-time researcher at Hugging Face. Nicola will also soon join Google Research in London. His work mainly focuses on natural language understanding and, more specifically, on question answering, retrieval, and entity linking. He has also developed more general machine learning techniques such as autoregressive retrieval and normalizing flows, and has worked on non-Euclidean probabilistic models.
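A minimal sketch of point 2) in the abstract above is constrained decoding over a prefix trie of identifier strings. The `Trie` class below is illustrative, and the `prefix_allowed_tokens_fn` hookup assumes the Hugging Face `generate()` interface, which is not necessarily what the speakers use.

```python
# Constrained decoding sketch: at each step, only tokens that extend the current
# prefix toward some valid identifier (entity title, passage id, ...) are allowed.

class Trie:
    def __init__(self, sequences):
        """sequences: iterable of token-id lists, one per identifier string."""
        self.children = {}
        for seq in sequences:
            node = self.children
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed(self, prefix):
        """Token ids that may follow `prefix` (a list of token ids); [] if off-trie."""
        node = self.children
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())

def make_prefix_allowed_tokens_fn(trie, prompt_len):
    # Assumed hookup: transformers' generate(prefix_allowed_tokens_fn=...) calls
    # this with (batch_id, input_ids) and expects a list of allowed token ids.
    def fn(batch_id, input_ids):
        # Drop the prompt/special tokens so only the generated identifier is matched.
        return trie.allowed(input_ids.tolist()[prompt_len:])
    return fn
```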
2:45 PM In-person Poster Session
Included papers:
A Joint Learning Approach for Semi-supervised Neural Topic Modeling.
Jeffrey Chiu, Rajat Mittal, Neehal Tumma, Abhishek Sharma and Finale Doshi-Velez

SlotGAN: Detecting Mentions in Text via Adversarial Distant Learning
Daniel Daza, Michael Cochez and Paul Groth

TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion.
Guirong Fu, Zhao Meng, Zhen Han, Zifeng Ding, Yunpu Ma, Matthias Schubert, Volker Tresp and Roger Wattenhofer

Predicting Attention Sparsity in Transformers
Marcos Vinicius Treviso, António Góis, Patrick Fernandes, Erick Rocha Fonseca, and Andre Martins

Neural String Edit Distance
Jindřich Libovický and Alexander Fraser

Conditioning Pretrained Language Models with Multi-Modal information on Data-to-Text Generation.
Qianqian Qi, Zhenyun Deng, Yonghua Zhu, Lia Lee, Jiamou Liu and Michael J. Witbrock

Language Modelling via Learning to Rank.
Arvid Frydenlund, Gagandeep Singh and Frank Rudzicz
3:00 PM Break
3:30 PM Invited Talk
Do we still need inductive biases after Transformer language models?
Siva Reddy (Facebook CIFAR AI and Mila, McGill University, Canada)
Abstract: In this talk, I will explore the role of inductive biases when fine-tuning large Transformer language models in three different scenarios: when the output space is structured, for example, semantic parsing from language to code; when performing multi-task learning where tasks may share some latent structure, e.g., different semantic tasks like question answering and text entailment may share common reasoning skills; and when the input involves a higher-order (latent) structure such as negation. It is not always the case that inductive biases help. Come with your wisest/wildest answers.
Bio: Siva Reddy is an Assistant Professor in the School of Computer Science and Linguistics at McGill University. He is a Facebook CIFAR AI Chair and a core faculty member of Mila Quebec AI Institute. Before McGill, he was a postdoctoral researcher at Stanford University. He received his PhD from the University of Edinburgh in 2017, where he was a Google PhD Fellow. His research focuses on representation learning for language that facilitates systematic generalization, reasoning and conversational modeling. He received the 2020 VentureBeat AI Innovation Award in NLP, and the best paper award at EMNLP 2021.
4:15 PM Invited Talk
Efficiently Modeling Long Sequences with Structured State Spaces.
Albert Gu (Stanford University)
Abstract: A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10,000 or more steps. This talk introduces the Structured State Space sequence model (S4), a simple new model based on the fundamental state space representation x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t). S4 combines elegant properties of state space models with the recent HiPPO theory of continuous-time memorization, resulting in a class of structured models that handles long-range dependencies mathematically and can be computed very efficiently. S4 achieves strong empirical results across a diverse range of established benchmarks, particularly for (i) continuous signal data such as images, audio, and time series, and (ii) very long sequences, improving the state of the art by over 20 points on the Long Range Arena benchmark.
Bio: Albert Gu is a final year Ph.D. candidate in the Department of Computer Science at Stanford University, advised by Christopher Ré. His research broadly studies structured representations for advancing the capabilities of machine learning and deep learning models, with focuses on structured linear algebra, non-Euclidean representations, and theory of sequence models. Previously, he completed a B.S. in Mathematics and Computer Science at Carnegie Mellon University.
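To make the state space equation in the abstract concrete, here is a small sketch of a discretized SSM: the bilinear transform turns (A, B) into discrete-time matrices, and the output is produced by a naive per-step recurrence. The HiPPO initialization and structured parameterization that make S4 fast are deliberately omitted, and all names here are illustrative.

```python
# Naive SSM scan for x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t), discretized
# with the bilinear (Tustin) transform. S4 itself computes an equivalent
# convolution with a structured A for efficiency; this is only the plain view.
import numpy as np

def discretize(A, B, step):
    """Bilinear transform: continuous (A, B) -> discrete (Abar, Bbar) for step size `step`."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    Abar = inv @ (I + (step / 2.0) * A)
    Bbar = (inv * step) @ B          # B has shape (N, 1)
    return Abar, Bbar

def ssm_scan(Abar, Bbar, C, D, u):
    """x_k = Abar x_{k-1} + Bbar u_k;  y_k = C x_k + D u_k, for a 1-D input signal u."""
    x = np.zeros(Abar.shape[0])
    ys = []
    for u_k in u:
        x = Abar @ x + Bbar[:, 0] * u_k
        ys.append(float(C @ x + D * u_k))
    return np.array(ys)
```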
5:00 PM Closing Remarks