Following CVPR, the 61st Annual Meeting of the Association for Computational Linguistics (ACL) is under way in Toronto, Canada, from July 9 to 14. As a Diamond Level sponsor, Google is set to present more than 50 publications and to contribute actively to workshops and tutorials. Geoffrey Hinton, the "godfather" of neural networks, delivered the keynote, in which he addressed subjective experience versus sentience in large language models. The conference covers topics such as computational social science and cultural analytics, dialogue and interactive systems, and discourse and pragmatics.
Let’s take a look at the top papers presented by Google at the conference.
The paper introduces NusaCrowd, an initiative to gather and consolidate existing resources for Indonesian languages, including previously inaccessible ones. By bringing together 137 datasets and 118 standardised data loaders, the project provides valuable resources for natural language understanding and generation. The quality of the datasets has been verified through manual and automated evaluations.
NusaCrowd enables the first zero-shot benchmarks for Indonesian and the local languages of Indonesia, as well as the first multilingual automatic speech recognition benchmark for these languages. The work aims to advance natural language processing research for languages that are widely spoken but underrepresented.
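To illustrate what "standardised data loaders" buy in practice, here is a minimal sketch in which loaders for differently formatted sources all emit one shared schema. The schema, dataset names, and functions below are hypothetical and do not reflect NusaCrowd's actual code or schemas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical shared schema for a text-classification task
# (NusaCrowd's real schemas and loader names may differ).
@dataclass
class TextExample:
    id: str
    text: str
    label: str


def load_tsv_source(lines: Iterable[str]) -> List[TextExample]:
    """Hypothetical loader for a source stored as 'text<TAB>label' lines."""
    examples = []
    for i, line in enumerate(lines):
        text, label = line.rstrip("\n").split("\t")
        examples.append(TextExample(id=str(i), text=text, label=label))
    return examples


def load_dict_source(records: Iterable[dict]) -> List[TextExample]:
    """Hypothetical loader for a source stored as JSON-like records."""
    return [TextExample(id=str(r["idx"]), text=r["sentence"], label=r["sentiment"])
            for r in records]


# Registry: dataset name -> loader. Downstream code only ever sees TextExample.
LOADERS: Dict[str, Callable[..., List[TextExample]]] = {
    "toy_tsv_sentiment": load_tsv_source,
    "toy_json_sentiment": load_dict_source,
}


def load(name: str, raw) -> List[TextExample]:
    return LOADERS[name](raw)


a = load("toy_tsv_sentiment", ["bagus sekali\tpositive", "jelek\tnegative"])
b = load("toy_json_sentiment", [{"idx": 7, "sentence": "enak", "sentiment": "positive"}])
```

Because both sources come out as `TextExample` objects, the same training or evaluation code can run over any registered dataset without format-specific handling.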
This paper introduces SamToNe (“contrastive loss with SAMe TOwer NEgatives”), a new approach for training the dual encoders used in retrieval tasks and representation learning. By adding queries or documents from the same encoder tower as negatives, SamToNe improves retrieval quality for both symmetric and asymmetric dual encoders.
Evaluations on a range of benchmarks demonstrate SamToNe's effectiveness. By analysing the embedding spaces with t-SNE, the authors also observe that the method ensures alignment between the embedding spaces of the two encoder towers. Finally, an analysis of embedding-distance distributions offers an explanation of SamToNe's efficacy in terms of regularisation.
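The core change SamToNe makes to a standard in-batch contrastive loss can be sketched as follows: for each query, the other queries in the batch join the other documents as negatives in the softmax denominator. This is a minimal sketch assuming dot-product similarity, a temperature of 0.1, and only the query-to-document direction; it is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(queries: List[Vector], docs: List[Vector],
                     temperature: float = 0.1,
                     same_tower_negatives: bool = True) -> float:
    """Mean query-to-document in-batch softmax loss.

    docs[i] is the positive for queries[i]; the other documents in the batch
    are negatives. With same_tower_negatives=True, the other *queries* in the
    batch are added as extra negatives (the SamToNe idea, query side only).
    """
    n = len(queries)
    total = 0.0
    for i in range(n):
        # Cross-tower scores: query i against every document in the batch.
        logits = [dot(queries[i], d) / temperature for d in docs]
        if same_tower_negatives:
            # Same-tower negatives: query i against every other query.
            logits += [dot(queries[i], queries[j]) / temperature
                       for j in range(n) if j != i]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log p(positive document | query i)
    return total / n


q = [[1.0, 0.0], [0.0, 1.0]]
d = [[0.9, 0.1], [0.1, 0.9]]
with_stn = contrastive_loss(q, d)
without_stn = contrastive_loss(q, d, same_tower_negatives=False)
```

Adding same-tower terms can only enlarge the denominator, so on the same batch the SamToNe loss is never lower than the plain in-batch loss; during training this penalises queries that sit too close to each other.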
Google Research and Google DeepMind present RISE, a new method for evaluating automatically generated text summaries. RISE draws on information retrieval techniques: it is trained as a retrieval task using a dual-encoder setup. Because it can evaluate generated summaries without gold reference summaries, it is well suited to new datasets for which no references exist.
Experiments on benchmark datasets show that RISE consistently correlates better with human evaluations than previous approaches. RISE is also data-efficient and generalises across languages.
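The key property here, scoring a summary against its source document with no gold reference, can be sketched with a dual-encoder-style similarity. In this sketch a bag-of-words vector stands in for each trained encoder tower; the encoder and the functions are hypothetical placeholders, not RISE's model.

```python
import math
from collections import Counter


def toy_encode(text: str) -> Counter:
    # Hypothetical stand-in for a trained encoder tower: lowercase bag of words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def reference_free_score(document: str, summary: str) -> float:
    # The summary is compared with the source document only; no gold summary is used.
    return cosine(toy_encode(document), toy_encode(summary))


doc = "the council approved the new budget for schools on monday"
good = "council approved new school budget"
bad = "it was sunny outside"
```

With a learned encoder in place of `toy_encode`, the same interface would reward summaries whose meaning matches the document rather than mere word overlap.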
In this paper, the researchers tackle the age-old challenge of summarising a large number of reviews of a product or place. Supervised systems have been successful in the news domain, but large-scale datasets for opinion texts are not available to train them. To bridge this gap, the paper proposes OPINESUM, an unsupervised self-training approach to abstractive opinion summarisation.
The approach uses textual entailment to capture the consensus of opinions across multiple reviews and generate summaries. OPINESUM can generate silver-standard summaries at large scale and achieves state-of-the-art performance in both unsupervised and few-shot settings.
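The idea of using entailment to capture consensus can be illustrated by keeping only candidate statements that many reviews support. This is a minimal sketch under loud assumptions: `toy_entails` is a hypothetical word-containment stand-in for a trained entailment model, and the support threshold is our own choice rather than OPINESUM's actual selection procedure.

```python
from typing import Callable, List


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Hypothetical stand-in for an NLI model: every hypothesis word appears in the premise.
    return set(hypothesis.lower().split()) <= set(premise.lower().split())


def consensus_sentences(reviews: List[str], candidates: List[str], min_support: int,
                        entails: Callable[[str, str], bool] = toy_entails) -> List[str]:
    """Keep candidates entailed by at least `min_support` reviews, most-supported first."""
    scored = []
    for cand in candidates:
        support = sum(entails(review, cand) for review in reviews)
        if support >= min_support:
            scored.append((support, cand))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [cand for _, cand in scored]


reviews = [
    "great pizza and friendly staff",
    "the pizza was great but slow service",
    "great pizza cheap prices",
    "slow service",
]
candidates = ["great pizza", "slow service", "friendly staff"]
summary_points = consensus_sentences(reviews, candidates, min_support=2)
```

Statements backed by several reviews ("great pizza", "slow service") survive, while a one-off opinion ("friendly staff") is dropped; selected content of this kind could then serve as silver-standard targets for training a summariser.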
This paper focuses on the controllability and robustness of LLMs. Controllability means the model should rely on the provided context when it conflicts with the model's memorised knowledge; robustness means it should ignore irrelevant context and fall back on what it already knows. The authors show that state-of-the-art models such as T5 and PaLM, whether pretrained or finetuned, can fall short on both properties, and that these properties do not reliably improve as model size increases. To address this, they propose knowledge-aware finetuning (KAFT), which strengthens controllability and robustness by adding counterfactual and irrelevant contexts to the training data. Comprehensive evaluations across different model architectures and sizes demonstrate its effectiveness.
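The two kinds of augmented context can be sketched as a data-construction step: a counterfactual context whose edited answer the model must follow, and an irrelevant context for which it must fall back on its memorised answer. The helper and field names below are hypothetical, and how KAFT actually produces counterfactuals and mixes the data may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    question: str
    context: str
    target: str


def kaft_style_examples(question: str, answer: str, relevant_context: str,
                        counterfactual_answer: str,
                        irrelevant_context: str) -> List[Example]:
    """Hypothetical helper building training examples in the spirit of KAFT."""
    return [
        # Relevant context: the answer comes from the context.
        Example(question, relevant_context, answer),
        # Counterfactual context: the answer is edited inside the context and the
        # target follows the context (trains controllability).
        Example(question, relevant_context.replace(answer, counterfactual_answer),
                counterfactual_answer),
        # Irrelevant context: the target is the memorised answer (trains robustness).
        Example(question, irrelevant_context, answer),
    ]


examples = kaft_style_examples(
    question="What is the capital of France?",
    answer="Paris",
    relevant_context="Paris is the capital of France.",
    counterfactual_answer="Lyon",
    irrelevant_context="The Nile flows through Egypt.",
)
```

The same question therefore appears with three different contexts, and only the context decides whether the model should override or keep its parametric knowledge.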
This paper addresses the lack of reported confidence in results within the NLP leaderboard culture. The authors propose a framework and a simulator for estimating p-values when comparing the performance of two systems, in order to quantify the confidence that one system is genuinely better than the other. Their null hypothesis is that both systems' metric scores are drawn from the same distribution. Working with a test set that combines the responses of both systems, they investigate how to estimate the p-value accurately, examining factors such as response variance, choice of metric, and sampling method. They stress that these factors matter for providing reliable statistical guarantees when comparing models.
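A standard way to test this null hypothesis is a permutation test: pool both systems' scores, repeatedly reassign them to two groups at random, and count how often the shuffled difference is at least as large as the observed one. The sketch below is one common choice (an unpaired test on the difference in mean score, with add-one smoothing); it is not the authors' framework, and for paired per-item scores a paired test that swaps scores within each pair would be more appropriate.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(scores_a: Sequence[float], scores_b: Sequence[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for H0: both systems' scores come from one distribution."""
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)  # combined responses of both systems
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the system labels are exchangeable
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


same = [0.5, 0.7, 0.6]
clearly_better = [0.9] * 20
clearly_worse = [0.1] * 20
```

A small p-value means a difference this large would rarely arise if the two systems were interchangeable; identical score sets give a p-value of 1.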
The paper introduces ‘Distilling step-by-step’, a method that addresses the challenges of deploying large language models (LLMs). It trains smaller models that outperform LLMs by using rationales extracted from the LLM as additional supervision in a multi-task framework. Compared with standard finetuning and distillation, the method reaches better performance with fewer labelled or unlabelled training examples; compared with few-shot prompted LLMs, it does so with much smaller models. Results on NLP benchmarks show that it reduces both the model size and the amount of data needed to outperform LLMs. For example, a 770M-parameter T5 model outperforms the 540B-parameter PaLM model using only 80% of the available data on one benchmark.
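The multi-task set-up can be sketched as turning each (input, label, LLM rationale) triple into two training examples, one per task, and summing the two losses. The task prefixes, the `rationale_weight` parameter and the toy loss below are illustrative assumptions; a real implementation would use a seq2seq model's cross-entropy in place of `toy_seq_loss`.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str, str]  # (task, source text, target text)


def make_multitask_pairs(inputs: List[str], labels: List[str],
                         rationales: List[str]) -> List[Pair]:
    """One label-prediction and one rationale-generation example per input."""
    pairs: List[Pair] = []
    for x, y, r in zip(inputs, labels, rationales):
        pairs.append(("label", "[label] " + x, y))
        pairs.append(("rationale", "[rationale] " + x, r))
    return pairs


def multitask_loss(pairs: List[Pair], seq_loss: Callable[[str, str], float],
                   rationale_weight: float = 1.0) -> float:
    """Mean label loss + rationale_weight * mean rationale loss."""
    label_losses = [seq_loss(s, t) for task, s, t in pairs if task == "label"]
    rationale_losses = [seq_loss(s, t) for task, s, t in pairs if task == "rationale"]
    return (sum(label_losses) / len(label_losses)
            + rationale_weight * sum(rationale_losses) / len(rationale_losses))


def toy_seq_loss(source: str, target: str) -> float:
    # Hypothetical stand-in for sequence cross-entropy: target length in words.
    return float(len(target.split()))


pairs = make_multitask_pairs(["What is 2 + 3?"], ["5"], ["2 plus 3 equals 5"])
loss = multitask_loss(pairs, toy_seq_loss, rationale_weight=0.5)
```

Because the rationale is a separate training task rather than part of the label output, inference only needs the label-prefixed input, so the small model does not have to generate rationales at test time.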
The paper introduces PROPSEGMENT, a corpus of over 45,000 propositions annotated by expert raters. The dataset targets two tasks: segmenting sentences into propositions, and classifying the entailment relationship between each proposition and another document on the same topic. The paper establishes effective baselines for both tasks and demonstrates PROPSEGMENT's potential for detecting hallucinations in summaries and for understanding how Natural Language Inference (NLI) labels compose at the document level.
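One way proposition-level labels can support hallucination detection is to flag any summary sentence that contains at least one proposition the source does not entail. The sketch below assumes the segmentation and per-proposition labels are already available (for example, from trained models); the three-way label set and this aggregation rule are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Dict, List

# Assumed three-way NLI label set (illustrative).
ENTAILED, NEUTRAL, CONTRADICTED = "entailment", "neutral", "contradiction"


def flag_unsupported(propositions: Dict[str, List[str]],
                     labels: Dict[str, str]) -> List[str]:
    """Return summary sentences with at least one proposition not entailed by the source.

    `propositions` maps each summary sentence to its propositions; `labels` maps
    each proposition to its NLI label against the source document.
    """
    return [sentence for sentence, props in propositions.items()
            if any(labels[p] != ENTAILED for p in props)]


propositions = {
    "The bakery opened in 1990 and sells bread.": [
        "The bakery opened in 1990.",
        "The bakery sells bread.",
    ],
    "It is closed on Sundays.": ["It is closed on Sundays."],
}
labels = {
    "The bakery opened in 1990.": CONTRADICTED,
    "The bakery sells bread.": ENTAILED,
    "It is closed on Sundays.": ENTAILED,
}
flagged = flag_unsupported(propositions, labels)
```

Working at the proposition level pinpoints which part of the first sentence is unsupported, which a single sentence-level NLI label would hide.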
This paper presents TOUR, a method that optimises query representations for dense retrieval at test time. It uses a cross-encoder re-ranker to provide pseudo labels for the retrieval results and iteratively improves the query representation with gradient descent. TOUR is shown to improve open-domain question answering accuracy and passage retrieval performance, and to improve direct re-ranking while also running faster with an efficient implementation.
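The optimisation loop can be sketched as follows: turn the re-ranker's scores into a target distribution over the retrieved passages, then take gradient steps on the query vector so that the retriever's softmax over dot-product scores moves toward that target. Using soft pseudo labels (a softmax over re-ranker scores), the learning rate and the step count are assumptions of this sketch; TOUR's exact labelling and update schedule may differ.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def optimise_query(query: List[float], passages: List[List[float]],
                   reranker_scores: List[float], lr: float = 0.5,
                   steps: int = 10) -> List[float]:
    """Gradient descent on the query vector toward the re-ranker's pseudo labels."""
    target = softmax(reranker_scores)  # soft pseudo labels (assumption)
    q = list(query)
    for _ in range(steps):
        probs = softmax([dot(q, p) for p in passages])
        # Gradient of cross-entropy(target, softmax(q . p_i)) w.r.t. q:
        # sum_i (probs_i - target_i) * p_i
        grad = [sum((probs[i] - target[i]) * passages[i][d]
                    for i in range(len(passages)))
                for d in range(len(q))]
        q = [qd - lr * g for qd, g in zip(q, grad)]
    return q


query = [1.0, 0.0]
passages = [[1.0, 0.0], [0.0, 1.0]]
reranker_scores = [0.0, 5.0]  # the re-ranker prefers the second passage
new_query = optimise_query(query, passages, reranker_scores)
```

Initially the retriever ranks the first passage higher; after a few updates the optimised query ranks the re-ranker's preferred passage first, and the updated vector can then be used to retrieve again.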