Katerina Margatina

Email
Google Scholar
Semantic Scholar
GitHub
LinkedIn
Twitter

I’m currently an Applied Scientist at Amazon in NYC, working with the AWS Bedrock Agents team. My main focus is on improving how LLM agents work: making them more useful, reliable, and efficient.

I earned my PhD in Computer Science at the University of Sheffield, under the supervision of Prof. Nikos Aletras. I researched active learning algorithms for data-efficient LLMs. Along the way, I spent time as a Research Scientist intern at Meta AI (FAIR) in London, where I explored the intersection of in-context and active learning methods for LLMs, and at AWS in NYC, where I studied the temporal robustness of LLMs. I also visited the CoAStaL group at the University of Copenhagen, where I worked on learning from disagreement and cross-cultural NLP.

Before my doctoral studies (what feels like a lifetime ago), I was a Machine Learning Engineer at DeepSea Technologies. For my undergraduate degree, I studied Electrical & Computer Engineering at the National Technical University of Athens (NTUA).

news

Dec 13, 2024 🌈PRISM won best paper award at NeurIPS 2024 Datasets & Benchmarks track!!🚀🚀🚀
Dec 3, 2024 My PhD thesis Exploring Active Learning Algorithms for Data Efficient Language Models is finally online!
Jul 22, 2024 I just defended my PhD and got it with no corrections!!!🥰🎓
Apr 24, 2024 Super excited to share that our preprint The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models is on arXiv!
Jan 16, 2024 Life update! I joined AWS Bedrock as an Applied Scientist working in LLM Agents.🤖

selected publications

  1. Thesis
    Exploring Active Learning Algorithms for Data Efficient Language Models
    Katerina Margatina
    2024
  2. NeurIPS

    🏆 Best Paper

    The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
    Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, Bertie Vidgen, He He, and Scott A. Hale
    In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track 2024
  3. EMNLP-Findings
    Active Learning Principles for In-Context Learning with Large Language Models
    Katerina Margatina, Timo Schick, Nikolaos Aletras, and Jane Dwivedi-Yu
    In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
  4. ACL-Findings
    On the Limitations of Simulating Active Learning
    Katerina Margatina and Nikolaos Aletras
    In Findings of the Association for Computational Linguistics (ACL) 2023
  5. EMNLP

    ✨ Oral ✨

    Active Learning by Acquiring Contrastive Examples
    Katerina Margatina, Giorgos Vernikos, Loïc Barrault, and Nikolaos Aletras
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021