Theory and Toolkits for User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation
Half-day tutorial at the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25), Padua, Italy, July 2025.
Interactive AI systems, including search engines, recommender systems, conversational agents, and generative AI applications, are increasingly central to user experiences. However, rigorously evaluating their performance, training them effectively with interaction data, and modeling user behavior for personalization remain significant challenges, often difficult to address reproducibly and at scale. User simulation, which employs intelligent agents to mimic human interaction patterns, offers a powerful and versatile methodology to tackle these interconnected issues. This half-day tutorial provides a comprehensive overview of modern user simulation techniques for interactive AI systems. We will explore the theoretical foundations and practical applications of simulation for system evaluation, algorithm training, and user modeling, emphasizing the crucial connections between these uses. The tutorial covers key simulation methodologies, with a particular focus on recent advancements leveraging large language models, discussing both the opportunities they present and the open challenges they entail. Crucially, we will also provide practical guidance, highlighting relevant toolkits, libraries, and datasets available to researchers and practitioners.
Objectives
This tutorial aims to equip participants with a solid understanding of the goals, underlying principles, and diverse applications of user simulation within interactive AI, spanning system evaluation, model training, and user modeling. We will cover essential theoretical background, specifically highlighting the connections between these different simulation uses.
The tutorial will provide an overview of key simulation methodologies, paying particular attention to the newest generation of approaches that leverage large language models (LLMs). Furthermore, we will discuss practical resources available to researchers and practitioners, including relevant datasets and software toolkits, to facilitate the adoption of these techniques.
Upon completion, participants will grasp the core concepts behind user simulation, be familiar with state-of-the-art methods (including LLM-based approaches), know where to find practical resources, and understand how simulation can be effectively applied to address key challenges in the development and analysis of interactive AI systems.
Target Audience and Prerequisites
This introductory tutorial primarily targets graduate students, academic researchers, and industry practitioners working on information access or, more broadly, interactive AI systems. We expect the tutorial to appeal broadly to SIGIR participants.
Scope and Outline
- Background, Motivation, and Context
We introduce the landscape of modern interactive AI systems, encompassing search engines, recommender systems, conversational agents, and the rapidly evolving domain of Generative AI. We position simulation as a versatile methodology, grounded in user understanding, capable of addressing the interconnected challenges of evaluation, training, and modeling for today’s complex AI, contrasting its role with traditional methods and highlighting its increasing relevance. We briefly discuss historical roots and explorations across various communities: early work investigated its potential in information retrieval, while in dialogue systems, simulation has been employed for tasks like policy learning. Fundamentally, user simulation involves creating operational user models, drawing inspiration from decades of research into search intent, information seeking behavior, cognitive modeling, and even economic IR models.
- Foundations of User Simulation: Behavior Models, Formalisms, and Metrics
This section establishes the theoretical groundwork for user simulation. We begin with a high-level overview of influential conceptual models of human information-seeking behavior, categorized as cognitive models, process models, and strategic models, which provide guidance for modeling user decisions and processes. We then introduce Markov Decision Processes (MDPs) as a general mathematical framework for formalizing sequential decision-making, widely used for constructing user simulators. Building on this foundation, we demonstrate how simulation principles underpin even traditional IR evaluation, interpreting metrics like Precision, Recall, and NDCG@k as representing implicit, naive user models. Finally, we briefly discuss metrics explicitly derived from simulating user behavior based on assumptions about tasks, interaction patterns, and perceived rewards.
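To make the "implicit user model" reading concrete, here is a minimal Python sketch (our own illustration, not part of the tutorial materials) of NDCG@k, where the logarithmic discount can be read as the attention a naive simulated user devotes to each rank while scanning the list top-down:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: the gain at each rank is discounted
    by 1/log2(rank + 2), which can be read as an implicit user model --
    attention decays logarithmically as the user scans down the list."""
    return sum(rel / math.log2(rank + 2)  # rank is 0-based
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, an ideally ordered list scores 1.0, while placing the only relevant result at rank 3 halves the score, exactly because the implied user is assumed to pay less attention there.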
- Simulating Interactions
This section covers the fundamental approaches to modeling user interactions for simulation. We start with models describing interaction workflows (specifying actions, responses, transitions). We then delve into modeling specific user actions common in search, such as query formulation, result scanning, clicks, perceived effort, and stopping decisions. We extend this to conversational contexts, discussing conceptualizations based on intents and dialogue structure. We cover different simulator architectures (modular vs. end-to-end) and established techniques from dialogue systems like agenda-based simulation and sequence-to-sequence models, alongside approaches specific to conversational information access.
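As one illustrative sketch (our own simplification, not a tutorial artifact), a cascade-style click model captures several of the user actions listed above, namely result scanning, clicks, and stopping decisions, in a few lines. The `patience` parameter and the stop-after-first-click assumption are deliberate simplifications:

```python
import random

def simulate_cascade_session(relevance_probs, patience=0.8, rng=None):
    """Simulate one user scanning a ranked list top-down (cascade model):
    at each position the user examines the result and clicks with
    probability equal to its perceived relevance; the user stops after
    a click or, with probability 1 - patience, abandons the scan."""
    rng = rng or random.Random()
    clicks = []
    for rank, p_rel in enumerate(relevance_probs):
        if rng.random() < p_rel:     # click; cascade assumption: then stop
            clicks.append(rank)
            break
        if rng.random() > patience:  # give up without clicking
            break
    return clicks
```

Running many such sessions against a ranker yields synthetic click logs whose distributions can later be compared against real logs, linking this section to the validation discussion below.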
- Simulation Toolkits and Frameworks
Building on the techniques discussed, this section surveys available open-source toolkits, libraries, and frameworks designed to facilitate the practical implementation of user simulators. We will cover resources relevant for both traditional search/recommendation interaction and conversational systems. For each key resource, we will discuss its architecture, scope (models supported, tasks targeted), ease of use, extensibility, and associated datasets or dependencies, aiming to provide attendees with practical starting points for implementing simulators.
- Simulator Quality: Validation Principles and Methods
This section addresses the crucial topic of simulator validation. We define key requirements for effective simulators, such as validity, fidelity, and interpretability. We then introduce and discuss common validation methodologies used to assess these properties. These include quantitative comparisons against real user interaction logs (e.g., matching distributions of actions, sequence similarity analysis), sensitivity analysis of simulator parameters, task-based evaluation (assessing whether downstream conclusions hold), and qualitative assessments. We also touch upon the inherent challenges in validating complex simulated behavior.
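For instance, "matching distributions of actions" can be operationalized as a divergence between the action frequencies of real and simulated logs. The sketch below uses Jensen-Shannon divergence, one common symmetric and bounded choice; the action vocabulary and log format are our own illustrative assumptions:

```python
import math
from collections import Counter

def action_distribution(log, actions):
    """Turn a log (sequence of action labels) into a probability
    vector over a fixed action vocabulary."""
    counts = Counter(log)
    total = sum(counts.values())
    return [counts[a] / total for a in actions]

def jensen_shannon_divergence(p, q):
    """Symmetric, bounded (in [0, 1] with log base 2) divergence between
    two action distributions; 0 means the simulator reproduces the real
    action mix exactly."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Note that matching marginal action distributions is a necessary but not sufficient condition for fidelity; sequence-level comparisons, as mentioned above, probe behavior the marginals cannot see.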
- Resources for Validation: Benchmarks and Protocols
Complementing the discussion of validation methods, this section focuses on the practical resources available for carrying out simulator validation. We will identify and discuss standard benchmark datasets that contain real user interaction logs (e.g., from TREC tracks, public search/session logs, dialogue corpora) suitable for comparative validation studies. We will cover established validation protocols, common metrics used, and recent work on platforms dedicated to the evaluation of simulation approaches.
- Conclusion and Future Challenges
We conclude by highlighting open issues and providing several potential research directions. We discuss how simulation technologies can help foster collaboration between academia and industry. We also argue that some of the major challenges that remain require research from multiple subject areas, including information science, information retrieval, recommender systems, machine learning, natural language processing, knowledge representation, human-computer interaction, and psychology, making user simulation a truly interdisciplinary area for research.
Presenters
Krisztian Balog is a full professor at the University of Stavanger and a staff research scientist at Google DeepMind. His general research interests lie in the use and development of information retrieval, natural language processing, and machine learning techniques for intelligent information access tasks. His current research concerns novel evaluation methodologies, conversational information access, user modeling, transparency, and explainability. Balog regularly serves on the senior programme committee of SIGIR, WSDM, WWW, CIKM, and ECIR. He previously served as general co-chair of ICTIR’20 and ECIR’22, program committee co-chair of ICTIR’19 (full papers), CIKM’21 (short papers), and SIGIR’24 (resource and reproducibility), Associate Editor of ACM Transactions on Information Systems, and coordinator of IR benchmarking efforts at TREC and CLEF. Balog is the recipient of the 2018 Karen Spärck Jones Award. He has previously given tutorials at WWW’13, SIGIR’13, WSDM’14, ECIR’16, SIGIR’19, CIKM’23, AAAI’24, and WWW’24.
Nolwenn Bernard is a final-year PhD student at the University of Stavanger, Norway. Her research specifically focuses on the use of user simulation for the development and evaluation of conversational information access systems. Part of her work involves the development of resources to make user simulation more accessible to the community and support future research in this area. She has published papers at SIGIR, ICTIR, WSDM, and CUI.
Saber Zerhoudi is a final-year PhD student at the University of Passau, Germany. His research focuses on simulating and evaluating user search behavior with interactive information retrieval systems, with applications extending to digital libraries contexts. He has published his work at conferences including SIGIR, CIKM, ECIR, CHIIR, JCDL, and SIGIR-AP. He is one of the main authors behind the SimIIRv2 and SimIIRv3 toolkits.
ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois Urbana-Champaign. His research interests include intelligent information retrieval, text mining, natural language processing, machine learning, and their applications. He serves as a Senior Associate Editor of ACM Transactions on Intelligent Systems and Technology and previously served as an Associate Editor of ACM TOIS, ACM TKDD, and Elsevier’s IPM, and as Program Co-Chair of NAACL-HLT’07, SIGIR’09, and WWW’15. He is an ACM Fellow and a member of the ACM SIGIR Academy. He received the ACM SIGIR Gerard Salton Award and the ACM SIGIR Test of Time Award (three times). He has previously given tutorials at HLT-NAACL’04, SIGIR’05, SIGIR’06, HLT-NAACL’07, ICTIR’13, SIGIR’14, KDD’17, SIGIR’17, SIGIR’18, SIGIR’20, SIGIR’21, CIKM’23, AAAI’24, SIGIR-AP’24, and WWW’24.