User Simulation

Building Intelligent Agents that Mimic Human Behaviour


Theory and Toolkits for User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation

Half-day tutorial at the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25), Padua, Italy, July 2025.

Interactive AI systems, including search engines, recommender systems, conversational agents, and generative AI applications, are increasingly central to user experiences. However, rigorously evaluating their performance, training them effectively with interaction data, and modeling user behavior for personalization remain significant challenges, often difficult to address reproducibly and at scale. User simulation, which employs intelligent agents to mimic human interaction patterns, offers a powerful and versatile methodology to tackle these interconnected issues. This half-day tutorial provides a comprehensive overview of modern user simulation techniques for interactive AI systems. We will explore the theoretical foundations and practical applications of simulation for system evaluation, algorithm training, and user modeling, emphasizing the crucial connections between these uses. The tutorial covers key simulation methodologies, with a particular focus on recent advancements leveraging large language models, discussing both the opportunities they present and the open challenges they entail. Crucially, we will also provide practical guidance, highlighting relevant toolkits, libraries, and datasets available to researchers and practitioners.

Objectives

This tutorial aims to equip participants with a solid understanding of the goals, underlying principles, and diverse applications of user simulation within interactive AI, spanning system evaluation, model training, and user modeling. We will cover essential theoretical background, specifically highlighting the connections between these different simulation uses.

The tutorial will provide an overview of key simulation methodologies, paying particular attention to the newest generation of approaches that leverage large language models (LLMs). Furthermore, we will discuss practical resources available to researchers and practitioners, including relevant datasets and software toolkits, to facilitate the adoption of these techniques.

Upon completion, participants will grasp the core concepts behind user simulation, be familiar with state-of-the-art methods (including LLM-based approaches), know where to find practical resources, and understand how simulation can be effectively applied to address key challenges in the development and analysis of interactive AI systems.

Target Audience and Prerequisites

This introductory tutorial primarily targets graduate students, academic researchers, and industry practitioners working on information access or, more broadly, interactive AI systems. We expect it to appeal broadly to SIGIR attendees.

Scope and Outline

Presenters

Krisztian Balog is a full professor at the University of Stavanger and a staff research scientist at Google DeepMind. His general research interests lie in the use and development of information retrieval, natural language processing, and machine learning techniques for intelligent information access tasks. His current research concerns novel evaluation methodologies, conversational information access, user modeling, transparency, and explainability. Balog regularly serves on the senior programme committee of SIGIR, WSDM, WWW, CIKM, and ECIR. He previously served as general co-chair of ICTIR’20 and ECIR’22, program committee co-chair of ICTIR’19 (full papers), CIKM’21 (short papers), and SIGIR’24 (resource and reproducibility), Associate Editor of ACM Transactions on Information Systems, and coordinator of IR benchmarking efforts at TREC and CLEF. Balog is the recipient of the 2018 Karen Spärck Jones Award. He has previously given tutorials at WWW’13, SIGIR’13, WSDM’14, ECIR’16, SIGIR’19, CIKM’23, AAAI’24, and WWW’24.

Nolwenn Bernard is a final-year PhD student at the University of Stavanger, Norway. Her research specifically focuses on the use of user simulation for the development and evaluation of conversational information access systems. Part of her work involves the development of resources to make user simulation more accessible to the community and support future research in this area. She has published papers at SIGIR, ICTIR, WSDM, and CUI.

Saber Zerhoudi is a final-year PhD student at the University of Passau, Germany. His research focuses on simulating and evaluating user search behavior with interactive information retrieval systems, with applications extending to digital library contexts. He has published his work at conferences including SIGIR, CIKM, ECIR, CHIIR, JCDL, and SIGIR-AP. He is one of the main authors behind the SimIIRv2 and SimIIRv3 toolkits.

ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include intelligent information retrieval, text mining, natural language processing, machine learning, and their applications. He serves as a Senior Associate Editor of ACM Transactions on Intelligent Systems and Technology and previously served as Associate Editor of ACM TOIS, ACM TKDD, and Elsevier’s IPM, and as Program Co-Chair of NAACL-HLT’07, SIGIR’09, and WWW’15. He is an ACM Fellow and a member of the ACM SIGIR Academy. He received the ACM SIGIR Gerard Salton Award and the ACM SIGIR Test of Time Award (three times). He has previously given tutorials at HLT-NAACL’04, SIGIR’05, SIGIR’06, HLT-NAACL’07, ICTIR’13, SIGIR’14, KDD’17, SIGIR’17, SIGIR’18, SIGIR’20, SIGIR’21, CIKM’23, AAAI’24, SIGIR-AP’24, and WWW’24.