User Simulation

Evaluating interactive intelligent systems


Tutorial on User Simulation for Evaluating Information Access Systems on the Web @TheWebConf’24

Half-day tutorial at the 2024 ACM Web Conference, Singapore, May 2024.

Although Web information access systems, such as search engines, recommender systems, and conversational assistants, are used by millions of people every day, how to evaluate these systems appropriately remains an open scientific challenge. For example, the weak correlation between online and offline evaluation results makes it hard to choose the best algorithm to deploy in a production environment. Inaccurate evaluation of algorithms can also lead to misleading conclusions and thus hinder research progress.

The emergence of large language models (LLMs) such as ChatGPT is making information access on the Web increasingly interactive and conversational. It is especially challenging to evaluate how effectively an interactive system helps a user complete a task, because the utility of such a system can only be assessed by a user interacting with it. Moreover, because users vary significantly in their behaviour and preferences, it is very difficult to evaluate systems through reproducible experiments.

User simulation has the potential to enable repeatable and reproducible evaluations at low cost, without consuming valuable user time (human assessor time or online experimentation bandwidth). Further, simulation can augment traditional evaluation methodologies by providing insights into how system performance changes under different conditions and user behaviours. Relevant research, however, is scattered across multiple research communities, including information retrieval, recommender systems, dialogue systems, and user modeling. This tutorial aims to synthesize this extensive body of research into a coherent framework, with a focus on applications of user simulation to evaluate Web information access systems.

Target Audience and Prerequisites

Accurately evaluating a search engine, a recommender system, or a conversational assistant matters both to practitioners who want to assess the utility of their production systems and to researchers who want to know whether their new algorithms are truly more effective than existing ones. We therefore expect the tutorial to appeal broadly to participants of the Web Conference, including undergraduate and graduate students, academic and industry researchers, industry practitioners, and government policymakers and decision makers. As the tutorial is largely self-contained and requires only minimal background knowledge, it should be accessible to most attendees of the Web Conference.

Participants of the tutorial can expect to learn what user simulation is, why it is important to use it for evaluation, how existing user simulation techniques can already be useful for evaluating interactive Web information access systems, how to develop new user simulators, and how to use user simulation broadly to evaluate assistive AI systems. They can also expect to learn about associated challenges and where additional research is still needed.

Scope and Outline


Krisztian Balog is a full professor at the University of Stavanger and a staff research scientist at Google. His general research interests lie in the use and development of information retrieval, natural language processing, and machine learning techniques for intelligent information access tasks. His current research concerns novel evaluation methodologies, and conversational and explainable search and recommendation methods. Balog regularly serves on the senior programme committee of SIGIR, WSDM, WWW, CIKM, and ECIR. He previously served as general co-chair of ICTIR’20 and ECIR’22, program committee co-chair of ICTIR’19 (full papers), CIKM’21 (short papers), and SIGIR’24 (resource and reproducibility), Associate Editor of ACM Transactions on Information Systems, and coordinator of IR benchmarking efforts at TREC and CLEF. Balog is the recipient of the 2018 Karen Spärck Jones Award. He has previously given tutorials at WWW’13, SIGIR’13, WSDM’14, ECIR’16, SIGIR’19, CIKM’23, and AAAI’24.

ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include intelligent information retrieval, text mining, natural language processing, machine learning, and their applications. He serves as a Senior Associate Editor of ACM Transactions on Intelligent Systems and Technology and previously served as Associate Editor of ACM TOIS, ACM TKDD, and Elsevier’s IPM, and as Program Co-Chair of NAACL-HLT’07, SIGIR’09, and WWW’15. He is an ACM Fellow and a member of the ACM SIGIR Academy. He received the ACM SIGIR Gerard Salton Award and the ACM SIGIR Test of Time Award (three times). He has previously given tutorials at HLT-NAACL’04, SIGIR’05, SIGIR’06, HLT-NAACL’07, ICTIR’13, SIGIR’14, KDD’17, SIGIR’17, SIGIR’18, SIGIR’20, SIGIR’21, CIKM’23, and AAAI’24.