User Simulation

Evaluating interactive intelligent systems


Tutorial on User Simulation for Evaluating Interactive Intelligent Systems

Half-day tutorial given at the 38th Annual AAAI Conference on Artificial Intelligence (AAAI'24), Vancouver, Canada, February 2024.

As AI technologies are increasingly deployed in real-world applications, notably search engines, recommender systems, and conversational assistants, how to evaluate such interactive intelligent systems has emerged as an urgent challenge for both practitioners who deploy AI products and researchers. Research communities have so far mostly relied on test collections to perform reproducible experiments, but this evaluation methodology cannot be applied to interactive intelligent systems, whose utility must be assessed by users through interaction with the system. To tackle this challenge, researchers have proposed and developed an evaluation methodology based on user simulation. The idea is to replace a real user with an intelligent agent that mimics the user's decisions when interacting with an AI system; the system is then evaluated by having it interact with this artificial user while measuring, from the user's perspective, the utility gained and the cost or effort expended to finish a task. Work on user simulation for evaluating intelligent systems has so far mostly been done in applied AI communities, notably Information Retrieval, Recommender Systems, and the World Wide Web. The goal of this tutorial is to provide a systematic review of this topic and to discuss many interesting novel AI-related research challenges for AAAI attendees, allowing them to learn about the major ideas, frameworks, models, and algorithms both for building user simulators and for using simulators to evaluate an interactive system, as well as important future research directions.


This tutorial focuses on providing a thorough understanding of user simulation techniques designed specifically for evaluation purposes. We systematically review major research progress, covering both general frameworks for designing user simulators and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants. We also highlight the connections of user simulation with multiple sub-fields of AI (e.g., Reinforcement Learning, Data Augmentation, Agent-based Modeling, and Cognitive Modeling/Systems) and discuss some important future research directions.
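To make the core idea concrete, the toy sketch below (not taken from the tutorial; all names, parameters, and stopping rules are hypothetical simplifications) simulates a user scanning a ranked result list top-down, stopping either after finding enough relevant items or after running out of patience. Evaluating two hypothetical systems with the same simulated user yields a utility/cost pair for each, from which a user-centric comparison can be made.

```python
def simulate_user(ranking, relevance, patience=3, goal=2):
    """Toy user simulator: scan a ranked list top-down, stop after
    finding `goal` relevant items or after examining `patience`
    non-relevant ones. Returns (utility, cost): relevant items found
    vs. total items examined (a stand-in for user effort)."""
    utility, cost, misses = 0, 0, 0
    for doc in ranking:
        cost += 1                       # effort of examining one result
        if relevance.get(doc, 0):
            utility += 1                # perceived gain from a relevant item
            if utility >= goal:
                break                   # information need satisfied
        else:
            misses += 1
            if misses >= patience:
                break                   # user gives up
    return utility, cost

# Compare two hypothetical systems against the same simulated user.
relevance = {"d1": 1, "d2": 0, "d3": 1, "d4": 0, "d5": 1}
system_a = ["d1", "d3", "d5", "d2", "d4"]   # relevant items ranked first
system_b = ["d2", "d4", "d1", "d3", "d5"]   # relevant items buried

print(simulate_user(system_a, relevance))   # (2, 2): same utility, less effort
print(simulate_user(system_b, relevance))   # (2, 4): same utility, more effort
```

Real simulators replace these hand-coded rules with learned models of user behavior, but the evaluation loop (system produces output, simulated user reacts, utility and cost are measured) has the same shape.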

Target Audience and Prerequisites

This introductory tutorial primarily targets graduate students, academic researchers, and industry practitioners who are interested in learning about how AI techniques can be applied to build user simulators and/or how user simulation can be used to evaluate interactive AI systems. Since the question of how to accurately evaluate interactive intelligent systems is important to both practitioners who would like to assess the utility of their product systems and researchers who would like to know whether their new algorithms are truly more effective than the existing ones, we expect our tutorial to be broadly appealing to many participants of AAAI.

A general background in AI is sufficient. We expect that most AAAI participants would have no difficulty following the material.

Scope and Outline


Time Topic
08:30 - 08:50 Introduction and Background
08:50 - 09:05 Overview of User Simulation
09:05 - 09:30 Simulation-based Evaluation Frameworks
09:30 - 09:45 User Simulation and Human Decision-making
09:45 - 10:30 Simulating Interactions with Search and Recommender Systems
10:30 - 11:00 BREAK
11:00 - 11:30 Simulating Interactions with Conversational Assistants
11:30 - 12:00 User Simulation as an Interdisciplinary Research Area
12:00 - 12:15 Conclusion and Future Challenges
12:15 - 12:30 Discussion


Krisztian Balog is a full professor at the University of Stavanger and a staff research scientist at Google. His general research interests lie in the use and development of information retrieval, natural language processing, and machine learning techniques for intelligent information access tasks. His current research concerns novel evaluation methodologies, and conversational and explainable search and recommendation methods. Balog regularly serves on the senior programme committee of SIGIR, WSDM, WWW, CIKM, and ECIR. He previously served as general co-chair of ICTIR’20 and ECIR’22, program committee co-chair of ICTIR’19 (full papers) and CIKM’21 (short papers), Associate Editor of ACM Transactions on Information Systems, and coordinator of IR benchmarking efforts at TREC and CLEF. Balog is the recipient of the 2018 Karen Spärck Jones Award. He has previously given tutorials at WWW’13, SIGIR’13, WSDM’14, ECIR’16, SIGIR’19, and CIKM’23.

ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include intelligent information retrieval, text mining, natural language processing, machine learning, and their applications. He serves as a Senior Associate Editor of ACM Transactions on Intelligent Systems and Technology and previously served as Associate Editor of ACM TOIS, ACM TKDD, and Elsevier's IPM, and as Program Co-Chair of NAACL-HLT'07, SIGIR'09, and WWW'15. He is an ACM Fellow and a member of the ACM SIGIR Academy. He received the ACM SIGIR Gerard Salton Award and the ACM SIGIR Test of Time Award (three times). He has previously given tutorials at HLT-NAACL'04, SIGIR'05, SIGIR'06, HLT-NAACL'07, ICTIR'13, SIGIR'14, KDD'17, SIGIR'17, SIGIR'18, SIGIR'20, SIGIR'21, and CIKM'23.