Tutorial on User Simulation for Evaluating Interactive Intelligent Systems
Half-day tutorial given at the 38th Annual AAAI Conference on Artificial Intelligence (AAAI’24), Vancouver, Canada, February 2024.
As AI technologies are increasingly deployed in real-world applications, notably in the form of search engines, recommender systems, and conversational assistants, how to evaluate such interactive intelligent systems has emerged as an urgent challenge for both practitioners who deploy AI products and researchers who study them. Research communities have so far mostly relied on test collections to perform reproducible experiments, but that evaluation methodology cannot be used for interactive intelligent systems, whose utility must be assessed by users through interaction with the system. To tackle this challenge, researchers have proposed and developed an evaluation methodology based on user simulation: an intelligent agent mimics a real user’s decisions when interacting with an AI system, and the system is evaluated by having it interact with this artificial user while measuring, from the user’s perspective, the utility gained and the cost or effort required to complete a task. Work on user simulation for evaluating intelligent systems has so far been done mostly in applied AI communities, notably Information Retrieval, Recommender Systems, and the World Wide Web. The goal of this tutorial is to provide a systematic review of this topic and to discuss many interesting novel AI-related research challenges for AAAI attendees, allowing them to learn about the major ideas, frameworks, models, and algorithms both for building user simulators and for using simulators to evaluate an interactive system, as well as important future research directions.
Outline
This tutorial focuses on providing a thorough understanding of user simulation techniques designed specifically for evaluation purposes. We systematically review major research progress, covering both general frameworks for designing user simulators and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants. We also highlight the connections between user simulation and multiple sub-fields of AI (e.g., Reinforcement Learning, Data Augmentation, Agent-based Modeling, and Cognitive Modeling/Systems) and discuss some important future research directions.
Target Audience and Prerequisites
This introductory tutorial primarily targets graduate students, academic researchers, and industry practitioners who are interested in learning about how AI techniques can be applied to build user simulators and/or how user simulation can be used to evaluate interactive AI systems. Since the question of how to accurately evaluate interactive intelligent systems is important to both practitioners who would like to assess the utility of their product systems and researchers who would like to know whether their new algorithms are truly more effective than the existing ones, we expect our tutorial to be broadly appealing to many participants of AAAI.
A general background in AI is sufficient. We expect that most AAAI participants will have no difficulty following the material.
Scope and Outline
- Introduction and Background [20 min]
We first describe the spectrum of information access tasks. Next, we briefly discuss the goals of evaluation and general methodologies of evaluation (reusable test collections, user studies, and online evaluation). We then highlight the challenges involved in evaluating information access systems and how user simulation can help address those challenges.
- Overview of User Simulation [15 min]
This part provides a brief historical account of the use of simulation techniques and highlights how various research communities have focused on different but complementary aspects of evaluation and user simulation. This includes early work on simulation in information retrieval and studies in interactive information retrieval pointing out discrepancies between interactive and non-interactive evaluation results. In dialogue systems research, simulation-based techniques have been used for dialogue policy learning and, to a limited extent, for evaluation. User simulation can be regarded as the development of a complete and operational user model, which makes work on search tasks and intents, information-seeking models, cognitive models of users, and economic models of IR highly relevant.
- Simulation-based Evaluation Frameworks [25 min]
We make the key observation that traditional evaluation measures used in IR may be viewed as naive user simulators, and discuss how to interpret Precision, Recall, and NDCG@k from a user simulation perspective (a worked reading of RBP as a user model is sketched after this outline). Next, we discuss metrics built on explicit models of user behavior, characterized by (1) the assumed user task, (2) the assumed user behavior when interacting with results, (3) the measurement of the reward a user would receive from examining a result, and (4) the measurement of the effort a user would need to invest to receive that reward. Specifically, we cover RBP, ERR, EBU, and time-biased gain, as well as the more general C/W/L and C/W/L/A frameworks and the model-based framework by Carterette. Finally, we present a general simulation-based evaluation framework and the Interface Card Model, which can be used to evaluate an interactive information access system with a computationally generated dynamic browsing interface using user simulation.
- User Simulation and Human Decision-making [15 min]
In this part, we provide a high-level overview of research on conceptual models that offer theoretical guidance for modeling search processes and decisions from an individual’s perspective. We cover models of search behavior in three main categories: (1) cognitive models, focusing on the cognitive processes underlying information-seeking activity, (2) process models, representing the different stages and activities of the search process, and (3) strategic models, describing the tactics that users employ when searching for information. We then discuss how to model decision-making processes mathematically using Markov decision processes (MDPs). The MDP framework offers a general formal foundation for constructing user simulators (a schematic formulation is given after this outline), which we will use to discuss specific user simulation techniques in the next two sections.
- Simulating Interactions with Search and Recommender Systems [45 min]
We start by presenting models that describe interaction workflows, that is, models that specify the space of user actions and system responses and the possible transitions between them. Then, we discuss specific user actions: query formulation, scanning behavior, clicks, the effort involved in processing documents, and stopping (a minimal click-model sketch is included after this outline). We also provide an overview of toolkits and resources and discuss approaches to validating simulators.
- Simulating Interactions with Conversational Assistants [30 min]
We begin with a conceptualization of conversational information access in terms of intents and dialogue structure, and discuss two fundamentally different simulator architectures: modular and end-to-end systems. There is a solid body of work within dialogue systems research on simulating user decisions to build on, including the widely used agenda-based simulation (sketched after this outline) and more recent sequence-to-sequence models. This is followed by a discussion of simulation approaches developed specifically for conversational information access. We review toolkits and resources, followed by a discussion of how simulators themselves can be evaluated.
- User Simulation as an Interdisciplinary Research Area [30 min]
In this part, we make connections to different sub-fields of AI in which AI techniques can be used to improve user simulation or in which user simulation may be used to enhance AI research, including Machine Learning (especially Reinforcement Learning), Data Augmentation, Agent-Based Modeling, User Modeling, and Human-Computer Interaction.
- Conclusion and Future Challenges [15 min]
We conclude by highlighting open issues and providing several potential research directions. We discuss how simulation technologies can help foster collaboration between academia and industry. We also argue that some of the major challenges that remain require research from multiple subject areas, including Information Science, Information Retrieval, Recommender Systems, Machine Learning, Natural Language Processing, Knowledge Representation, Human-Computer Interaction, and Psychology, making user simulation a truly interdisciplinary area for research.
- Discussion [15 min]
We dedicate the last bit of the tutorial to open-ended discussion and feedback from participants.
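To make a few of the items above more concrete, we include some schematic sketches. First, for the Simulation-based Evaluation Frameworks part: Rank-Biased Precision (RBP) illustrates how a classical metric encodes a simple user model. RBP assumes a user who always inspects the top result and moves from rank i to rank i+1 with a fixed persistence probability p, so the metric is the expected gain such a simulated user accumulates:

```latex
% Rank-Biased Precision (Moffat & Zobel, 2008) read as an explicit user model:
% the user always examines rank 1 and continues from rank i to i+1 with
% persistence probability p; r_i denotes the relevance gain at rank i.
\mathrm{RBP} = (1 - p) \sum_{i=1}^{\infty} r_i \, p^{\,i-1}
```

Precision@k and NDCG@k can be recovered under analogous assumptions about how deep, and with what probability, the simulated user keeps examining the ranked list.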
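For the User Simulation and Human Decision-making part, the MDP framing can be summarized schematically as follows; the concrete state, action, and reward definitions in the comments are illustrative rather than the only possible instantiation. The simulated user's behavior is a policy π over this MDP, and a system can then be scored by the expected cumulative reward V(π) its simulated users obtain when interacting with it:

```latex
% A user simulator framed as a Markov decision process (schematic sketch):
%   S : interaction states (e.g., information need, current query, results examined, gain so far)
%   A : user actions (e.g., examine the next result, click, reformulate the query, stop)
%   P : transition model P(s' | s, a) capturing how system responses and the user state evolve
%   R : reward R(s, a), e.g., relevance gain obtained minus the effort expended
\mathcal{M} = (S, A, P, R), \qquad
V(\pi) = \mathbb{E}\Big[\, \textstyle\sum_{t=0}^{T} R(s_t, a_t) \;\Big|\; a_t \sim \pi(\cdot \mid s_t) \Big]
```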
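For the Simulating Interactions with Search and Recommender Systems part, click behavior is one of the user actions that is typically simulated with an explicit probabilistic model. The sketch below is a minimal cascade-style click model in Python; the attractiveness values and function name are hypothetical and only meant to show the shape of such a simulator component.

```python
import random

def simulate_cascade_session(attractiveness, rng=None):
    """Minimal cascade click model: the user scans results top-down, clicks a
    result with probability equal to its attractiveness, and stops scanning
    after the first click. Returns the ranks examined and the clicked rank."""
    rng = rng or random.Random()
    examined = []
    for rank, attr in enumerate(attractiveness, start=1):
        examined.append(rank)          # the user pays the cost of examining this rank
        if rng.random() < attr:        # attractiveness decides whether the user clicks
            return examined, rank      # basic cascade: the session ends after a click
    return examined, None              # no click: the user abandons the list

# Illustrative attractiveness values for a 5-result list (top result most attractive).
examined, clicked = simulate_cascade_session([0.6, 0.3, 0.2, 0.1, 0.05],
                                             rng=random.Random(42))
print(f"Examined ranks: {examined}, clicked rank: {clicked}")
```

Richer variants add per-rank continuation probabilities, post-click satisfaction, and query reformulation, which is where the MDP view above becomes useful.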
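Finally, for the Simulating Interactions with Conversational Assistants part, the sketch below gives the flavor of agenda-based simulation in the spirit of Schatzmann et al. (2007): the simulated user keeps a stack ("agenda") of pending dialogue acts derived from its goal, pushes new acts in response to system requests, and pops an act to produce each user turn. The act names, goal, and class are hypothetical and not tied to any specific toolkit.

```python
class AgendaBasedUser:
    """Toy agenda-based user simulator: the agenda is a stack of (intent, payload)
    dialogue acts generated from the user's goal."""

    def __init__(self, goal):
        self.goal = goal                                   # constraints the user wants satisfied
        self.agenda = [("bye", None)]                      # final act sits at the bottom of the stack
        for slot, value in goal.items():                   # inform the system of each constraint
            self.agenda.append(("inform", (slot, value)))

    def respond(self, system_act):
        """Update the agenda given the system act, then pop the next user act."""
        intent, slot = system_act
        if intent == "request" and slot in self.goal:
            # The system asked about a constraint: push the answer on top of the agenda.
            self.agenda.append(("inform", (slot, self.goal[slot])))
        return self.agenda.pop() if self.agenda else ("bye", None)

# Example exchange with a trivial system that greets and then requests slots.
user = AgendaBasedUser(goal={"cuisine": "italian", "area": "downtown"})
for system_act in [("greet", None), ("request", "area"), ("request", "cuisine")]:
    print(system_act, "->", user.respond(system_act))
```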
Schedule
Time | Topic |
---|---|
08:30 - 08:50 | Introduction and Background |
08:50 - 09:05 | Overview of User Simulation |
09:05 - 09:30 | Simulation-based Evaluation Frameworks |
09:30 - 09:45 | User Simulation and Human Decision-making |
09:45 - 10:30 | Simulating Interactions with Search and Recommender Systems |
10:30 - 11:00 | BREAK |
11:00 - 11:30 | Simulating Interactions with Conversational Assistants |
11:30 - 12:00 | User Simulation as an Interdisciplinary Research Area |
12:00 - 12:15 | Conclusion and Future Challenges |
12:15 - 12:30 | Discussion |
Presenters
Krisztian Balog is a full professor at the University of Stavanger and a staff research scientist at Google. His general research interests lie in the use and development of information retrieval, natural language processing, and machine learning techniques for intelligent information access tasks. His current research concerns novel evaluation methodologies, and conversational and explainable search and recommendation methods. Balog regularly serves on the senior program committee of SIGIR, WSDM, WWW, CIKM, and ECIR. He previously served as general co-chair of ICTIR’20 and ECIR’22, program committee co-chair of ICTIR’19 (full papers) and CIKM’21 (short papers), Associate Editor of ACM Transactions on Information Systems, and coordinator of IR benchmarking efforts at TREC and CLEF. Balog is the recipient of the 2018 Karen Spärck Jones Award. He has previously given tutorials at WWW’13, SIGIR’13, WSDM’14, ECIR’16, SIGIR’19, and CIKM’23.
ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include intelligent information retrieval, text mining, natural language processing, machine learning, and their applications. He serves as a Senior Associate Editor of ACM Transactions on Intelligent Systems and Technology and previously served as Associate Editor of ACM TOIS, ACM TKDD, and Elsevier’s IPM, and as Program Co-Chair of NAACL-HLT’07, SIGIR’09, and WWW’15. He is an ACM Fellow and a member of the ACM SIGIR Academy. He received the ACM SIGIR Gerard Salton Award and the ACM SIGIR Test of Time Award (three times). He has previously given tutorials at HLT-NAACL’04, SIGIR’05, SIGIR’06, HLT-NAACL’07, ICTIR’13, SIGIR’14, KDD’17, SIGIR’17, SIGIR’18, SIGIR’20, SIGIR’21, and CIKM’23.