Tutorial on User Simulation for Evaluating Information Access Systems
Half-day tutorial given at the 32nd ACM International Conference on Information and Knowledge Management (CIKM ’23), Birmingham, UK, October 2023. The same tutorial was also given at the 1st International ACM SIGIR Conference on Information Retrieval in the Asia Pacific (SIGIR-AP ’23), Beijing, China, November 2023.
With the emergence of various information access systems exhibiting increasing complexity, there is a critical need for sound and scalable means of automatic evaluation. To address this challenge, user simulation emerges as a promising solution. This half-day tutorial focuses on providing a thorough understanding of user simulation techniques designed specifically for evaluation purposes. We systematically review major research progress, covering both general frameworks for designing user simulators, and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants. We also highlight some important future research directions.
Objectives
The main objective of this tutorial is to provide a systematic overview of research progress in user simulation from the perspective of evaluating information access systems. We will synthesize scattered work from multiple research communities using general theoretical frameworks, allowing participants to see how different lines of research are connected, what the major high-level issues in user simulation are, and how simulation can be used to evaluate information access systems in general. We will also systematically cover specific simulation techniques, focusing on those that can be employed to evaluate three major types of information access systems, namely search engines, recommender systems, and conversational assistants, in order to (1) estimate how well they will perform under various circumstances, and (2) analyze how performance changes under different conditions and user behaviors. We will keep our discussion as generic as possible so that those working on other types of interactive systems or applications of assistive AI will also find it useful.
Participants of the tutorial can expect to learn what user simulation is, why it is important to use user simulation for evaluation, how existing user simulation techniques can already be useful for evaluating interactive information retrieval systems, how to develop new user simulators, and how to use user simulation to evaluate almost any assistive AI system. They can also expect to learn why user simulation is very challenging and where additional research is still needed.
Target Audience and Prerequisites
This introductory tutorial primarily targets graduate students, academic researchers, and industry practitioners working on information access or, more broadly, interactive AI systems. The question of how to accurately evaluate a search engine, a recommender system, or a conversational assistant matters both to practitioners who want to assess the utility of their production systems and to researchers who want to know whether their new algorithms are truly more effective than existing ones, so we expect the tutorial to appeal broadly to CIKM participants. A general background in IR is sufficient; we expect most CIKM participants will have no difficulty following the material.
Scope and Outline
- Introduction and Background [20 min]
We first describe the spectrum of information access tasks. Next, we briefly discuss the goals of evaluation and general methodologies of evaluation (reusable test collections, user studies, and online evaluation). We then highlight the challenges involved in evaluating information access systems and how user simulation can help address those challenges.
- Overview of User Simulation [15 min]
This part provides a brief historical account of the use of simulation techniques and highlights how various research communities have focused on different but complementary aspects of evaluation and user simulation. This includes early work on simulation in information retrieval and studies in interactive information retrieval pointing out discrepancies between interactive and non-interactive evaluation results. In dialogue systems research, simulation-based techniques have been used for dialogue policy learning and, to a more limited extent, for evaluation. User simulation can be regarded as the development of a complete and operational user model, which makes work on search tasks and intents, information seeking models, cognitive models of users, and economic IR models highly relevant.
- Simulation-based Evaluation Frameworks [25 min]
We make the key observation that traditional evaluation measures used in IR may be viewed as naive user simulators, and discuss how to interpret Precision, Recall, and NDCG@k from a user simulation perspective. Next, we discuss metrics based on explicit models of user behavior, characterized by (1) the assumed user task, (2) the assumed user behavior when interacting with results, (3) the measurement of the reward a user would receive from examining a result, and (4) the measurement of the effort a user would need to invest to receive that reward. Specifically, we cover the RBP, ERR, EBU, and time-biased gain measures, as well as the more general C/W/L and C/W/L/A frameworks and the model-based framework by Carterette. Finally, we present a general simulation-based evaluation framework and the Interface Card Model, which can be used to evaluate an interactive information access system with a computationally generated dynamic browsing interface using user simulation.
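To make the "metrics as naive simulators" view concrete, here is a minimal sketch (our illustration, with assumed function names, not material from the tutorial) that reads RBP and ERR as browsing models: RBP assumes the user proceeds to the next result with a fixed persistence p, while ERR assumes the user stops with a probability determined by the relevance of the current result.

```python
# Minimal sketch: classic IR metrics read as naive user simulators.
# `rels` holds graded relevance labels of a ranked list, top to bottom.

def rbp(rels, p=0.8, max_grade=1):
    """Rank-biased precision: after result i the user examines result
    i+1 with fixed probability p (a fixed-persistence browsing model)."""
    return (1 - p) * sum((rel / max_grade) * p**i for i, rel in enumerate(rels))

def err(rels, max_grade=3):
    """Expected reciprocal rank: the user stops at result i with
    probability R_i = (2^rel - 1) / 2^max_grade, otherwise continues."""
    score, p_continue = 0.0, 1.0
    for rank, rel in enumerate(rels, start=1):
        r = (2**rel - 1) / 2**max_grade
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score

print(rbp([1, 0, 1, 1], p=0.8))   # binary relevance labels
print(err([3, 0, 2, 1]))          # graded relevance labels
```

Both metrics thus encode a complete, if simplistic, model of how a user scans a ranked list, which is exactly the perspective the more general frameworks make explicit.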
- User Simulation and Human Decision-making [15 min]
In this part, we provide a high-level overview of research on conceptual models that can offer theoretical guidance for modeling processes and decisions from an individual’s perspective. We cover models of search behavior within three main categories: (1) cognitive models, focusing on the cognitive processes underlying information-seeking activity, (2) process models, representing the different stages and activities during the search process, and (3) strategic models, describing the tactics that users employ when searching for information. Then, we discuss how to model decision-making processes mathematically using Markov decision processes (MDPs). The MDP framework provides a general formal basis for constructing user simulators, which we will use to discuss specific user simulation techniques in the next two sections.
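As a rough illustration of the MDP framing, the sketch below simulates a user browsing a ranked list; the states, actions, and probabilities are simplified assumptions of ours, not a model covered in the tutorial.

```python
import random

# Minimal sketch of a user simulator framed as an MDP.
# State: current rank position plus the reward accumulated so far.
# Actions: click or skip the current result, then continue or stop.

def simulate_session(rels, patience=0.7, click_noise=0.1, seed=None):
    """Simulate one user browsing a ranked list of relevance labels."""
    rng = random.Random(seed)
    reward = 0.0
    for rank, rel in enumerate(rels, start=1):
        # Click decision: driven by relevance, perturbed by noise.
        if rng.random() < min(1.0, rel + click_noise):
            reward += rel                 # reward for a relevant click
        # Stopping decision: continue with fixed persistence.
        if rng.random() > patience:
            return reward, rank           # transition to terminal state
    return reward, len(rels)              # ranked list exhausted

# Expected utility of a ranking under this user model, via Monte Carlo.
runs = [simulate_session([1, 0, 1, 0, 1], seed=s) for s in range(1000)]
print(sum(r for r, _ in runs) / len(runs))
```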
- Simulating Interactions with Search and Recommender Systems [45 min]
We start by presenting models that describe interaction workflows, that is, models that specify the space of user actions and system responses and the possible transitions between them. Then, we discuss the simulation of specific user actions: query formulation, scanning behavior, clicks, the effort involved in processing documents, and stopping. We also provide an overview of toolkits and resources and discuss approaches to validating simulators.
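As an example of simulating one such action, the sketch below implements the classic cascade click model (a standard textbook model, used here purely as an illustration): the user scans results top-down, clicks the first attractive result, and then leaves.

```python
import random

# Illustrative sketch of the cascade click model: the user examines
# results top-down, clicks the first attractive one, and then stops.

def cascade_click(attractiveness, rng):
    """`attractiveness[i]` is P(click | result i examined)."""
    for rank, a in enumerate(attractiveness, start=1):
        if rng.random() < a:   # examined and found attractive -> click
            return rank        # cascade assumption: the session ends here
    return None                # abandonment: no result clicked

clicks = [cascade_click([0.2, 0.5, 0.1], random.Random(s)) for s in range(1000)]
print(sum(c is not None for c in clicks) / len(clicks))  # click-through rate
```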
- Simulating Interactions with Conversational Assistants [30 min]
We begin with a conceptualization of conversational information access in terms of intents and dialogue structure, and discuss two fundamentally different simulator architectures: modular and end-to-end systems. There is a solid body of work within dialogue systems research on simulating user decisions to build on, including the widely used agenda-based simulation approach and more recent sequence-to-sequence models. This is followed by a discussion of simulation approaches developed specifically for conversational information access. We review toolkits and resources, and conclude with a discussion of how simulators themselves can be evaluated.
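To give a flavor of agenda-based simulation, the sketch below keeps the simulated user’s pending dialogue acts on a stack derived from its goal; the act vocabulary and the update rule are simplified assumptions of ours.

```python
# Minimal sketch of agenda-based user simulation: the user's goal is
# compiled into a stack (agenda) of pending dialogue acts; acts are
# popped to produce user turns and reordered in response to the system.

class AgendaBasedUser:
    def __init__(self, goal):
        # Goal: constraints to convey and slots to ask about, e.g.
        # {"inform": {"cuisine": "thai"}, "request": ["address"]}
        self.agenda = [("request", slot) for slot in goal["request"]]
        self.agenda += [("inform", s, v) for s, v in goal["inform"].items()]

    def respond(self, system_act):
        # If the system requests a slot we have yet to convey, move the
        # matching inform act to the top of the agenda (end of the list).
        if system_act and system_act[0] == "request":
            for act in self.agenda:
                if act[0] == "inform" and act[1] == system_act[1]:
                    self.agenda.remove(act)
                    self.agenda.append(act)
                    break
        return self.agenda.pop() if self.agenda else ("bye",)

user = AgendaBasedUser({"inform": {"cuisine": "thai"}, "request": ["address"]})
print(user.respond(None))                  # -> ('inform', 'cuisine', 'thai')
print(user.respond(("request", "price")))  # -> ('request', 'address')
```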
- Conclusion and Future Challenges [15 min]
We conclude by highlighting open issues and providing several potential research directions. We discuss how simulation technologies can help foster collaboration between academia and industry. We also argue that some of the major challenges that remain require research from multiple subject areas, including information science, information retrieval, recommender systems, machine learning, natural language processing, knowledge representation, human-computer interaction, and psychology, making user simulation a truly interdisciplinary area for research.
- Discussion [15 min]
We dedicate the final part of the tutorial to open-ended discussion and feedback from participants.
Presenters
Krisztian Balog is a full professor at the University of Stavanger and a staff research scientist at Google. His general research interests lie in the use and development of information retrieval, natural language processing, and machine learning techniques for intelligent information access tasks. His current research concerns novel evaluation methodologies, and conversational and explainable search and recommendation methods. Balog regularly serves on the senior programme committee of SIGIR, WSDM, WWW, CIKM, and ECIR. He previously served as general co-chair of ICTIR’20 and ECIR’22, program committee co-chair of ICTIR’19 (full papers) and CIKM’21 (short papers), Associate Editor of ACM Transactions on Information Systems, and coordinator of IR benchmarking efforts at TREC and CLEF. Balog is the recipient of the 2018 Karen Spärck Jones Award. He has previously given tutorials at WWW’13, SIGIR’13, WSDM’14, ECIR’16, and SIGIR’19.
ChengXiang Zhai is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests include intelligent information retrieval, text mining, natural language processing, machine learning, and their applications. He serves as a Senior Associate Editor of ACM Transactions on Intelligent Systems and Technology and previously served as an Associate Editor of ACM TOIS, ACM TKDD, and Elsevier’s IPM, and as Program Co-Chair of NAACL-HLT’07, SIGIR’09, and WWW’15. He is an ACM Fellow and a member of the ACM SIGIR Academy. He received the ACM SIGIR Gerard Salton Award and the ACM SIGIR Test of Time Award (three times). He has previously given tutorials at HLT-NAACL’04, SIGIR’05, SIGIR’06, HLT-NAACL’07, ICTIR’13, SIGIR’14, KDD’17, SIGIR’17, SIGIR’18, SIGIR’20, and SIGIR’21.