User Simulation for Conversational Information Access
An annotated bibliography of papers (2020-)
This page is maintained by Nolwenn Bernard and Krisztian Balog. We welcome suggestions via email.
- Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions, Abbasiantaeb et al., WSDM 2024
- TL;DR: Proposes a simulation framework in which two large language models (LLMs) act as a questioner and an answerer engaged in a conversation. The main objectives are to investigate how effectively LLMs can simulate question-answering conversations and to compare the generated conversations against human-human conversations with regard to various characteristics. The analysis shows that the LLMs tend to generate longer questions and answers than humans, and that these provide better coverage of the topic in focus.
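  A minimal sketch of such a zero-shot LLM-to-LLM loop, assuming a hypothetical `chat` helper in place of a concrete chat-completion client; the prompts are illustrative, not the paper's:

  ```python
  # Sketch of a zero-shot LLM-to-LLM conversational QA loop.
  # `chat` is a hypothetical wrapper around any chat-completion API.

  def chat(system_prompt: str, history: list[dict]) -> str:
      """Call an LLM with a system prompt and a message history."""
      raise NotImplementedError  # plug in your LLM client here

  def simulate_conversation(topic: str, num_turns: int = 5) -> list[dict]:
      questioner_sys = f"You are a curious user asking questions about: {topic}."
      answerer_sys = f"You are an expert answering questions about: {topic}."
      history: list[dict] = []
      for _ in range(num_turns):
          question = chat(questioner_sys, history)
          history.append({"role": "questioner", "text": question})
          answer = chat(answerer_sys, history)
          history.append({"role": "answerer", "text": answer})
      return history
  ```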
- Identifying Breakdowns in Conversational Recommender Systems using User Simulation, Bernard and Balog, CUI 2024
- TL;DR: Proposes a methodology to systematically test conversational recommender systems for conversational breakdowns. It consists of analyzing conversations generated between the system and a user simulator to identify pre-defined types of breakdowns. A case study demonstrates that the methodology can be applied to make an existing conversational recommender system more robust to conversational breakdowns.
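  The core generate-then-scan loop of such a methodology might look as follows; the detector predicates here are illustrative stand-ins, not the paper's breakdown taxonomy:

  ```python
  # Sketch: scan simulated conversations for pre-defined breakdown types.

  from typing import Callable

  Utterance = dict  # e.g., {"participant": "system", "text": "..."}

  def repeats_previous_turn(conv: list[Utterance], i: int) -> bool:
      """Illustrative detector: system repeats its previous utterance verbatim."""
      return i >= 2 and conv[i]["text"] == conv[i - 2]["text"]

  DETECTORS: dict[str, Callable[[list[Utterance], int], bool]] = {
      "repetition": repeats_previous_turn,
      # ... one predicate per breakdown type
  }

  def find_breakdowns(conversation: list[Utterance]) -> list[tuple[int, str]]:
      """Return (turn index, breakdown type) pairs for all detected breakdowns."""
      hits = []
      for i, utt in enumerate(conversation):
          if utt["participant"] != "system":
              continue
          for name, detect in DETECTORS.items():
              if detect(conversation, i):
                  hits.append((i, name))
      return hits
  ```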
- Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access, Bernard and Balog, ICTIR 2024
- TL;DR: Formally characterizes the distinct objectives of user simulators: (1) training, which aims to maximize behavioral similarity to real users, and (2) evaluation, which focuses on accurately predicting real-world conversational agent performance. An empirical study shows that optimizing for one objective does not necessarily lead to improved performance on the other. This finding highlights the need for distinct design considerations during the development of user simulators.
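  A sketch of how the two objectives could be operationalized; the concrete measures (Jensen-Shannon distance for behavioral similarity, Kendall's tau for evaluation accuracy) are illustrative choices, not necessarily the paper's:

  ```python
  # Sketch: the two simulator objectives as concrete, separately measurable metrics.

  from collections import Counter
  from scipy.spatial.distance import jensenshannon
  from scipy.stats import kendalltau

  def behavioral_similarity(human_actions: list[str], sim_actions: list[str]) -> float:
      """Training objective: how closely simulated behavior matches real users."""
      labels = sorted(set(human_actions) | set(sim_actions))
      h, s = Counter(human_actions), Counter(sim_actions)
      p = [h[a] / len(human_actions) for a in labels]
      q = [s[a] / len(sim_actions) for a in labels]
      return 1.0 - jensenshannon(p, q)  # 1 = identical action distributions

  def evaluation_accuracy(perf_with_humans: list[float], perf_with_sim: list[float]) -> float:
      """Evaluation objective: does the simulator rank systems like real users do?"""
      tau, _ = kendalltau(perf_with_humans, perf_with_sim)
      return tau
  ```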
- Concept – An Evaluation Protocol on Conversation Recommender Systems with System- and User-centric Factors, Huang et al., arXiv 2024
- TL;DR: Proposes a new evaluation protocol for conversational recommender systems that considers both system- and user-centric factors that influence user experience and engagement. The protocol identifies and defines six abilities that relate to three factors, along with corresponding metrics. Some metrics are computed based on scores given by a large language model. The authors apply the protocol to evaluate off-the-shelf conversational recommender systems and demonstrate its comprehensiveness.
- What Else Would I Like? A User Simulator using Alternatives for Improved Evaluation of Fashion Conversational Recommendation Systems, Vlachou and Macdonald, arXiv 2024
- TL;DR: Proposes a meta user simulator that can provide knowledge of alternative targets in the context of conversational recommendation in fashion. Based on a patience parameter, the target item is replaced by the closest alternative (i.e., the one with the highest visual similarity). The experiments show that this leads to shorter conversations, as users are inclined to change their minds and accept an alternative target; another positive consequence is an improved success rate for the conversational recommender system.
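  A minimal sketch of the patience-based switch, assuming precomputed pairwise visual-similarity scores; all names are illustrative:

  ```python
  # Sketch: after `patience` turns, the simulated user accepts the visually
  # closest alternative in place of the original target item.

  def maybe_switch_target(
      target: str,
      alternatives: list[str],
      similarity: dict[tuple[str, str], float],
      turns_elapsed: int,
      patience: int,
  ) -> str:
      """Return the item the simulated user is currently willing to accept."""
      if turns_elapsed < patience or not alternatives:
          return target
      return max(alternatives, key=lambda a: similarity[(target, a)])
  ```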
- How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation, Zhu et al., arXiv 2024
- TL;DR: Performs an analysis of the limitations of an LLM-based user simulator, iEvaLM, and proposes a new user simulator to mitigate the identified limitations. The analysis of iEvaLM on two datasets reveals that (1) it leads to data leakage, which inflates performance, (2) the simulated responses are not the main factor in getting successful recommendations, and (3) controlling the simulated responses via a single prompt is complex. To address these limitations, the authors propose SimpleUserSim, which does not know the name of the target item during the conversation and uses a different prompt for each possible user action. Using the same experimental setting, they show that SimpleUserSim is less prone to data leakage and produces more impactful responses.
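  A sketch of SimpleUserSim's two key ideas, per-action prompts and a prompt that never names the target item; the templates are illustrative, not the paper's:

  ```python
  # Sketch: one prompt template per user action; the target item is only
  # exposed through its attributes, never by name, to avoid data leakage.

  ACTION_PROMPTS = {
      "chit_chat": "Respond casually without revealing any item preferences.",
      "express_preference": (
          "Describe what the user is looking for using only these attributes: "
          "{target_attributes}. Never mention the item's name."
      ),
      "give_feedback": (
          "The recommended items were {recommended}. The user's target has these "
          "attributes: {target_attributes}. Say whether the recommendations fit, "
          "without ever naming the target item."
      ),
  }

  def build_user_prompt(action: str, target_attributes: str, recommended: list[str]) -> str:
      template = ACTION_PROMPTS[action]
      return template.format(
          target_attributes=target_attributes,
          recommended=", ".join(recommended),
      )
  ```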
- A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems, Zhu et al., arXiv 2024
- TL;DR: Introduces an LLM-based user simulator that is controllable, scalable, and allows human intervention during user profile creation. The dialogue generation process is divided into stages: profile initialization, preference initialization, and message handling. The user behavior in each stage is controlled with specific plugins, such as user preference summarization and intent understanding. These plugins can easily be modified, extended, or replaced to change the behavior of the user simulator. The authors perform experiments in two scenarios, with and without human annotation in the dataset, to showcase the adaptability of the user simulator and its ability to effectively simulate user preferences.
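  A minimal sketch of the plugin-style control described above; the stage names follow the summary, but the interface is an assumption:

  ```python
  # Sketch: each stage's behavior is a swappable plugin (a callable over
  # the simulator state), so behaviors can be modified without touching the rest.

  from typing import Callable

  class PluggableUserSimulator:
      def __init__(self):
          self.plugins: dict[str, Callable[[dict], dict]] = {}

      def register(self, stage: str, plugin: Callable[[dict], dict]) -> None:
          """Swap in a new behavior (e.g., a different preference summarizer)."""
          self.plugins[stage] = plugin

      def run(self, state: dict) -> dict:
          """Run the staged dialogue generation process end to end."""
          for stage in ("profile_init", "preference_init", "message_handling"):
              state = self.plugins[stage](state)
          return state
  ```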
- User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue, Davidson et al., arXiv 2023
- TL;DR: Proposes a user simulator built with an LLM using in-context learning instead of fine-tuning. The main objective is to generate linguistically diverse and human-like utterances. The use of goal success rate as a metric to evaluate user simulators is criticized, as humans tend to exhibit non-optimal behavior.
- One Cannot Stand for Everyone! Leveraging Multiple User Simulators to train Task-oriented Dialogue Systems, Liu et al., ACL 2023
- TL;DR: Proposes to train task-oriented dialogue systems using multiple user simulators. The problem is framed as a multi-armed bandit in which each arm corresponds to one user simulator. This balances how much each simulator is used during optimization and mitigates catastrophic forgetting. The experimental results show improved performance compared to baseline agents trained with a single user simulator in a single-domain scenario; the agents trained with this framework are also more robust to unseen domains. While the results are promising, the authors note that experiments in a multi-domain scenario are needed.
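  A sketch of simulator selection as a multi-armed bandit, using UCB1 as an illustrative strategy (the paper's exact bandit algorithm may differ):

  ```python
  # Sketch: each arm is a user simulator; training rewards feed the bandit,
  # which balances how often each simulator is used.

  import math

  class SimulatorBandit:
      def __init__(self, num_simulators: int):
          self.counts = [0] * num_simulators
          self.values = [0.0] * num_simulators  # running mean reward per simulator
          self.total = 0

      def select(self) -> int:
          """Pick the simulator to train against next (UCB1 rule)."""
          for arm, count in enumerate(self.counts):
              if count == 0:
                  return arm  # try every simulator at least once
          return max(
              range(len(self.counts)),
              key=lambda a: self.values[a]
              + math.sqrt(2 * math.log(self.total) / self.counts[a]),
          )

      def update(self, arm: int, reward: float) -> None:
          """Report the training reward observed with the chosen simulator."""
          self.counts[arm] += 1
          self.total += 1
          self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
  ```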
- Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems, Sun et al., arXiv 2023
- TL;DR: Presents a metaphorical user simulator, MetaSim, that uses historical conversation strategies as metaphors for a current conversation. This improves the simulator's dialogue reasoning, its generalization to new domains, and its realism. The authors also propose a tester-based evaluation framework for evaluating user simulators and task-oriented dialogue systems; a manual evaluation shows that it is a promising solution for automatic evaluation.
- In-Context Learning User Simulators for Task-Oriented Dialog Systems, Terragni et al., arXiv 2023
- TL;DR: Proposes an approach to build an in-context learning user simulator using an LLM. To generate responses, the user simulator is given a prompt comprising the task description, example dialogues, the user goal, and the dialogue history. The framework also comprises an evaluation component that tracks goal completion and assesses the system's actions. The experiments show that the in-context learning abilities of LLMs are valuable for generating diverse dialogues (exploring many dialogue paths) but suffer from limitations such as unpredictability and hallucination.
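  A minimal sketch of the prompt assembly; the section layout is illustrative:

  ```python
  # Sketch: assemble the in-context prompt from the four ingredients named
  # in the paper (task description, example dialogues, user goal, history).

  def build_prompt(
      task_description: str,
      example_dialogues: list[str],
      user_goal: str,
      dialogue_history: list[str],
  ) -> str:
      examples = "\n\n".join(example_dialogues)
      history = "\n".join(dialogue_history)
      return (
          f"{task_description}\n\n"
          f"Example dialogues:\n{examples}\n\n"
          f"Your goal: {user_goal}\n\n"
          f"Conversation so far:\n{history}\n"
          "User:"
      )
  ```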
- User Behavior Simulation with Large Language Model based Agents, Wang et al., arXiv 2023
- TL;DR: Introduces a simulation environment where agents can interact with a recommender system, other agents, and "social media". An agent is based on an LLM (ChatGPT in particular) and comprises three modules: profile, memory (inspired by cognitive neuroscience), and action. Two main questions need to be considered when leveraging LLMs: (1) what behavior to simulate and (2) how to design prompts.
- Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models, Wang et al., EMNLP 2023
- TL;DR: Proposes an interactive evaluation approach, iEvaLM, using LLM-based user simulators. The approach is validated through experiments on two public datasets (ReDial and OpenDialKG). The simulated user is given a persona based on preferences established from ground-truth items, and the allowed behaviors, i.e., talking about a preference, providing feedback, and completing the conversation, are defined in the prompt. The evaluation considers two types of metrics: objective (recall) and subjective (persuasiveness, scored using an LLM). The authors also mention some limitations of this approach, mostly related to the LLM.
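  A sketch of how such a persona-plus-allowed-behaviors prompt could be assembled; the wording is illustrative, not iEvaLM's actual prompt:

  ```python
  # Sketch: persona derived from ground-truth items, with the permitted
  # behaviors spelled out explicitly in the simulated user's prompt.

  ALLOWED_BEHAVIORS = (
      "You can only (1) talk about your preferences, "
      "(2) give feedback on recommendations, or "
      "(3) end the conversation once an acceptable item is recommended."
  )

  def persona_prompt(ground_truth_items: list[str]) -> str:
      prefs = ", ".join(ground_truth_items)
      return (
          f"You are a user whose tastes match these items: {prefs}. "
          f"{ALLOWED_BEHAVIORS}"
      )
  ```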
- A Multi-Task Based Neural Model to Simulate Users in Goal Oriented Dialogue Systems, Kim and Lipani, SIGIR 2022
- TL;DR: Proposes a user simulator, based on a generative model, that predicts users' satisfaction scores, actions, and utterances in a multi-task learning setting. The authors perform an ablation study to show that the three tasks help each other to better simulate users. Note that the proposed user simulator does not model users' knowledge or mental state.
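  A sketch of the three-task setup as a shared encoder with task-specific heads (PyTorch, illustrative dimensions; the paper's actual architecture may differ):

  ```python
  # Sketch: one shared dialogue encoder, three heads trained jointly on
  # satisfaction prediction, action prediction, and utterance generation.

  import torch
  import torch.nn as nn

  class MultiTaskUserSimulator(nn.Module):
      def __init__(self, hidden: int = 768, num_actions: int = 10, vocab: int = 30000):
          super().__init__()
          self.encoder = nn.GRU(hidden, hidden, batch_first=True)  # stand-in encoder
          self.satisfaction_head = nn.Linear(hidden, 5)      # 1-5 satisfaction score
          self.action_head = nn.Linear(hidden, num_actions)  # next user action
          self.utterance_head = nn.Linear(hidden, vocab)     # next-token logits

      def forward(self, dialogue_repr: torch.Tensor):
          _, h = self.encoder(dialogue_repr)
          h = h[-1]  # final hidden state, shape (batch, hidden)
          return (
              self.satisfaction_head(h),
              self.action_head(h),
              self.utterance_head(h),
          )

  # Joint training sums the three losses, e.g.:
  # loss = ce(sat_logits, sat) + ce(act_logits, act) + ce(tok_logits, tokens)
  ```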