WSDM 2024
Authors: Abbasiantaeb et al.

Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions

Summary

Proposes a simulation framework in which two large language models (LLMs), acting as a questioner and an answerer, engage in a conversation. The main objectives are to investigate the effectiveness of LLMs in simulating question-answering conversations and to compare the generated conversations against human-human conversations with regard to various characteristics. The analysis shows that the LLMs tend to generate longer questions and answers than humans, which provide better coverage of the topic in focus.
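
As an illustrative sketch only: the questioner/answerer loop described above could look roughly like this, where `chat` is a hypothetical stand-in for any chat-completion API and the prompts are ours, not the authors'.

```python
# Minimal sketch of a zero-shot LLM-to-LLM conversational QA loop.
# `chat` is a hypothetical stand-in for any chat-completion API.

def chat(system_prompt: str, history: list[str]) -> str:
    """Placeholder for an LLM chat-completion call."""
    return f"[LLM reply given {len(history)} prior turns]"

def simulate_conversation(topic: str, num_turns: int = 5) -> list[tuple[str, str]]:
    questioner_role = f"You ask follow-up questions about: {topic}."
    answerer_role = f"You answer questions about: {topic}."
    history: list[str] = []
    turns = []
    for _ in range(num_turns):
        question = chat(questioner_role, history)  # questioner LLM
        history.append(question)
        answer = chat(answerer_role, history)      # answerer LLM
        history.append(answer)
        turns.append((question, answer))
    return turns
```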

Tags: qa, generation
CUI 2024
Authors: Bernard and Balog

Identifying Breakdowns in Conversational Recommender Systems using User Simulation

Summary

Proposes a methodology to systematically test conversational recommender systems with respect to conversational breakdowns. It consists of analyzing conversations generated between the system and a user simulator to identify pre-defined types of breakdowns. A case study demonstrates that the methodology can be applied to make an existing conversational recommender system more robust to conversational breakdowns.
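
One plausible shape for such breakdown scanning, sketched under assumptions: the breakdown types and their detectors below are placeholders, not the paper's taxonomy.

```python
# Illustrative sketch: scan simulated conversations for pre-defined
# breakdown types. Detectors here are placeholder assumptions.

BREAKDOWN_DETECTORS = {
    "repetition": lambda turn, prior: turn in prior,
    "empty_response": lambda turn, prior: not turn.strip(),
}

def find_breakdowns(conversation: list[str]) -> list[tuple[int, str]]:
    """Return (turn index, breakdown type) pairs found in one conversation."""
    found = []
    for i, turn in enumerate(conversation):
        for name, detect in BREAKDOWN_DETECTORS.items():
            if detect(turn, conversation[:i]):
                found.append((i, name))
    return found
```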

Tags: recommendation, evaluation
ICTIR 2024
Authors: Bernard and Balog

Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access

Summary

Formally characterizes the distinct objectives of user simulation: (1) simulators for training aim to maximize behavioral similarity to real users, while (2) simulators for evaluation focus on accurately predicting real-world conversational agent performance. An empirical study shows that optimizing for one objective does not necessarily lead to improved performance on the other. This finding highlights the need for distinct design considerations during the development of user simulators.
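
To fix ideas, one hedged way to write the two objectives (the notation is ours, not necessarily the paper's), with S a simulator, U the real user population, A a conversational agent, M(A, ·) a performance metric, and D a behavioral divergence:

```latex
% Training objective: behavioral similarity to real users
S^{*}_{\text{train}} = \arg\min_{S} \; D\big( p_{S}(\text{behavior}) \,\|\, p_{U}(\text{behavior}) \big)

% Evaluation objective: accurate prediction of agent performance
S^{*}_{\text{eval}} = \arg\min_{S} \; \mathbb{E}_{A}\big[ \, | M(A, S) - M(A, U) | \, \big]
```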

Tags: cia, training, evaluation
UM-CIR 2024
Authors: Fu et al.

An Evaluation Framework for Conversational Information Retrieval Using User Simulation

Summary

Proposes a new user simulator prototype to perform simulation-based evaluation of conversational information retrieval (CIR) systems. The user simulator comprises two modules: (1) an action predictor and (2) a response generator. The action predictor determines the next action based on the context; the set of available actions depends on the dataset used. The response generator uses the conversational context, user profile, and previously predicted action to output a realistic and personalised response. Success is assessed through a study of stopping strategies.
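
A minimal sketch of this two-module structure, with placeholder internals (the real action set comes from the dataset, and the response generator would typically call an LLM):

```python
# Sketch of the action predictor + response generator split described
# above. Internals are placeholder assumptions, not the paper's code.

import random

class ActionPredictor:
    def __init__(self, available_actions: list[str]):
        # The set of available actions depends on the dataset used.
        self.available_actions = available_actions

    def predict(self, context: list[str]) -> str:
        return random.choice(self.available_actions)  # placeholder policy

class ResponseGenerator:
    def generate(self, context: list[str], profile: dict, action: str) -> str:
        # A real implementation would condition an LLM on all three inputs.
        return f"[{action}] response for user {profile.get('id', '?')}"

class SimulatedUser:
    def __init__(self, profile: dict, actions: list[str]):
        self.profile = profile
        self.action_predictor = ActionPredictor(actions)
        self.response_generator = ResponseGenerator()

    def respond(self, context: list[str]) -> str:
        action = self.action_predictor.predict(context)
        return self.response_generator.generate(context, self.profile, action)
```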

Tags: search, evaluation
arXiv 2024
Authors: Huang et al.

Concept – An Evaluation Protocol on Conversation Recommender Systems with System- and User-centric Factors

Summary

Proposes a new evaluation protocol for conversational recommender systems that considers both system- and user-centric factors influencing user experience and engagement. The protocol identifies and defines six abilities that relate to three factors, along with corresponding metrics. Some metrics are computed based on scores given by a large language model. The authors apply the protocol to evaluate off-the-shelf conversational recommender systems and demonstrate its comprehensiveness.
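
For illustration, an LLM-scored metric of this kind could be wrapped as below; the rubric, scale, and prompt are assumptions, not the protocol's own.

```python
# Sketch of an LLM-judged ability score; prompt and 1-5 scale are
# illustrative assumptions.

def llm_score(ability: str, conversation: str) -> int:
    """Hypothetical LLM judge returning a 1-5 score for one ability."""
    prompt = (
        f"Rate the system's '{ability}' in the conversation below "
        f"on a scale of 1 to 5.\n\n{conversation}\n\nScore:"
    )
    response = "3"  # placeholder for sending `prompt` to an LLM
    return int(response)
```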

Tags: recommendation, evaluation
arXiv 2024
Authors: Vlachou and Macdonald

What Else Would I Like? A User Simulator using Alternatives for Improved Evaluation of Fashion Conversational Recommendation Systems

Summary

Proposes a meta user simulator that can draw on knowledge of alternative target items in the context of conversational fashion recommendation. Based on a patience parameter, the target item is replaced by the closest alternative (i.e., the one with the highest visual similarity). The experiments show that this leads to shorter conversations, as users are inclined to change their minds and accept an alternative target; another positive consequence is an improved success rate of the conversational recommender system.
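
A sketch of one plausible reading of the patience mechanism (the similarity function and the switching rule are assumptions):

```python
# Sketch: once patience is exhausted, swap the target for the most
# visually similar alternative. `visual_similarity` is a placeholder.

def visual_similarity(item_a: str, item_b: str) -> float:
    """Hypothetical visual similarity between two items, in [0, 1]."""
    return 0.0  # placeholder

def maybe_switch_target(target: str, alternatives: list[str],
                        turns_elapsed: int, patience: int) -> str:
    if turns_elapsed < patience or not alternatives:
        return target
    # Patience ran out: accept the closest alternative instead.
    return max(alternatives, key=lambda item: visual_similarity(target, item))
```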

Tags: recommendation, evaluation
NAACL 2024
Authors: Yoon et al.

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

Summary

Proposes a new protocol to evaluate LLM-based user simulators for conversational recommendation scenarios. The protocol comprises five evaluation tasks: choosing items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. The objective of these tasks is to uncover distortions between simulator and human behaviors. The experiments show that LLM-based simulators differ from humans in several ways: low diversity in the items discussed, weak correlation in how preferences are represented and expressed, a lack of personalization, and incoherent feedback.

Tags: recommendation, evaluation
arXiv 2024
Authors: Zhu et al.

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation

Summary

Performs an analysis of the limitations of an LLM-based user simulator, iEvaLM, and proposes a new user simulator to mitigate the identified limitations. The analysis of iEvaLM on two datasets reveals that (1) it leads to data leakage, which inflates performance, (2) the simulated responses are not the main factor in getting successful recommendations, and (3) controlling the simulated responses via a single prompt is complex. To address these limitations, the authors propose SimpleUserSim, which does not know the name of the target item during the conversation and uses a different prompt for each possible user action. Using the same experimental setting, they show that SimpleUserSim is less prone to data leakage and produces more impactful responses.
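
A sketch of the two design choices attributed to SimpleUserSim above: prompts never contain the target item's name, and each user action gets its own prompt. The wording is ours, not the paper's.

```python
# Sketch: one prompt per user action, conditioned on preferences
# derived from the target item's attributes, never on its name.

ACTION_PROMPTS = {
    "chit_chat": "Respond casually without naming any specific item.",
    "ask_recommendation": "Ask for a recommendation matching your preferences: {prefs}.",
    "give_feedback": "React to the recommended item based on your preferences: {prefs}.",
}

def build_prompt(action: str, preferences: str) -> str:
    return ACTION_PROMPTS[action].format(prefs=preferences)
```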

Tags: recommendation, evaluation
arXiv 2024
Authors: Zhu et al.

A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems

Summary

Introduces an LLM-based user simulator that is controllable, scalable, and allows human intervention during user profile creation. The dialogue generation process is divided into three stages: profile initialization, preference initialization, and message handling. The user behavior at each stage is controlled with specific plugins, such as user preference summarization and intent understanding. These plugins can easily be modified, extended, or replaced to change the behavior of the user simulator. The authors perform experiments in two scenarios, with and without human annotations in the dataset, to showcase the adaptability of the user simulator and its ability to effectively simulate user preferences.
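
A sketch of the plugin idea, with assumed names and interfaces: swapping a plugin changes the simulator's behavior at that stage.

```python
# Sketch: per-stage behavior supplied by swappable plugins. Plugin
# names and the interface are assumptions, not the paper's API.

from typing import Callable

Plugin = Callable[[dict, list[str]], str]

def preference_summary_plugin(profile: dict, history: list[str]) -> str:
    return f"Preferences so far: {profile.get('preferences', 'unknown')}"

class PluggableSimulator:
    def __init__(self, plugins: dict[str, Plugin]):
        self.plugins = plugins  # e.g., {"preference_summary": ..., "intent": ...}

    def run_stage(self, stage: str, profile: dict, history: list[str]) -> str:
        # Replacing self.plugins[stage] changes behavior at this stage.
        return self.plugins[stage](profile, history)
```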

Tags: recommendation, generation
arXiv 2023
Authors: Davidson et al.

User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

Summary

Proposes a user simulator built with an LLM using in-context learning instead of fine-tuning. The main objective is to generate linguistically diverse and human-like utterances. The authors criticize the use of goal success rate as a metric for evaluating user simulators, as humans tend to behave non-optimally.

Tags: task-oriented, evaluation
CIKM 2023
Authors: Hu et al.

Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulators to Enhance Dialogue System

Summary

Proposes a new optimization approach that leverages simulated user satisfaction from a large language model to enhance task-oriented dialogue systems. It integrates simulated user satisfaction into the reward function of the proximal policy optimization used to optimize a fine-tuned task-oriented dialogue system. Empirical experiments with fine-tuned Flan-T5 (dialogue system) and ChatGPT (user simulator) on two benchmark datasets show the potential of the proposed approach when user satisfaction annotations are not available.
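
A sketch of how simulated satisfaction could enter the PPO reward; the scorer and the additive weighting are assumptions, not the paper's exact formulation.

```python
# Sketch: fold simulated user satisfaction into the reward signal
# consumed by proximal policy optimization.

def simulated_satisfaction(dialogue: list[str]) -> float:
    """Hypothetical LLM-based satisfaction score in [0, 1]."""
    return 0.5  # placeholder for an LLM call

def shaped_reward(task_reward: float, dialogue: list[str],
                  weight: float = 0.5) -> float:
    # The combined value is used as the return signal during PPO updates.
    return task_reward + weight * simulated_satisfaction(dialogue)
```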

Tags: task-oriented, training
ACL 2023
Authors: Liu et al.

One Cannot Stand for Everyone! Leveraging Multiple User Simulators to train Task-oriented Dialogue Systems

Summary

Proposes to train task-oriented dialogue systems using multiple user simulators. The problem is framed as a multi-armed bandit where each arm corresponds to one user simulator. This balances how much each simulator is used during optimization and mitigates catastrophic forgetting. The experimental results show improved performance compared to baseline agents trained with a single user simulator in a single-domain scenario; the agents trained with this framework are also more robust to unseen domains. While the results are promising, the authors note that experiments in a multi-domain scenario are still needed.
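
A sketch of the bandit view, using UCB1 for concreteness (the paper's actual arm-selection strategy and reward definition may differ):

```python
# Sketch: each user simulator is a bandit arm; UCB1 balances how often
# each simulator is used during training.

import math

class SimulatorBandit:
    def __init__(self, num_simulators: int):
        self.counts = [0] * num_simulators
        self.values = [0.0] * num_simulators

    def select(self) -> int:
        for i, count in enumerate(self.counts):
            if count == 0:
                return i  # try every simulator once first
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda i: self.values[i]
            + math.sqrt(2 * math.log(total) / self.counts[i]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```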

Tags: task-oriented, training
SIGIR 2023
Authors: Owoicho et al.

Exploiting Simulated User Feedback for Conversational Search: Ranking, Rewriting, and Beyond

Summary

Proposes a user simulator able to answer clarifying questions and give direct feedback to a conversational search system. The simulator is initialized with a given information need and can interact with conversational search systems over multiple turns while using natural language and staying coherent. It also integrates the notion of patience and will stop a conversation when patience runs out. In experiments on the TREC CAsT dataset, crowd workers assess the quality of the simulator's generated answers with respect to naturalness and usefulness. The experiments show the benefits of using simulated user feedback to improve conversational search systems.

Tags: search, evaluation
arXiv 2023
Authors: Sun et al.

Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems

Summary

Presents a metaphorical user simulator, MetaSim, that uses historical conversation strategies as metaphors for the current conversation. Metaphors improve the simulator's dialogue reasoning, its generalization to new domains, and its realism. The authors also propose a tester-based framework to evaluate user simulators and task-oriented dialogue systems; a manual evaluation shows that it is a promising solution for automatic evaluation.

Tags: task-oriented, evaluation
arXiv 2023
Authors: Terragni et al.

In-Context Learning User Simulators for Task-Oriented Dialog Systems

Summary

Proposes an approach to build an in-context learning user simulator using an LLM. The user simulator is given a prompt comprising the task description, example dialogues, user goal, and dialogue history to generate responses. The simulator also comprises an evaluation component that tracks goal completion and assesses the system's actions. The experiments show that the in-context learning abilities of LLMs are valuable for generating diverse dialogues (exploring many dialogue paths) but suffer from limitations such as unpredictability and hallucinations.
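
A sketch of assembling such a prompt from the four ingredients listed above; field labels and formatting are assumptions.

```python
# Sketch: build the in-context learning prompt from task description,
# example dialogues, user goal, and dialogue history.

def build_icl_prompt(task_description: str, example_dialogues: list[str],
                     user_goal: str, history: list[str]) -> str:
    examples = "\n\n".join(example_dialogues)
    turns = "\n".join(history)
    return (
        f"{task_description}\n\n"
        f"Example dialogues:\n{examples}\n\n"
        f"Your goal: {user_goal}\n\n"
        f"Dialogue so far:\n{turns}\nUser:"
    )
```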

Tags: task-oriented, generation
arXiv 2023
Authors: Wang et al.

User Behavior Simulation with Large Language Model based Agents

Summary

Introduces a simulation environment where agents can interact with a recommender system, other agents, and "social media". An agent is based on an LLM (ChatGPT in particular) and comprises three modules: profile, memory (inspired by cognitive neuroscience), and action. Two main questions need to be considered when leveraging LLMs: (1) which behaviors to simulate and (2) how to design the prompts.
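
A minimal sketch of the three-module agent named above (profile, memory, action); internals are placeholder assumptions.

```python
# Sketch: LLM-based agent with profile, memory, and action modules.
# A real implementation would prompt an LLM inside act().

class Agent:
    def __init__(self, profile: dict):
        self.profile = profile       # static user attributes
        self.memory: list[str] = []  # observations, memory-module style

    def observe(self, event: str) -> None:
        self.memory.append(event)

    def act(self, environment_state: str) -> str:
        # Would prompt the LLM with profile, relevant memories, and state.
        return f"action given {len(self.memory)} memories in {environment_state}"
```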

Tags: recommendation, training, evaluation
EMNLP 2023
Authors: Wang et al.

Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models

Summary

Proposes an interactive evaluation approach, iEvaLM, using LLM-based user simulators. The approach is validated through experiments on two public datasets (ReDial and OpenDialKG). The simulated user is given a persona based on preferences established from ground-truth items, and the allowed behaviors, i.e., talking about a preference, providing feedback, and completing the conversation, are defined in the prompt. The evaluation considers two types of metrics: objective (recall) and subjective (persuasiveness, scored by an LLM). The authors also mention some limitations of this approach, mostly related to the LLM.
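
For illustration, a persona-conditioned simulator prompt of the kind described might be built as below; the wording is ours, not iEvaLM's actual prompt.

```python
# Sketch: persona prompt derived from ground-truth items, with the
# allowed behaviors spelled out. Wording is an assumption.

def build_simulator_prompt(ground_truth_items: list[str]) -> str:
    preferences = ", ".join(ground_truth_items)
    return (
        "You are a user seeking a recommendation.\n"
        f"Your preferences: {preferences}.\n"
        "You may: (1) talk about a preference, (2) provide feedback on a "
        "recommendation, or (3) complete the conversation when satisfied."
    )
```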

Tags: recommendation, evaluation
SIGIR 2022
Authors: Kim and Lipani

A Multi-Task Based Neural Model to Simulate Users in Goal Oriented Dialogue Systems

Summary

Proposes a user simulator, based on a generative model, that predicts users' satisfaction scores, actions, and utterances in a multi-task learning setting. The authors perform an ablation study to show that the three tasks help each other to better simulate users. Note that the proposed user simulator does not represent users' knowledge or mental state.

Tags: task-oriented

This page is maintained by Nolwenn Bernard and Krisztian Balog.

We welcome suggestions via email.