Conference paper · 2024
Evaluation Framework for Large Language Model-based Conversational Agents
In Pacific-Asia Conference on Information Systems PACIS 2024 Proceedings. Ho Chi Minh City, Vietnam, July 1 - 5, 2024
Abstract
The integration of Large Language Models (LLMs) into Conversational Agents (CAs) significantly advances the agents’ ability to understand and respond to user queries in a more human-like manner. Despite the widespread adoption of LLMs in these agents, there is a noticeable lack of research on standardized evaluation methods. Addressing this research gap, our study proposes a comprehensive evaluation framework tailored explicitly to LLM-based conversational agents. In a Design Science Research (DSR) project, we construct an evaluation framework that incorporates four essential components: the pre-defined objectives of the agents, the corresponding tasks, and the selection of appropriate datasets and metrics. Our framework outlines how these elements relate to each other and enables a structured approach to evaluation. We demonstrate how such a framework enables a more systematic evaluation process. This framework can serve as a guiding tool for researchers and developers working with LLM-based conversational agents.
Cite As
Wolters, A., Arz von Straussenburg, A., & Riehle, D. (2024). Evaluation Framework for Large Language Model-based Conversational Agents. In T. Q. Phan, B. Tan, L. Hoanh-Su, & N. H. Thuan (Eds.), Pacific-Asia Conference on Information Systems PACIS 2024 Proceedings. Ho Chi Minh City, Vietnam, July 1 - 5, 2024 (pp. 1390–1406). Association for Information Systems/AIS eLibrary. https://aisel.aisnet.org/pacis2024/track01_aibussoc/track01_aibussoc/14
BibTeX
@inproceedings{Wolters2024Evaluation,
  address = {Atlanta, GA},
  author = {Wolters, Anna and Arz von Straussenburg, Arnold and Riehle, Dennis},
  booktitle = {Pacific-{Asia} {Conference} on {Information} {Systems} {PACIS} 2024 {Proceedings}. {Ho} {Chi} {Minh} {City}, {Vietnam}, {July} 1 - 5, 2024},
  editor = {Phan, Tuan Q. and Tan, Bernard and Hoanh-Su, Le and Thuan, Nguyen Hoang},
  year = {2024},
  pages = {1390--1406},
  organization = {Association for Information Systems/AIS eLibrary},
  title = {Evaluation {Framework} for {Large} {Language} {Model}-based {Conversational} {Agents}},
  url = {https://aisel.aisnet.org/pacis2024/track01_aibussoc/track01_aibussoc/14},
}