Conference paper · 2024

Evaluation Framework for Large Language Model-based Conversational Agents

Anna Wolters, Arnold Arz von Straussenburg, Dennis Riehle

In Pacific-Asia Conference on Information Systems PACIS 2024 Proceedings. Ho Chi Minh City, Vietnam, July 1 - 5, 2024

Abstract

Integrating Large Language Models (LLMs) into Conversational Agents (CAs) significantly advances the agents' ability to understand and respond to user queries in a more human-like manner. Despite the widespread adoption of LLMs in these agents, research on standardized evaluation methods remains scarce. To address this research gap, our study proposes a comprehensive evaluation framework tailored explicitly to LLM-based conversational agents. In a Design Science Research (DSR) project, we construct an evaluation framework that incorporates four essential components: the pre-defined objectives of the agents, the corresponding tasks, and the selection of appropriate datasets and metrics. Our framework outlines how these elements relate to each other and thereby supports a structured approach to evaluation. We demonstrate how such a framework enables a more systematic evaluation process. The framework can serve as a guiding tool for researchers and developers working with LLM-based conversational agents.

Cite As

Wolters, A., Arz von Straussenburg, A., & Riehle, D. (2024). Evaluation Framework for Large Language Model-based Conversational Agents. In T. Q. Phan, B. Tan, L. Hoanh-Su, & N. H. Thuan (Eds.), Pacific-Asia Conference on Information Systems PACIS 2024 Proceedings. Ho Chi Minh City, Vietnam, July 1 - 5, 2024 (pp. 1390–1406). Association for Information Systems/AIS eLibrary. https://aisel.aisnet.org/pacis2024/track01_aibussoc/track01_aibussoc/14

BibTeX

@inproceedings{Wolters2024Evaluation,
	address = {Atlanta, GA},
	author = {Wolters, Anna and Arz von Straussenburg, Arnold and Riehle, Dennis},
	booktitle = {Pacific-{Asia} {Conference} on {Information} {Systems} {PACIS} 2024 {Proceedings}. {Ho} {Chi} {Minh} {City}, {Vietnam}, {July} 1 - 5, 2024},
	editor = {Phan, Tuan Q. and Tan, Bernard and Hoanh-Su, Le and Thuan, Nguyen Hoang},
	year = {2024},
	pages = {1390--1406},
	organization = {Association for Information Systems/AIS eLibrary},
	title = {Evaluation {Framework} for {Large} {Language} {Model}-based {Conversational} {Agents}},
	url = {https://aisel.aisnet.org/pacis2024/track01_aibussoc/track01_aibussoc/14},
}
