TelBench: Development and Evaluation of Benchmarks for Measuring Telco Service LLM Performance

Overview

Learn about the development and evaluation of TelBench, a specialized benchmark for measuring Large Language Model (LLM) performance in telecommunications services, in this 24-minute conference talk from SK AI SUMMIT 2024. Discover how the TelTask and TelInstruct training datasets were designed to help LLMs understand telecommunications terminology, knowledge, and business context. Explore the benchmarking process and results used to validate Telco LLM, including collaboration between linguists and engineers, expert evaluations by customer service representatives, and the development of a telecommunications-specific LLM-as-a-judge. Gain insights into the potential of large language models in the telecommunications industry from the experiences of Sunwoo Lee, SK Telecom's Data Construction/Evaluation Team Leader, who combines linguistic expertise with NLP implementation to design training data, evaluate model performance, and drive service applications.