Performance Evaluation of Open-Source Instruction-Tuned Large Language Models
Discover AI via YouTube
Overview
Learn about recent developments in open-source instruction-tuned Large Language Models (LLMs) in this video presentation on performance benchmarks and evaluation methodologies. Explore key findings from a recent arXiv pre-print titled "INSTRUCTEVAL", which proposes a holistic evaluation framework for instruction-tuned LLMs. Compare results across three leaderboards, Stanford's HELM, Hugging Face, and LMSYS, to see how different open-source models perform. Topics include evaluation data, problem-solving capabilities, alignment with human values, and practical implications for AI development. Gain insight into how the benchmarks are constructed and which open-source LLMs lead on the reported metrics and use cases.
Syllabus
Introduction
Evaluation Data
Problem Solving
Main Message
Human Values
Conclusion
Bonus
HELM
Taught by
Discover AI