Fine-tuning a large language model (LLM) is crucial for aligning it with specific business needs, enhancing accuracy, and optimizing its performance. In turn, this gives businesses precise, actionable insights that drive efficiency and innovation. This course gives aspiring gen AI engineers valuable fine-tuning skills employers are actively seeking.
During this course, you’ll explore different approaches to fine-tuning causal LLMs with human feedback and direct preference optimization (DPO). You’ll look at LLMs as policies, that is, probability distributions over the responses they generate, and at the concepts of instruction-tuning with Hugging Face. You’ll learn to calculate rewards using human feedback and reward modeling with Hugging Face. Plus, you’ll explore reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO) and PPO Trainer, and optimal solutions for DPO problems.
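To give a sense of the quantities involved, here is a minimal sketch of two commonly used objectives: a pairwise reward-modeling loss and the standard DPO loss. It is not taken from the course labs and does not use the Hugging Face API. The function names, the example log-probabilities, and the default `beta` value are assumptions chosen for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_pair_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for a reward model.

    The loss is small when the human-preferred response receives
    the higher scalar reward.
    """
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    under the policy being trained or under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Hypothetical log-probabilities, not real model outputs.
    print(reward_pair_loss(2.0, 0.0))
    print(dpo_loss(-4.0, -9.0, -5.0, -7.0))
```

When the policy and the reference model assign identical log-probabilities, the DPO loss equals log 2, and it falls as the policy shifts probability toward the preferred response relative to the reference.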
As you learn, you’ll get valuable hands-on experience in online labs where you’ll work on reward modeling, PPO, and DPO.
If you’re looking to add in-demand capabilities in fine-tuning LLMs to your resume, ENROLL TODAY and build the job-ready skills employers are looking for in just two weeks!