Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Multimodal Audio and Text Fine-tuning with Qwen - Implementation Guide
- 1 - Introduction to multimodal audio plus text models
- 2 - Overview of the Qwen2-Audio model's capabilities and applications
- 3 - Technical details of the Qwen2-Audio model architecture
- 4 - Benefits of integrated multimodal model vs separate models
- 5 - Applications and use cases
- 6 - Key advantages of the integrated model
- 7 - Specific applications
- 8 - Introduction to LoRA fine-tuning approach
- 9 - Google Colab implementation walkthrough
- 10 - Model loading and configuration
- 11 - Testing audio processing capabilities
- 12 - Audio input examples and testing
- 13 - Dataset preparation for fine-tuning
- 14 - Detailed data collation process
- 15 - Processing audio and text inputs
- 16 - Setting up the data collator
- 17 - Training configuration and LoRA setup
- 18 - Training process and hyperparameters
- 19 - vLLM inference setup
- 20 - Production deployment considerations
- 21 - Fine-tuning results and analysis
- 22 - Conclusion and summary
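The data-collation chapters above (items 14-16) cover padding variable-length audio features and text tokens into uniform batches. As a rough illustration of the idea, here is a minimal, dependency-free sketch of that padding logic; the field names (`input_ids`, `audio_values`) and the collator shape are assumptions for illustration, not the exact code shown in the videos.

```python
def pad_batch(seqs, pad_value=0):
    """Right-pad variable-length sequences to the batch maximum length.

    Returns the padded sequences plus an attention mask
    (1 = real element, 0 = padding)."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def collate(batch):
    """Hypothetical multimodal collator: each example is a dict with
    'input_ids' (text token ids) and 'audio_values' (audio feature frames).
    Text and audio are padded independently, mirroring what a
    processor-based collator does before tensors are built."""
    input_ids, text_mask = pad_batch([ex["input_ids"] for ex in batch])
    audio, audio_mask = pad_batch(
        [ex["audio_values"] for ex in batch], pad_value=0.0
    )
    return {
        "input_ids": input_ids,
        "attention_mask": text_mask,
        "audio_values": audio,
        "audio_attention_mask": audio_mask,
    }
```

In a real fine-tuning run the padded lists would become framework tensors and the text padding would use the tokenizer's pad token id, but the masking pattern is the same.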