Overview
Syllabus
- Introduction to multimodal audio-plus-text models
- Overview of Qwen2-Audio model capabilities and applications
- Technical details of the Qwen2-Audio model architecture
- Benefits of an integrated multimodal model vs. separate models
- Applications and use cases
- Key advantages of the integrated model
- Specific applications
- Introduction to LoRA fine-tuning approach
- Google Colab implementation walkthrough
- Model loading and configuration
- Testing audio processing capabilities
- Audio input examples and testing
- Dataset preparation for fine-tuning
- Detailed data collation process
- Processing audio and text inputs
- Setting up the data collator
- Training configuration and LoRA setup
- Training process and hyperparameters
- vLLM inference setup
- Production deployment considerations
- Fine-tuning results and analysis
- Conclusion and summary
Taught by
Trelis Research