Considerations on novel data and hallucinations
YouTube videos curated by Class Central.
Classroom Contents
Data Preparation Tips and Tricks for Machine Learning
- 1 Welcome
- 2 FineWeb
- 3 Clustering and balancing data - Meta Paper
- 4 Clustering analysis in Colab
- 5 How to prepare chat / Q&A datasets synthetically
- 6 Q&A
- 7 Handling labeled data for fine-tuning
- 8 Setting a chat template for a tokenizer without one
- 9 Considerations on novel data and hallucinations
- 10 Issues with tokenizer and chat template not aligning
- 11 Using mixed-language datasets and their impact on training
- 12 Recommendations for models suitable for text classification
- 13 Extracting structured data from PDFs and tables
- 14 Multi-GPU training considerations
- 15 Using the LLM2Vec method for embeddings
- 16 RAG pipeline suggestions