Data Preparation Tips and Tricks for Machine Learning

Trelis Research via YouTube

Classroom Contents

  1. Welcome
  2. FineWeb
  3. Clustering and balancing data - Meta Paper
  4. Clustering analysis in Colab
  5. How to prepare chat / Q&A datasets synthetically
  6. Q&A
  7. Handling labeled data for fine-tuning
  8. Setting a chat template for a tokenizer without one
  9. Considerations on novel data and hallucinations
  10. Issues with tokenizer and chat template not aligning
  11. Using mixed-language datasets and their impact on training
  12. Recommendations for models suitable for text classification
  13. Extracting structured data from PDFs and tables
  14. Multi-GPU training considerations
  15. Using the LLM2Vec method for embeddings
  16. RAG pipeline suggestions
