Building an LLM Fine-Tuning Dataset - From Reddit Comments to QLoRA Training

sentdex via YouTube

- Decompressing all of the gzip archives

4 of 9

Classroom Contents

  1. Introduction to dataset building for fine-tuning
  2. The Reddit dataset options: torrent, Archive.org, BigQuery
  3. Exporting BigQuery Reddit and some other data
  4. Decompressing all of the gzip archives
  5. Re-combining the archives for target subreddits
  6. How to structure the data
  7. Building training samples and saving to database
  8. Creating customized training JSON files
  9. QLoRA training and results
