From Zero to GenAI: 9 Unique Ways to Understand Large Language Models
A human neural network trained on Anime subtitles was used to generate this article.
Large Language Models, GenAI, Transformers, Embeddings, Vectors, Inferences, Fine-tuning, RAG, Neural Networks, Megaloboxing… Do I have your Attention?
Ever wanted to understand how Generative AI works, not to build it, but at least get the gist?
As someone with premium subscriptions to three different LLM services, so did I. For months, I’ve been safely keeping track of these resources by leaving them open in tabs across various devices and browsers. One day, I hope to actually finish at least one of them…
At Class Central, we generally recommend courses. But often, learning isn’t just about pre-recorded videos with an occasional quiz here and there, followed by a paid certificate for clicking the right things. So instead, here are nine unique resources, in a variety of formats, that can help you understand how large language models work:
- What Is ChatGPT Doing … and Why Does It Work? By Stephen Wolfram
- 3Blue1Brown’s Visual Intro to Transformers and Attention
- LLM University And Serrano.Academy
- Jay Alammar’s Visual Journey Through Language Models
- Let’s build GPT: from scratch, in code, spelled out by Andrej Karpathy
- GPT in 60 Lines of NumPy by Jay Mody
- Spreadsheets-are-all-you-need.ai
- GPT in 500 lines of SQL
- Scrimba’s Intro to Mistral AI
Why Learn About LLMs or Large Language Models?
Honestly, IDK. I’m just adding this question to please the algorithm gods at Google. Feel free to skip the rest of this section. The next paragraph is entirely generated by an LLM, which will remain anonymous to protect its privacy.
Learning about Large Language Models is important because they’re changing how we communicate and access information. By understanding how LLMs work, you can better use them to improve your writing, communication, and even your job skills!
GenAI vs GPT vs LLM
You know the drill. Feel free to skip the rest of the section or just paste the title into your friendly neighborhood LLM.
- GenAI (Generative AI): A broad term for AI systems that can generate new content such as text, images, audio, or code, rather than just classify or predict from existing data.
- GPT (Generative Pre-trained Transformer): A specific family of models, built on the transformer architecture, that generates human-like text and powers applications like chatbots and translation.
- LLM (Large Language Model): A type of AI model trained on massive amounts of text data to understand and generate human language for tasks like writing and conversation.
What Is ChatGPT Doing … and Why Does It Work? By Stephen Wolfram
Yes, THE Stephen Wolfram. Don’t know who he is? Here is what ~~Wikipedia~~ ChatGPT has to say about him:
Stephen Wolfram is a British-American computer scientist, physicist, and entrepreneur, best known for his work in developing Mathematica, an advanced computational software, and for his development of the Wolfram Alpha computational knowledge engine.
This nearly 20,000-word article (book?) by Wolfram explains, step by step, what ChatGPT is doing under the hood, with illustrations and Wolfram Language code examples (mostly run against the smaller, openly available GPT-2 model). It’s in-depth but accessible (given the complexity of the topic).
tldr: It’s just adding one word at a time.
This is art. It would be great if we had more educators like Wolfram who can break down complex topics into something that a majority of people can understand, building it up bit by bit.
Of course, we’re not going to do that because it doesn’t “scale.” Instead, we’re going to flood the internet with random garbage generated by LLMs.
If you have to read just one article, this would be it.
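To make that tl;dr concrete, the “adding one word at a time” loop can be sketched in a few lines of toy Python. The probability table below is hand-written and meaningless; the part Wolfram spends nearly 20,000 words on is how a neural network produces those probabilities from the text so far.

```python
import random

# A toy stand-in for the loop Wolfram describes. Real LLMs compute next-word
# probabilities with a giant neural network; this hand-written table is made up.
NEXT_WORD_PROBS = {
    "the":  {"cat": 0.5, "best": 0.3, "end": 0.2},
    "cat":  {"sat": 0.7, "ran": 0.3},
    "sat":  {"on": 1.0},
    "ran":  {"off": 1.0},
    "on":   {"the": 1.0},
    "off":  {"the": 1.0},
    "best": {"cat": 0.6, "end": 0.4},
    "end":  {"of": 1.0},
    "of":   {"the": 1.0},
}

def generate(prompt, n_words=8):
    words = prompt.split()
    for _ in range(n_words):
        probs = NEXT_WORD_PROBS[words[-1]]                  # look at what came before...
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights)[0])   # ...and sample one plausible next word
    return " ".join(words)

print(generate("the cat"))   # e.g. "the cat sat on the best cat ran off"
```

Swap the hand-written table for a neural network with billions of learned weights, and you have, in spirit, ChatGPT.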
3Blue1Brown’s Visual Intro to Transformers and Attention
3Blue1Brown (3b1b) by Grant Sanderson is a popular YouTube channel with over 6 million subscribers. He creates stunning animated videos of complex mathematical concepts, making them accessible and visually engaging for viewers of all levels.
Sanderson created his own mathematical animation engine and open-sourced it on GitHub. Similar to Wolfram, I feel his videos are a work of art. I don’t think he’s worried about GenAI taking his job.
So far, he has published two videos on this topic: “But what is a GPT? Visual intro to transformers” and “Attention in transformers, visually explained.” I even spied a third video on his Patreon.
He explains visually what goes on in a transformer step-by-step. And by step-by-step, I mean he uses a real-world example and shows us the actual matrices in those steps as data flows through them. We’re talking tokens, vectors, attention blocks, and feed-forward layers – all brought to life through Sanderson’s magical animations.
It’s mind-boggling that this even exists. I haven’t been this impressed with a video since I watched Jurassic Park in theaters for the first time (I know I’m dating myself).
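To give a rough sense of what those animations depict, here is the same data flow with tiny made-up dimensions and random numbers in place of trained weights (a sketch of the general idea, not code from the videos). The attention step is reduced to a simple averaging over earlier tokens; the full query/key/value version appears further down in the Serrano.Academy section.

```python
import numpy as np

# What the animations are depicting, with tiny made-up sizes and random numbers
# in place of trained weights (real GPTs are vastly bigger).
vocab_size, n_tokens, d_model = 50, 6, 32

token_ids = np.array([3, 17, 42, 8, 8, 1])              # "tokens": the text chopped into vocabulary ids
embedding_table = np.random.randn(vocab_size, d_model)
x = embedding_table[token_ids]                           # (6, 32): one vector per token

# "Attention" in its most stripped-down form: each token's vector becomes a uniform
# average over itself and the tokens before it. Real attention learns *which*
# earlier tokens matter, via the query/key/value math sketched later in this article.
mask = np.tril(np.ones((n_tokens, n_tokens)))
x = (mask / mask.sum(axis=1, keepdims=True)) @ x         # tokens exchange information

# Feed-forward layer: the same small two-layer network applied to every token independently.
W1, W2 = np.random.randn(d_model, 4 * d_model), np.random.randn(4 * d_model, d_model)
x = x + np.maximum(0, x @ W1) @ W2                       # still (6, 32)

# Unembedding: the last token's vector becomes a score for every entry in the vocabulary.
logits = x[-1] @ embedding_table.T                       # (50,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.argmax())                                    # id of the model's guess for the next token
```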
LLM University And Serrano.Academy
I am combining LLM University by Cohere and Serrano.Academy (YouTube Channel) because they have a common instructor: Luis Serrano.
If this name sounds familiar, you might be a Udacitian from its heyday. Luis was a popular instructor teaching the Machine Learning Nanodegree. Long ago, Udacity launched something called Udacity Connect Intensive. Basically, you’d meet in person once a week in a physical classroom while taking the Nanodegree.
I was part of the first cohort/test in San Jose, and Luis Serrano once dropped in to give a lecture. His greatest strength is breaking down complicated concepts into simple analogies and examples.
For me, Luis provides the intuition behind the concepts. His passion for teaching is obvious and infectious.
The previous two resources start with real-world examples and are quite information-dense. If you’re having trouble following them, I’d suggest watching these Serrano.Academy videos first (the keys/queries/values math they cover is sketched in code right after this list):
- The Attention Mechanism in Large Language Models
- The math behind Attention: Keys, Queries, and Values matrices
- What are Transformer Models and how do they work?
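If those titles feel abstract, here is roughly what the keys/queries/values machinery boils down to, written as a few lines of NumPy with random numbers standing in for trained weights. This is my own sketch of the standard scaled dot-product attention formula, not code from the videos.

```python
import numpy as np

# Scaled dot-product attention: the keys/queries/values math, with random
# numbers standing in for trained weights.
n_tokens, d_model = 4, 8
x = np.random.randn(n_tokens, d_model)          # one embedding vector per token

Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv                # what each token asks (Q), advertises (K), and carries (V)

scores = Q @ K.T / np.sqrt(d_model)             # (4, 4): how relevant token j looks to token i
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: every row sums to 1
output = weights @ V                            # each token becomes a weighted mix of the values

print(weights.round(2))                         # the attention pattern the videos visualize
```

Real models run many of these attention “heads” in parallel and stack the result dozens of layers deep.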
LLM University consists of 7 modules in total and contains a mix of text and videos. The first module, taught by Luis, covers Large Language Models and some of the theory behind them, such as Attention, Transformers, and Embeddings. I believe this has significant overlap with the Serrano.Academy videos mentioned above.
The next 6 modules are very practical in nature and deal with the real-world applications of LLMs: Text Generation, Semantic Search, and Retrieval-Augmented Generation (RAG). The code examples are in Python and use the Cohere SDK.
There is also a section on Prompt Engineering.
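To give a flavor of what the practical modules cover, here is the core idea behind semantic search and RAG in a few lines. The embed() function below is a deliberately crude stand-in (a bag-of-words count over a tiny hand-picked vocabulary); the actual lessons use the Cohere SDK and real embedding models, where “similar meaning” genuinely maps to “nearby vectors.”

```python
import numpy as np

# Semantic search / RAG in miniature. embed() is a toy stand-in for a real
# embedding model: it just counts words from a tiny hand-picked vocabulary.
VOCAB = ["refund", "return", "days", "office", "closed", "holidays", "support", "email"]

def embed(text):
    words = text.lower().replace("?", " ").replace(".", " ").replace(":", " ").split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

documents = [
    "Our refund policy: you can return any product within 30 days.",
    "The office is closed on public holidays.",
    "Support is available via email around the clock.",
]
doc_vectors = [embed(d) for d in documents]

def most_relevant(query):
    q = embed(query)
    sims = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]  # cosine similarity
    return documents[int(np.argmax(sims))]

# RAG in one line: retrieve the best-matching document and paste it into the prompt as context.
context = most_relevant("How many days do I have to return a product?")
prompt = f"Using this context: {context}\n\nAnswer: How many days do I have to return a product?"
print(context)   # -> the refund policy document
```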
Jay Alammar’s Visual Journey Through Language Models
Jay Alammar, another former Udacity instructor, now works at Cohere alongside Luis Serrano. He is also an instructor for certain modules at Cohere’s LLM University.
Alammar has created a series of tutorials where he explains the workings of large language models using illustrations, animations, and visualizations:
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
- The Illustrated Transformer
- The Illustrated GPT-2
- How GPT3 Works – Visualizations and Animations
These tutorials offer a visual approach to understanding complex AI concepts, making them more accessible to a wider audience.
Let’s build GPT: from scratch, in code, spelled out by Andrej Karpathy
In this two-hour video, you build a simplified version of ChatGPT from scratch with Andrej Karpathy, one of the co-founders of OpenAI. Karpathy was previously the Director of AI at Tesla, where he led the development of the company’s Autopilot system.
Honestly, the title of the video is pretty self-explanatory. You will build a transformer model from scratch in Python, training a character-level language model on the Shakespeare dataset to generate text that resembles Shakespeare’s writing.
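To give a taste of how the video begins, here is roughly the kind of character-level “tokenizer” it starts from. This is a paraphrase, not Karpathy’s exact code, and it assumes the Shakespeare text sits in a local file (named tinyshakespeare.txt here for illustration).

```python
# Every distinct character in the Shakespeare text becomes its own token.
text = open("tinyshakespeare.txt").read()     # hypothetical filename for the dataset

chars = sorted(set(text))                     # the entire "vocabulary" (roughly 65 characters)
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(encode("ROMEO:"))          # a list of six small integers, one per character
print(decode(encode("ROMEO:")))  # "ROMEO:" round-trips back to text
```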
This is part of his “Neural Networks: Zero to Hero” series. Since this video, he has published a couple more: “Let’s build the GPT Tokenizer” and, most recently, the 4-hour “Let’s reproduce GPT-2 (124M)”. Apparently, it takes 90 minutes and $20 to reproduce a model that OpenAI released in 2019.
For his latest video, he has turned on the YouTube Super Thanks feature, which means people can show their gratitude by giving donations.
Quite a drastic change for someone who is used to profiting off other people’s original content to now be cosplaying as an original creator.
GPT in 60 Lines of NumPy by Jay Mody
Now we are getting into some of the weird and impractical implementations of GPT.
In this text-based tutorial, Jay Mody goes step by step and builds different parts of a GPT using only 60 lines of NumPy. He calls it picoGPT.
To keep the size “pico,” there is no training involved. Instead, the trained GPT-2 model weights released by OpenAI are used. To honor Mody-ji, I will keep the description limited.
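To show the flavor without spoiling it, here are two of the small building blocks any NumPy implementation of GPT-2 needs, paraphrased in my own words rather than lifted from Mody’s 60 lines.

```python
import numpy as np

def softmax(x):
    # Turn raw scores into probabilities that sum to 1
    # (subtracting the max first for numerical stability).
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # The smooth activation function used inside GPT-2's feed-forward layers (tanh approximation).
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

print(softmax(np.array([1.0, 2.0, 3.0])).round(3))  # [0.09  0.245 0.665]
```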
Spreadsheets-are-all-you-need.ai
GPT-2 model weights are back, but this time in an Excel spreadsheet! Yes, Ishan Anand implemented the model inside a spreadsheet. It’s only a 1.25GB spreadsheet, if you dare to try and open it on your machine.
But you don’t need to download the spreadsheet; you can watch the 10-minute video “Lesson 1: Demystifying GPT with Excel.”
The video explains GPT-2 in a way that’s easy to understand, even for people who are not familiar with artificial intelligence (AI). The complex math behind GPT-2 is broken down into steps that can be followed in a spreadsheet. Even though it’s in a spreadsheet, the model can generate text, just like other large language models.
This was supposed to be a video series where each “GPT-2 Phase” would be explained in detail. So far, only “Lesson 2: Byte Pair Encoding & Tokenization” has been released. But the creator seems to have shifted his attention to monetization, launching a $399 cohort-based course on Maven.
GPT in 500 lines of SQL
If you can implement a large language model in NumPy and spreadsheets, why not SQL?
Inspired by Jay Mody’s “GPT in 60 lines of NumPy,” Alex Bolenok (who goes by Quassnoi) asked himself whether it would be possible to implement a large language model in SQL. Well, first he asked ChatGPT, which said it wasn’t possible, and then he decided to do it himself.
Apparently, Quassnoi’s favorite programming language is SQL. He has even solved the Rubik’s Cube in SQL.
This is his key thesis for implementing GPT-2 in SQL: “This function is deterministic. It does a lot of math under the hood, but all this math is hardwired. If you call it repeatedly with the same input, it will always return the same output.”
At the end of the article, you will see a 500-line hellish SQL statement. A simple prompt that generates 10 new tokens took 2 minutes and 44 seconds on his laptop.
Scrimba’s Intro to Mistral AI
Unlike the other resources, this is a more traditional course, by Scrimba’s co-founder Per Borgen. And unlike the others, it’s also the one I actually finished!
Though the title of the course refers to Mistral AI, the choice of specific LLM is not that important. This extremely well-done and to-the-point course introduces a number of different concepts in practice and gives a glimpse of how you could use them in your apps. I will let my Class Central review speak for the course:
While simple, it brings together a number of different concepts, such as RAG, Vectors, Embeddings, and Function Calling, into a single course with working code examples in JavaScript that you can experiment with directly in your browser.
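Of those concepts, function calling is the one not touched on elsewhere in this list, so here is the general pattern in a short Python sketch. Both chat() and get_weather() are made-up stand-ins; the course’s own examples are in JavaScript against a real LLM API.

```python
import json

# The function-calling pattern: the model doesn't run code itself. It replies with a
# structured request naming a function and its arguments, and *your app* executes it.
def get_weather(city):
    return {"city": city, "forecast": "sunny", "temp_c": 24}   # pretend external API

TOOLS = {"get_weather": get_weather}

def chat(messages):
    # Hypothetical model response: instead of prose, the LLM names a function to call.
    return {"tool_call": {"name": "get_weather", "arguments": {"city": "Oslo"}}}

response = chat([{"role": "user", "content": "What's the weather in Oslo?"}])
call = response["tool_call"]
result = TOOLS[call["name"]](**call["arguments"])   # the app, not the model, runs the function
print(json.dumps(result))                           # the result goes back to the model as context
```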
What’s Next?
I don’t know 🤷