Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure

Valence Labs via YouTube

Overview

Explore a comprehensive lecture on machine learning representations of proteins, focusing on the joint distribution of sequence and structure. Dive into an analysis of ESMFold embeddings that uncovers massive activations and their implications. Learn about continuous compression schemes that substantially reduce the size of ESMFold embeddings while preserving structural information and performance on protein function benchmarks. Discover a novel tokenized all-atom structure vocabulary that enables high reconstruction accuracy from sequence alone. Examine CHEAP (Compressed Hourglass Embedding Adaptations of Proteins) embeddings and the HPCT (Hourglass Protein Compression Transformer) architecture, and understand their potential for compactly representing protein structure and sequence. Gain insight into the asymmetry in information content between sequence and structure, and explore the democratization of representations captured by large models. Investigate the flexible downstream applications of CHEAP embeddings, including generation, search, and prediction. The lecture concludes with a Q&A session that offers an opportunity to delve deeper into this cutting-edge research in protein machine learning.

Syllabus

- Introduction
- Background
- CHEAP
- Q&A

Taught by

Valence Labs
