Overview
Explore the challenges and solutions of building billion-node graphs for machine learning in this Scala Days 2023 Seattle conference talk. Dive into the world of Graph Machine Learning (GraphML) and discover how to handle internet-scale data for AI research. Learn about transforming raw data into graph form, executing GraphML algorithms at scale, and creating scalable solutions for production workloads. Gain insights into the limitations of existing tools like graph databases and Spark, and understand the engineering challenges involved in working with massive graphs. Discover techniques for data ingestion, graph building, management, and storage when dealing with billions of nodes and edges. Follow the journey of building a toolkit that lets researchers define and experiment with different graphs and algorithms efficiently. See how representing complex relationships as graphs applies to domains such as ad tech and internet traffic analysis.
Syllabus
Introduction
Rory's background
Ad Tech
Ad Targeting
The AI Lab
Building a graph
Design constraints
Tools
Bring it all together
Data cleaning
Performance
Notebooks
Spark
Overengineering
Framework
Notebooks Spark
Optimus Cirrus
Research Papers
Implementing Research Papers
What's Wrong with Version 1
How Much Space
Memory Requirements
Loading Data Computer Fast
Distributed Systems
Worker Engines
Silver Message
Message Passing
Current Status
Conclusions
Build what you need
Power and storage
Taught by
Scala Days Conferences