Since the rise of deep learning in 2012, much progress has been made in deep-learning-based AI tasks such as image/video understanding and natural language understanding, as well as in GPU/accelerator architectures that greatly improve training and inference speed for neural-network models. As industry players race to develop ambitious applications such as self-driving vehicles, cashier-less supermarkets, human-level interactive robot systems, and human intelligence augmentation, major research challenges remain in the computational methods and the hardware/software infrastructures required for these applications to be effective, robust, responsive, accountable, and cost-effective. Innovations in scalable iterative solvers and graph algorithms will be needed to achieve these application-level goals, but they will also impose much higher demands on data storage capacity, access latency, energy efficiency, and processing throughput. In this talk, Wen-mei Hwu presents recent progress in building highly performant AI task libraries, creating full AI applications, providing AI application development tools, and prototyping the Erudite system at the IBM-Illinois C3SR.
Stanford Seminar - Erudite: Prototype System for Computational Intelligence
Stanford University via YouTube
Erudite: A Low-Latency, High-Capacity, and High-efficiency System for Computational Intelligence.
C3SR Core Faculty.
AI Application Pipeline Example - Watson Jeopardy 2011.
Automatic Generation of Sports Highlight and Analytics.
Automatic Conference Reviewer Assignment.
C3SR AI Task Libraries.
Person Parsing.
Example Application DL Inference Flow in the Cloud.
Hardware Comparison - Same Model and Framework.
Importance of Model Data Loading in DL Inference.
Hardware for Watson Jeopardy! 2011.
FlatFlash - Storage-class Memory.
FlatFlash Architecture.
Example: Performance Benefit for Graph Computation.
A Simplified View of IBM Newell with NVIDIA Volta GPUs.
Starting Point - Data Access Challenge (HBM).
Starting Point - Data Access Challenge (DDR).
Iterative Solver Example - If Matrix Fits into Host Memory.
Triangle Counting Example.
MCN Near-Memory Acceleration for Existing Scalable Applications - Performing Computation Near Data.
Comparison Against a Traditional SPARC Cluster.
Erudite Step 1.
Taught by
Stanford Online