Low Latency Neural Network Inference for ML Ranking Applications - Yelp Case Study
MLOps World: Machine Learning in Production via YouTube
Overview
Explore how Yelp overhauled its ML Platform to support low-latency neural network inference for ranking applications in this 39-minute conference talk. Gain insight into the architecture of Yelp's ML Platform and learn how the team added the ability to train and deploy TensorFlow-based models using MLeap, cataloging them in MLflow. Discover the transition from Elasticsearch to Nrtsearch, Yelp's own open-source near-real-time search framework, for model deployment. Delve into the latency and model-performance challenges they faced, including the incorporation of embedded features. Benefit from the expertise of Ryan Irwin, Engineering Manager, and Rajvinder Singh, Sr. Product Manager at Yelp Inc., as they share how they streamlined support for XGBoost and LR models built in Spark for various business applications and expanded the platform to support neural network models for photo classification and popular dish identification.
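The overview mentions cataloging trained TensorFlow models in MLflow before they are deployed for serving. As a rough illustration of that step only (not Yelp's actual pipeline), the minimal sketch below logs and registers a small Keras ranking model with MLflow's Python API; the tracking URI, run name, and registered model name "yelp-ranking" are hypothetical, and the snippet assumes MLflow 2.x.

# Minimal sketch: log and register a TensorFlow/Keras model in MLflow.
# All names and URIs here are illustrative assumptions, not Yelp's setup.
import mlflow
import mlflow.tensorflow
import tensorflow as tf

# Hypothetical ranking model: a tiny feed-forward network over 16 dense features.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

mlflow.set_tracking_uri("http://mlflow.example.internal:5000")  # assumed tracking server

with mlflow.start_run(run_name="ranking-model-demo"):
    mlflow.log_param("hidden_units", 32)
    # Log the model artifact and register it in the MLflow Model Registry,
    # so a serving system could later resolve it via "models:/yelp-ranking/<version>".
    mlflow.tensorflow.log_model(
        model,
        "model",
        registered_model_name="yelp-ranking",
    )

In practice, the registry entry gives downstream deployment (whether MLeap-based serving or an Nrtsearch plugin) a single versioned reference to the trained artifact; the details of how Yelp wires that into Nrtsearch are covered in the talk itself.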
Syllabus
Low Latency Neural Network Inference for ML Ranking Applications - Yelp Case Study
Taught by
MLOps World: Machine Learning in Production