Building a Knowledge Graph with Spark and NLP for Novel Drug Recommendations

Overview

Discover how to build a knowledge graph using Spark and NLP to recommend novel drugs to scientists. Learn about AstraZeneca's "5R" framework and its impact on improving efficiency in drug discovery. Explore the challenges of parsing large amounts of information from various formats and data models, and how to formulate drug target finding as a hybrid recommendation problem. Delve into the process of assembling a large-scale knowledge graph from public and internal data, focusing on NLP techniques to extract precise information at scale. Gain insights into graph embedding pipelines, approximate nearest neighbor search, and valuable lessons learned in the field of drug discovery and recommendation systems.
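
As a rough illustration of the last step mentioned above, the sketch below ranks candidate nodes by cosine similarity to a query node's embedding. It is an exact, brute-force baseline, not an approximate index and not the method used in the talk; approximate nearest neighbor search trades a little accuracy for speed when computing this kind of ranking at scale. The node names, vectors, and the `top_k` helper are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k(query_id, embeddings, k=2):
    """Return the k node IDs most similar to query_id, excluding the query itself."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), node_id)
        for node_id, vec in embeddings.items()
        if node_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]

# Made-up 3-dimensional embeddings; real graph embeddings would be learned.
embeddings = {
    "DISEASE_X": [1.0, 0.0, 0.0],
    "GENE_A": [0.9, 0.1, 0.0],
    "GENE_B": [0.0, 1.0, 0.0],
    "GENE_C": [0.7, 0.7, 0.0],
}

ranked = top_k("DISEASE_X", embeddings, k=2)
print(ranked)
```

The brute-force scan compares the query against every node, which is why large graphs usually call for an approximate index instead.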

Syllabus

Intro
Drug discovery is hard
AstraZeneca introduced the "5R" framework
5R has had a significant impact in improving our efficiency
We are investing in new sources of data and faster validation
We need tools to make sense of data & make better and faster decisions
Finding a drug target can be formulated as a hybrid recommendation problem
• Scientists need to parse large amounts of information and make a ranking prediction
• Different formats, data models, locations
Multiple objective optimization
Traditional recsys approaches
We assemble a large scale knowledge graph from public and AZ internal data
KG pipeline on
Pipeline - series of notebooks
Pipeline stages
Node dictionary
Mappings table
Edge assertions
Keep evidence & context for each assertion
Focus on NLP
Use natural language processing to extract precise information at scale
NLP Termite on Spark
Syntax parsing increases precision of entity recognition
Relationships from literature reduce sparsity of the biological KG
Language models lead to improvements in recall and precision
Learned sentence representation can be used for downstream tasks
Graph embedding pipeline
Approximate nearest neighbor search
Lessons learned
Acknowledgements
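
The pipeline stages listed in the syllabus (node dictionary, mappings table, edge assertions, evidence kept per assertion) can be pictured with a small sketch. This is plain Python under assumed schemas, not the talk's Spark notebooks; every identifier, source name, and field below is hypothetical and chosen only to show how source-specific names resolve to canonical nodes while each edge keeps its supporting evidence.

```python
from collections import defaultdict

# Node dictionary: canonical ID -> node attributes (hypothetical entries).
nodes = {
    "GENE:A": {"type": "gene", "name": "GENE_A"},
    "DIS:X": {"type": "disease", "name": "DISEASE_X"},
}

# Mappings table: (source, source-specific identifier) -> canonical ID.
mappings = {
    ("literature", "gene a"): "GENE:A",
    ("literature", "disease x"): "DIS:X",
    ("internal_db", "G-0042"): "GENE:A",
    ("internal_db", "D-0007"): "DIS:X",
}

# Raw assertions as they might arrive from different sources.
raw_assertions = [
    {"source": "literature", "subject": "gene a", "predicate": "associated_with",
     "object": "disease x", "evidence": "example sentence from an abstract"},
    {"source": "internal_db", "subject": "G-0042", "predicate": "associated_with",
     "object": "D-0007", "evidence": "example assay record"},
    {"source": "literature", "subject": "unknown gene", "predicate": "associated_with",
     "object": "disease x", "evidence": "example sentence with an unmapped entity"},
]

def build_edges(raw, mappings):
    """Resolve assertions to canonical IDs, grouping evidence under each edge."""
    edges = defaultdict(list)
    unmapped = []
    for row in raw:
        subj = mappings.get((row["source"], row["subject"]))
        obj = mappings.get((row["source"], row["object"]))
        if subj is None or obj is None:
            unmapped.append(row)  # kept aside for review, not silently dropped
            continue
        # Keep the source and evidence with the edge so each assertion stays traceable.
        edges[(subj, row["predicate"], obj)].append(
            {"source": row["source"], "evidence": row["evidence"]}
        )
    return dict(edges), unmapped

edges, unmapped = build_edges(raw_assertions, mappings)
for edge, support in edges.items():
    print(edge, len(support))
```

In this toy input, the two mapped assertions collapse into a single edge backed by two pieces of evidence, and the assertion with an unmapped subject is set aside.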

Taught by

Databricks
