Large-Scale Data Extraction, Structuring and Matching Using Python and Spark
EuroPython Conference via YouTube
Overview
This EuroPython 2017 conference talk covers large-scale data extraction, structuring, and matching with Python and Apache Spark. It addresses common challenges in big data environments: unzipping compressed archives, extracting the relevant files from them, and extracting metadata from XML and PDF files. It then shows how to match meta-information across different data collections, focusing on scientific publications and user profiles from various repositories and platforms.
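The pipeline described above (unzip, keep the relevant files, pull out metadata, match across collections) can be illustrated with a minimal, single-machine sketch using only the Python standard library. It is not the speaker's implementation: the XML layout (`<article>` with `<title>` and `<author>` children), the function names, and the exact-title matching rule are all assumptions made for illustration, and PDF extraction is left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def make_archive(files):
    """Build an in-memory zip archive from a {name: content} dict (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def extract_xml_metadata(archive_bytes):
    """Yield (member name, title, authors) for each XML file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip members that are not relevant
            root = ET.fromstring(archive.read(name))
            # Assumed (hypothetical) schema: <article><title/><author/>...</article>
            title = root.findtext("title", default="")
            authors = [author.text or "" for author in root.findall("author")]
            yield name, title, authors


def normalize(text):
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(text.lower().split())


def match_by_title(records_a, records_b):
    """Pair records from two collections whose normalized titles are equal."""
    index = {normalize(title): name for name, title, _ in records_b}
    return [
        (name, index[normalize(title)])
        for name, title, _ in records_a
        if normalize(title) in index
    ]


# Two tiny collections standing in for different repositories.
collection_a = make_archive({
    "a.xml": "<article><title>Deep  Learning</title><author>X</author></article>",
    "readme.txt": "not metadata",
})
collection_b = make_archive({
    "b.xml": "<article><title>deep learning</title></article>",
})

records_a = list(extract_xml_metadata(collection_a))
records_b = list(extract_xml_metadata(collection_b))
matches = match_by_title(records_a, records_b)
```

In a Spark job, a function like `extract_xml_metadata` could be applied once per archive, for example over the (path, bytes) pairs that `SparkContext.binaryFiles` produces. That wiring is not shown here and is an assumption about how such a job might be structured, not a detail taken from the talk.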
Syllabus
Deep Kayal - Large-scale data extraction, structuring and matching using Python and Spark
Taught by
EuroPython Conference