Large-Scale Data Extraction, Structuring and Matching Using Python and Spark

Overview

Explore large-scale data extraction, structuring, and matching techniques using Python and Apache Spark in this EuroPython 2017 conference talk. Learn how to tackle common challenges in big data environments: unzipping compressed archives, extracting the relevant files from them, and pulling metadata out of XML and PDF files. Discover methods for matching meta-information across different data collections, with a focus on scientific publications and user profiles from various repositories and platforms. The talk walks through the solution process for each step, from large-scale unzipping and file extraction to the metadata extraction that makes matching possible, all within a big data context.
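The extraction and matching steps described above can be sketched in plain Python. This is a minimal illustration, not the speaker's code, and it rests on several assumptions. Each archive arrives as raw zip bytes, which is the shape of the `(path, bytes)` pairs that Spark's `SparkContext.binaryFiles` produces. The "relevant" files are taken to be members ending in `.xml`. Each XML document is assumed to carry `<title>` and `<author>` elements; these element names are hypothetical. Matching is a simple normalized-title comparison, which is only one of many possible strategies. PDF extraction is omitted because it needs a third-party parser.

```python
import io
import re
import unicodedata
import xml.etree.ElementTree as ET
import zipfile
from collections import defaultdict


def extract_xml_members(archive_bytes):
    """Yield (member_name, raw_bytes) for every .xml file inside a zip archive.

    Works entirely in memory, so it could run inside a Spark flatMap
    without writing unzipped files to disk.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            # Assumed filter: skip directories and anything that is not XML.
            if not info.is_dir() and info.filename.lower().endswith(".xml"):
                yield info.filename, zf.read(info)


def parse_metadata(xml_bytes):
    """Pull the title and authors from one document.

    The <title> and <author> element names are hypothetical; real
    publication XML schemas (and namespaces) will differ.
    """
    root = ET.fromstring(xml_bytes)
    title = root.findtext(".//title", default="").strip()
    authors = [a.text.strip() for a in root.iter("author") if a.text]
    return {"title": title, "authors": authors}


def normalize(text):
    """Lower-case and strip accents and punctuation so near-identical titles compare equal."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_by_title(left, right):
    """Pair records from two collections whose normalized titles are equal."""
    index = defaultdict(list)
    for rec in right:
        index[normalize(rec["title"])].append(rec)
    return [(lrec, rrec) for lrec in left for rrec in index.get(normalize(lrec["title"]), [])]
```

In a Spark job these functions could be chained roughly as `sc.binaryFiles(path).flatMap(lambda kv: extract_xml_members(kv[1])).mapValues(parse_metadata)`. At real scale the in-memory dictionary in `match_by_title` would more plausibly be replaced by keying both collections on the normalized title and using a distributed `join`.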

Syllabus

Deep Kayal - Large-scale data extraction, structuring and matching using Python and Spark

Taught by

EuroPython Conference

