Frontera - Open Source Large-Scale Web Crawling Framework

Overview

Explore the open-source Frontera framework for large-scale web crawling in this EuroPython 2015 conference talk. Discover how to build real-time distributed web crawlers and website-focused crawlers using Frontera's customizable URL metadata storage, crawling strategy management, and transport layer abstraction. Learn how to integrate Frontera with Scrapy, Kafka, and HBase to build a distributed crawler. Gain insight into the framework's architecture, features, and use cases, including a demonstration of collecting statistics from the Spanish internet. Understand the motivation behind Frontera, its single-threaded and real-time modes, and its future development plans. Suited to developers interested in advanced web crawling techniques and large-scale data collection.
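The talk describes Frontera as keeping URL metadata storage separate from the crawling strategy that decides what to fetch next. The following is a minimal, hypothetical Python sketch of that separation only. It assumes an in-memory dictionary in place of a real backend such as HBase and a simple breadth-first, same-domain strategy. None of the class or method names (`MetadataStorage`, `SameDomainStrategy`, `Frontier`, `crawl`) are Frontera's actual API, and the transport layer is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical sketch: these class and method names are illustrative only
# and are NOT part of Frontera's real API.


@dataclass
class UrlMeta:
    """Per-URL metadata row kept by the storage layer."""
    url: str
    depth: int
    state: str = "queued"  # becomes "crawled" once the page is fetched


class MetadataStorage:
    """In-memory stand-in (an assumption) for a persistent backend such as HBase."""

    def __init__(self) -> None:
        self._rows: Dict[str, UrlMeta] = {}

    def add_if_new(self, url: str, depth: int) -> bool:
        """Record a URL once; return False if it was already known."""
        if url in self._rows:
            return False
        self._rows[url] = UrlMeta(url, depth)
        return True

    def mark_crawled(self, url: str) -> None:
        self._rows[url].state = "crawled"

    def get(self, url: str) -> Optional[UrlMeta]:
        return self._rows.get(url)


class SameDomainStrategy:
    """Crawling strategy: stay on one host and stop beyond max_depth."""

    def __init__(self, domain: str, max_depth: int) -> None:
        self.domain = domain
        self.max_depth = max_depth

    def should_schedule(self, url: str, depth: int) -> bool:
        return urlparse(url).netloc == self.domain and depth <= self.max_depth


class Frontier:
    """Combines storage and strategy; the FIFO queue yields breadth-first order."""

    def __init__(self, storage: MetadataStorage, strategy: SameDomainStrategy,
                 seeds: Iterable[str]) -> None:
        self.storage = storage
        self.strategy = strategy
        self._queue: Deque[Tuple[str, int]] = deque()
        for seed in seeds:
            self._schedule(seed, 0)

    def _schedule(self, url: str, depth: int) -> None:
        # The strategy decides whether to crawl; the storage deduplicates.
        if self.strategy.should_schedule(url, depth) and self.storage.add_if_new(url, depth):
            self._queue.append((url, depth))

    def next_request(self) -> Optional[Tuple[str, int]]:
        return self._queue.popleft() if self._queue else None

    def page_crawled(self, url: str, depth: int, links: Iterable[str]) -> None:
        self.storage.mark_crawled(url)
        for link in links:
            self._schedule(link, depth + 1)


def crawl(frontier: Frontier, fetch_links: Callable[[str], List[str]]) -> List[str]:
    """Drain the frontier; fetch_links stands in for a real fetcher such as Scrapy."""
    order: List[str] = []
    while (item := frontier.next_request()) is not None:
        url, depth = item
        order.append(url)
        frontier.page_crawled(url, depth, fetch_links(url))
    return order
```

Swapping in a different strategy class (for example, one that prioritizes pages by a score) changes which URLs are scheduled without touching the storage code, which is the kind of decoupling the talk attributes to Frontera. In the real framework, Scrapy does the fetching and, in distributed mode, Kafka carries messages between components; this sketch collapses everything into a single process.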

Syllabus

About me
What is Frontera
What is Terra
Motivation
Single threaded
Single integration
Real time
Unique content
Metadata storage
Architecture
Scraping
Simple spider
Use cases
Distributed architecture
Features
Requirements
Quick start
Spanish crawl
Future plans
Questions

Taught by

EuroPython Conference
