Overview
This conference talk from PHP UK 2022 covers advanced web scraping techniques for extracting information from websites and mobile apps that lack clean, well-documented APIs. Topics include headless browsers, performance optimization, edge cases, and differences between Firefox and Chrome. Practical demonstrations cover phone networks, a local calling guide, and insurance details. The talk also covers handling OAuth headers, parsing URLs, and making efficient single requests, then moves on to more advanced topics: using a real browser, browser fingerprinting, anti-scraping measures, mobile API interactions, OCR, and PDF encryption.
Syllabus
Introduction
Overview
Getting Data
Headless Browser
Performance
Edge Cases
Firefox vs Chrome
Demo 1: Phone Networks
Demo 2: Local Calling Guide
Demo 2: Edge Cases
Demo 3: Insurance Details
OAuth Headers
Parse URL
Single Request
JSON Back
Recap
Using a real browser
Browser fingerprinting
Anti-scraping
Mobile API
OCR
PDF Encryption
Questions
Taught by
PHP UK Conference