Overview
This PHP UK Conference talk explains how DataSift uses PHP to process Twitter's full data stream, the 'firehose' of 500 million tweets a day. It walks through the architecture and pipeline stages, including scraping, language detection, and delivery, and then sets out why PHP was chosen for an operation at this scale: its performance, how it compares with other languages, its string handling, the behavior of its JSON decoding, and the bundled extensions that make it suitable for processing large volumes of data. The talk also covers the philosophy behind PHP and its reliability in production environments, and concludes with a Q&A session.
Syllabus
Introduction
Scraping
Firehose
Architecture diagram
Goblin
PHP
ETL
URL
Language Detection
Architecture
Supervisor
Manager
Pipelines
Delivery
Kafka
Push Scheduler
Load ETL
More than one firehose
Scale up
Where PHP fits
Summary
History of PHP
Why PHP
PHP was no risk
How well PHP works
Other languages
Node.js
json_decode
Behavior
Scale
String handling
Encoding
JVM
Connectivity
Bundled Extensions
Reliability in Production
Quality Threshold
PHP Philosophy
Slide Summary
Recap
Q&A
Taught by
PHP UK Conference