Explore the complexities of geospatial data analysis in Apache Spark through this 36-minute talk. Delve into the challenges of working with geospatial data in Spark, including projections, geometry types, indices, and storage issues. Learn about the geospatial packages available for Spark, their pros and cons, and best practices for data ingestion and long-term storage. Gain insights into spatial indexing for rapid record retrieval and discover approaches to limit errors and reduce costs when handling large-scale geospatial data. Follow along with a demonstration covering GeoJSON files, spatial disaggregation, and practical code examples to enhance your understanding of geospatial options in Apache Spark.
Overview
Syllabus
Introduction
About PNNL
Disclaimer
Challenges
Projections
Index
Finding and curating data
System libraries
Large-scale joins
Demo
Steps
GeoJSON Files
Spatial Disaggregation
Code Demo
Conclusion
Taught by
Databricks