Explore the challenges and solutions for processing irregularly shaped datasets in this 34-minute conference talk from Strange Loop. Learn how particle physicists analyze data in which each collision produces a different number of particles, resulting in jagged or ragged arrays. Discover the development of awkward-array, a layer over NumPy that extends array programming to handle complex, nested data structures. Gain insights into how this approach can benefit fields beyond physics, such as genomics and log file analysis. Understand the potential for vectorization and integration with tools like Apache Arrow, Parquet, Numba, and Pandas. Presented by Jim Pivarski, a computational physicist from Princeton University with experience in particle physics and data science.
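To make the idea of a jagged array concrete, here is a minimal sketch in plain NumPy. It assumes one common representation: a flat `content` array holding all values plus an `offsets` array marking where each collision's particles begin and end. The variable names, the sample values, and the helper `event` are hypothetical illustrations, and this is not presented as awkward-array's actual API.

```python
import numpy as np

# Hypothetical data: one measured value per particle, for three collisions
# containing 3, 0, and 2 particles respectively.
content = np.array([10.5, 20.1, 5.3, 7.7, 42.0])  # all values, flattened
offsets = np.array([0, 3, 3, 5])  # event i spans content[offsets[i]:offsets[i + 1]]

# Number of particles per event, computed without a Python loop.
counts = np.diff(offsets)

# Per-event sums, also vectorized: take a cumulative sum over the flat
# content and difference it at the event boundaries.
cumulative = np.concatenate(([0.0], np.cumsum(content)))
sums = cumulative[offsets[1:]] - cumulative[offsets[:-1]]


def event(i):
    """Return the particles of event i as a slice of the flat content."""
    return content[offsets[i]:offsets[i + 1]]
```

Because the variable-length lists live in two ordinary arrays, operations such as counting or summing per event stay inside NumPy, which is the kind of vectorization the talk description refers to.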
Overview
Syllabus
"Jagged, ragged, awkward arrays" by Jim Pivarski
Taught by
Strange Loop Conference