Overview
Syllabus
Welcome! Prabhat's part of the talk.
Supercomputers at the National Energy Research Scientific Computing Center (NERSC).
Top 10 data analytics problems (link to blog post).
Celeste is at the top of the list for many reasons.
Cori Phase 2, a Cray XC40, the supercomputer that we use.
NERSC big data stack.
Performance and productivity tradeoff?
Celeste team accomplishment.
The Celeste.jl collaboration.
Project goals. Jeff Regier's part of the talk.
Examples of astronomical images that we want to analyze.
Challenge of analyzing faint stars and galaxies.
Our approach using Bayesian inference.
The Celeste.jl graphical model.
Scientific color priors.
The likelihood: p(images|catalog).
Tractable and intractable quantities in Bayesian inference.
The problem of integrating over a space of too high a dimension.
Variational inference.
Julia makes implementation of complicated functions possible.
Results for small data.
Validation of our methods against the well-researched sky region Stripe 82.
Numerical optimization scheme and making it parallel.
Cori Phase 2 supercomputer, some technical information.
Performance results of Celeste.jl.
Solving a large-scale I/O problem. Keno Fischer's part of the talk.
The problem that we encountered with I/O.
We took down the network with the data traffic that we created.
Standard high-performance guidelines for Julia code.
The majority of hot code in Celeste.jl computes the Hessian matrix of the objective function.
Analyzing for loops in the code.
Question: what is ILP (instruction-level parallelism)?
How much ILP is required?
Putting a large chunk of Julia and assembly code into one big scalar inner loop and vectorizing it.
Improving Julia and LLVM to allow the aforementioned vectorization.
Improvements from the previous point are slowly moving into Julia proper.
Using StaticArrays.jl.
Optimization of memory layout to avoid vector shuffling.
Using multiple dispatch to utilize Hessian matrix structure.
Optimization takeaways.
Conclusions.
Closing information.
Taught by
The Julia Programming Language