Description

Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue on the same theme, focusing more specifically on the challenges raised by large datasets and the complex computations that process them. The two MOOCs complement each other and together form a coherent training program on the subject.

In this second MOOC, we will show you how to improve your practices for managing large datasets and complex computations in controlled software environments:

you will learn how to use file formats such as JSON, FITS, and HDF5; archiving platforms such as Zenodo and Software Heritage; and tools such as git-annex, Docker, Singularity, Guix, Make, and Snakemake;
we will show you how to integrate them in a real-life use case, a sunspot detection study, so that you can see for yourself that these methods and tools let you work in a reliable and reproducible way.

The strength of this new MOOC lies in its general and systematic presentation of the major concepts, and of how they translate into practical solutions, through numerous hands-on sessions with state-of-the-art open-source tools.

Syllabus

Module 1: Managing data

1.1 Archiving
1.2 File formats
1.3 Project Organization
1.4 Git Annex
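git-annex keeps large files out of ordinary git history and tracks each one by a key derived from its content. The sketch below illustrates that idea; it assumes a key layout loosely modeled on git-annex's SHA256E backend (backend name, size, SHA-256 hash, file extension) and is a simplified illustration, not git-annex's exact algorithm. The file names used in the demo are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def annex_style_key(path: Path) -> str:
    """Build a content-derived key for a file, loosely modeled on the
    SHA256E backend of git-annex (simplified illustration only)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large data files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    size = path.stat().st_size
    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "obs_a.fits"  # hypothetical file names
        b = Path(tmp) / "obs_b.fits"
        a.write_bytes(b"sunspot")
        b.write_bytes(b"sunspot")
        # Identical content yields identical keys, whatever the file name.
        print(annex_style_key(a) == annex_style_key(b))  # True
```

Because the key depends only on content (plus the extension), renaming a file does not change it, while any modification of the data does.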

Module 2: Managing software

2.1 On the Importance of Software Environment
2.2 Package Management Principles
2.3 Isolation and Containers
2.4 Using Containers
2.5 Building and Sharing Containers
2.6 Functional Package Managers (e.g., Guix)
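A core principle of functional package managers is that each installed package is identified by a hash of everything that went into building it, including, recursively, its dependencies; Guix, for instance, places packages under store paths of the form `/gnu/store/<hash>-name-version`. The sketch below illustrates only that principle: the `Package` type, the `store_id` function, the recipes, and the version numbers are hypothetical, and the hashing scheme is a simplification, not the one Guix actually uses.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical package description: name, version, build recipe, inputs."""
    name: str
    version: str
    recipe: str  # stands in for the build script and its configuration
    inputs: tuple[Package, ...] = ()


def store_id(pkg: Package) -> str:
    """Derive an identifier from the package itself and, recursively,
    from the identifiers of all of its inputs."""
    payload = repr((pkg.name, pkg.version, pkg.recipe,
                    [store_id(dep) for dep in pkg.inputs]))
    digest = hashlib.sha256(payload.encode()).hexdigest()
    # Keep a 32-character hash prefix, followed by a readable name and version.
    return f"{digest[:32]}-{pkg.name}-{pkg.version}"


if __name__ == "__main__":
    # Hypothetical versions, chosen only for illustration.
    libc_old = Package("libc", "2.35", "configure && make")
    libc_new = Package("libc", "2.36", "configure && make")
    py_old = Package("python", "3.11", "configure && make", (libc_old,))
    py_new = Package("python", "3.11", "configure && make", (libc_new,))
    # Same Python version, different libc: two distinct identifiers,
    # so both builds can be installed side by side.
    print(store_id(py_old) != store_id(py_new))  # True
```

Since a change anywhere in the dependency graph changes the identifier, two environments that report the same identifiers were built from the same inputs.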

Module 3: Managing computations

3.1 Why do we need workflows?
3.2 From notebooks to shell scripts
3.3 Workflows with `make`
3.4 Workflows with `snakemake`
3.5 Workflows and environments
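The key decision a `make`-style workflow engine takes for each step is whether its output is out of date. The sketch below expresses `make`'s timestamp rule (rebuild a target if it is missing or older than any of its prerequisites); Snakemake builds on the same idea of rules linking inputs to outputs. The function name `needs_rebuild` and the file names in the demo are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, prerequisites: list[Path]) -> bool:
    """Decide whether `target` must be regenerated, following make's rule:
    rebuild if the target is missing or older than any prerequisite."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    # A missing prerequisite raises FileNotFoundError here; make would instead
    # look for a rule to build it, which this sketch does not model.
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_images.txt"  # hypothetical input file
        spots = Path(tmp) / "sunspots.csv"  # hypothetical output file
        raw.write_text("image data")
        print(needs_rebuild(spots, [raw]))  # True: target is missing
        spots.write_text("x,y\n")
        os.utime(raw, (1000, 1000))  # set access/modification times explicitly
        os.utime(spots, (2000, 2000))
        print(needs_rebuild(spots, [raw]))  # False: target is newer
```

Applying this check to every step of a pipeline is what lets a workflow tool rerun only the computations whose inputs have changed.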

Course title: Reproducible Research II: Practices and tools for managing computations and data