
Johns Hopkins University

Introduction to Reproducibility in Cancer Informatics

Johns Hopkins University via Coursera

Overview

The course is intended for students in the biomedical sciences and for researchers who use informatics tools in their research but have not had training in reproducibility tools and methods. This course is written for individuals who:

  • Have some familiarity with R or Python.
  • Have written some scripts.
  • Have not had formal training in computational methods.
  • Have limited or no familiarity with GitHub, Docker, or package management tools.

Motivation

Data analyses are generally not reproducible without direct contact with the original researchers and a substantial amount of time and effort (Beaulieu-Jones et al., 2017). Reproducibility in cancer informatics (as in other fields) is still not monitored or incentivized, despite being fundamental to the scientific method. Even without that incentive, many researchers strive for reproducibility in their own work but often lack the skills or training to achieve it effectively. Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. Reproducible analyses are more likely to be understood, applied, and replicated by others. This speeds up the scientific process by helping researchers avoid dead ends built on false positives. Open source clarity in reproducible methods also saves researchers time, so they don't have to reinvent the proverbial wheel for methods that everyone in the field is already performing.

Curriculum

This course introduces the concepts of reproducibility and replicability in the context of cancer informatics. It uses hands-on exercises to demonstrate in practical terms how to increase the reproducibility of data analyses and how to apply reproducible code concepts to your own code. The course also introduces tools relevant to reproducibility, including analysis notebooks, package managers, git, and GitHub. Individuals who take this course are encouraged to complete these activities as they follow along with the course material to help increase the reproducibility of their analyses.

**Goal of this course:** Equip learners with reproducibility skills they can apply to their existing analysis scripts and projects. This course opts for an "ease into it" approach: we attempt to give learners doable, incremental steps to increase the reproducibility of their analyses.

**What is not the goal:** This course is meant to introduce learners to reproducibility tools, but _it does not necessarily represent the absolute end-all, be-all best practices for the use of these tools_. In other words, this course gives a starting point with these tools, not an ending point. The advanced version of this course is the next step toward incrementally "better practices".

How to use the course

This course is designed with busy professional learners in mind, who may have to pick up and put down the course as their schedule allows. Each exercise gives you the option to continue with the example files as you've edited them in each chapter, OR to download fresh chapter files that have been edited to match the relevant part of the course. This way, if you decide to skip a chapter or find that the files you've been working on no longer make sense, you have a fresh starting point at each exercise.

Syllabus

  • Introduction to this Course
    • In this first section, we will discuss the goals of this course and define what we mean by reproducibility.
  • Organizing your project
    • In this section we discuss motivation and strategies for project organization.
  • Using notebooks
    • In this section we discuss the motivation for using notebooks and integrated development environments to enhance the reproducibility of your project.
  • Making your project open source with GitHub
    • In this section we will describe how GitHub can make a project open source and encourage reproducibility.
  • Managing package versions
    • In this section we discuss two strategies for managing package versions in a project.
  • Writing durable code
    • In this section we discuss aspects of code that can make it more durable to enhance the reproducibility of a project.
  • Code review
    • This section discusses the importance of code review for creating reproducible analyses.
  • Documenting analysis
    • This section discusses how to document analyses to enhance their reproducibility.

Taught by

Candace Savonen, MS
