By the end of this second course in the Total Data Quality Specialization, learners will be able to:
1. Learn various metrics for evaluating Total Data Quality (TDQ) at each stage of the TDQ framework.
2. Create a quality concept map that tracks relevant aspects of TDQ from a particular application or data source.
3. Think through relative trade-offs between quality aspects, relative costs and practical constraints imposed by a particular project or study.
4. Identify relevant software and related tools for computing the various metrics.
5. Understand metrics that can be computed for both designed and found/organic data.
6. Apply the metrics to real data and interpret their resulting values from a TDQ perspective.
This specialization as a whole aims to explore the Total Data Quality framework in depth and provide learners with more information about the detailed evaluation of total data quality that needs to happen prior to data analysis. The goal is for learners to incorporate evaluations of data quality into their process as a critical component for all projects. We sincerely hope to disseminate knowledge about total data quality to all learners, such as data scientists and quantitative analysts, who have not had sufficient training in the initial steps of the data science process that focus on data collection and evaluation of data quality. We feel that extensive knowledge of data science techniques and statistical analysis procedures will not help a quantitative research study if the data collected/gathered are not of sufficiently high quality.
This specialization will focus on the essential first steps in any type of scientific investigation using data: either generating or gathering data, understanding where the data come from, evaluating the quality of the data, and taking steps to maximize the quality of the data prior to performing any kind of statistical analysis or applying data science techniques to answer research questions. Given this focus, there will be little material on the analysis of data, which is covered in myriad existing Coursera specializations. The primary focus of this specialization will be on understanding and maximizing data quality prior to analysis.
Syllabus
- Introduction and Measuring Validity and Data Origin Quality
- Welcome to Measuring Total Data Quality! This is the second course in the Total Data Quality Specialization. After reviewing the Course 2 syllabus and completing the course pre-survey, you’ll learn how to measure validity for designed and gathered data through a series of video lectures, examples, and readings. You’ll then take a short quiz on interpreting validity metrics. Then, you’ll complete a module on data origin, where you’ll learn about measuring data origin quality for designed and gathered data in a series of video lectures and case studies. Week 1 will conclude with a quiz on interpreting data origin quality metrics.
- Measuring Processing and Data Access Quality
- Welcome to Week 2 of Measuring Total Data Quality! We’ll begin the week by discussing how to measure processing data quality for designed and gathered data. We’ll include examples of measuring processing data quality for each form of data and conclude the module with a quiz on interpreting processing metrics. In the second half of Week 2, we’ll discuss measuring data access quality for designed and gathered data through video lectures, an example, and a case study, and conclude the week with a quiz on interpreting access metrics.
- Measuring Data Source Quality and Data Missingness
- This week, we’ll learn how to measure data source quality and data missingness. We’ll begin Week 3 with a video lecture on measuring data source quality for designed data. Then, we’ll work through an example of computing data source metrics with real data and code. We’ll then learn how to measure data source quality for gathered data and see an example of computing data source quality metrics with real data and code. You’ll then take a short quiz on interpreting data source quality metrics and move on to the Data Missingness unit. We’ll learn how to measure threats to data source quality for designed and gathered data and work through examples for each form of data. Week 3 will conclude with a quiz on interpreting data missingness metrics.
- Measuring the Quality of Data Analysis
- We’ll be wrapping up Measuring Total Data Quality this week by learning how to measure the quality of data analysis. We’ll learn how to measure the quality of data analysis for designed and gathered data and work through examples of each type of data. We recommend that you complete two readings before you complete the lecture on measuring the quality of analysis for gathered data. We will conclude the week with a quiz on examining quality metrics and interpreting output, as well as references for the Measuring Total Data Quality course and a course post-survey.
Taught by
Brady T. West, James Wagner, Jinseok Kim, and Trent D. Buskirk