
Nanjing University

数据科学理论与应用 (Data Science Theory and Application)

Nanjing University via XuetangX

Overview

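This course is one of a series of courses that train undergraduates in the humanities and social sciences in data science and data analysis. It deliberately avoids algorithm-centered training and starts instead from the basic concepts and principles of data science. By understanding these principles, students learn to think about what a data analysis is for. This builds their overall ability to analyze data science problems, assess data science solutions, and evaluate data science strategies. Topics include the data science project life cycle, exploratory data analysis, data visualization techniques, model building and model fitting, and model evaluation.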

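The course is taught entirely in English and is designed as an international course. It aims to help students in the humanities and social sciences master the basic theory and knowledge of data science, develop data science thinking, and acquire basic data-processing skills, so that they can carry out basic data analysis and data visualization. The course follows several teaching principles: it is data-centered and problem-oriented, it focuses on cultivating data science thinking, it combines theory with practice, and it balances data management with data analysis. Together these principles are meant to develop students' awareness of problems, their data science thinking, and their overall ability to analyze and solve problems.

The course uses a variety of teaching methods. At the macro level, classroom teaching lays a solid foundation, hands-on practice consolidates skills, and lectures are balanced with guided learning to raise the quality of teaching and learning. At the micro level, specific topics are taught through case analysis, comparison, and divergent thinking that branches out from a central idea.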

This course focuses on data science training for business practitioners and social science students. We deliberately avoid algorithm-centered training and instead emphasize how to apply tools to solve practical problems. We will learn a relatively small set of fundamental concepts or principles that underlie techniques for extracting useful knowledge from data. These concepts underlie the analysis of data-centered business problems, the creation and evaluation of data science solutions, and the evaluation of general data science strategies and proposals. The central aim of the course is to build knowledge of data visualization, regression, classification, clustering, and data-analytic thinking.

Syllabus

  • 1. Data Science Introduction
    • 1.1 Data Science Introduction I: What Is Data Science?
    • 1.2 Data Science Introduction II: Data Science Products
    • 1.3 Data-Driven Decision Making
    • 1.4 Data Science is Interdisciplinary
    • 1.5 Course Design
  • 2. Data Types
    • 2.1 Data Files and Data Types
    • 2.2 Rectangular Data and Nonrectangular Data
    • 2.3 Statistical Estimation in EDA
    • 2.4 Exploring Numerical Data
    • 2.5 Exploring Categorical Data
  • 3. R Introduction
    • 3.1 R Introduction
    • 3.2 Data Structures
    • 3.3 The dplyr Package
    • 3.4 The tidyr Package
    • 3.5 Data Processing
    • 3.6 Data Transformation Tutorial
  • 4. Data Visualization
    • 4.1 Data Visualization
    • 4.2 Basics of the ggplot2 Package
    • 4.3 Objects and the Seven Layers
    • 4.4 ggplot2 Practical Operation
  • 5. Regression
    • 5.1 Definition of Linear Regression
    • 5.2 Ordinary Least Squares Method
    • 5.3 Assumptions of OLS
    • 5.4 Tests and Intervals
    • 5.5 Multiple Linear Regression
    • 5.6 Model Evaluation
    • 5.7 Regression Tutorial
  • 6. Classification Algorithm
    • 6.1.1 Basics of Classification
    • 6.1.2 Basics of Classification
    • 6.2 Logistic Regression
    • 6.3 Decision Tree
    • 6.4 Naive Bayes
    • 6.5 K-Nearest Neighbors
    • 6.6 Classification Practical Operation
  • 7. Clustering Algorithm
    • 7.1 Introduction
    • 7.2 Basics of Clustering
    • 7.3 Prototype-Based Clustering: K-Means
    • 7.4 Density-Based Clustering: DBSCAN and OPTICS
    • 7.5 Hierarchical Clustering: AGNES and DIANA
    • 7.6 Collaborative Filtering
    • 7.7 Cluster Model Tutorial
  • Final Exam

    Taught by

    Lele Kang, Lei Pei, and Si Chen
