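In recent years, artificial intelligence has spread rapidly into many sectors of industry. Big data systems are the foundation of today's data-driven AI and have therefore become essential. This course introduces students to the fundamental concepts of big data systems, covering how data can be stored, processed, and analyzed efficiently. We start from the general principles of distributed system design. We then present frameworks for scaling storage, computation, and networking in big data systems. Finally, to make these design principles easier to follow, our case studies use real industrial systems to show how the principles apply in practice and how to analyze the performance and limitations of such systems.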
Overview
Syllabus
- Chapter1: Introduction to Big Data Systems
- Section1: What is big data and what is a big data system?
- Section2: Problems in big data systems
- Section3: Overview of the course
- Section4: Principles of big data system design
- Chapter2: Basics of Linux Data Processing
- Section1: Manipulating Data on Linux
- Section2: Running Commands on a Single Machine
- Section3: Using a Linux Cluster
- Chapter3: Distributed File System
- Section1: Storage for Big Data Computing: Distributed file system
- Section2: File system and GFS
- Section3: Understanding HDFS using Legos
- Section4: File System Implementation and DFS
- Chapter4: MapReduce
- Section1: What is MapReduce and why
- Section2: Learn MapReduce by playing with cards
- Section3: Processing pattern
- Section4: Hadoop
- Section5: Algorithms in MapReduce
- Section6: Tutorial
- Chapter5: In-memory Processing
- Section1: Background
- Section2: Spark
- Section3: Use Spark for data mining
- Section4: Spark data processing
- Section5: Experiment in Spark
- Chapter6: Streaming Data Processing
- Section1: Introduction to streaming data processing
- Section2: Storm
- Section3: Spark streaming
- Chapter7: NoSQL
- Section1: NoSQL introduction
- Section2: Common Advantages
- Section3: Bigtable
- Section4: Master Startup
- Section5: HBase
- Chapter8: Graph Processing
- Section1: What is GraphDB and Graph data processing
- Section2: Graph systems
- Section3: Example of a GraphDB
- Chapter9: Machine Learning System
- Section1: Mahout
- Section2: Case Study: Recommendation
- Section3: Recommendation in Mahout
- Final Exam
Taught by
Zhi Wang