Introduction to data analysis
Saint Petersburg State University via Coursera
This course may be unavailable.
Overview
This course takes you through the first steps in data analysis. It covers in detail the main concepts and processes that make up the discipline. The main goal is to build knowledge of the mathematical and statistical foundations underlying the main ideas and approaches used in data science. This is achieved by setting and solving typical problems that a data science researcher may face in their work. You will gain practical skills with data analysis tools used in a variety of fields. You will become acquainted with the main tasks, methods and basic algorithms, and with the areas where they are applied in practice. You will learn how applied problems of data processing and analysis are solved, and you will be introduced to the main concepts of artificial neural networks and how they are trained.
Syllabus
- Data and Big Data Analysis: Approaches, Functions and Software Tools
- Module 1 explores the concept of data analysis and introduces its
basic techniques. It discusses the concept of big data and its possible
applications. It also considers the relationship between different
approaches to processing data, as well as basic software for data
analysis. Some useful functions for data analysis are presented. The
principles of big data processing are discussed, in particular the
MapReduce model.
- Basic Characteristics of Data. Distributions, Statistics and Regressions
- Module 2 discusses descriptive statistics and exploratory data analysis.
The main characteristics of data distributions are introduced, and their
calculation is illustrated with examples. Frequentist and Bayesian
approaches to hypothesis testing are explained. The basic concepts of
regression and correlation analysis are formulated, with a focus on linear
methods.
- Clustering and Dimensionality Reduction
- Module 3 discusses the clustering problem and the algorithms for solving it.
Hierarchical clustering, the k-means algorithm and the CURE algorithm are explained.
The specifics of how these algorithms operate in non-Euclidean spaces are described.
The module also covers some aspects of dimensionality reduction, presents the basic
facts of singular value decomposition and illustrates its applications.
It also considers principal component analysis and the CUR decomposition,
which is applicable to big data processing.
- Machine Learning and Artificial Neural Networks
- Module 4 discusses models and methods of machine learning. The perceptron
model, how it works, and its advantages and disadvantages are discussed in
detail. The basic support vector machine and its generalizations are
considered. The module then turns to artificial neural networks, their
organization and training. The main features of deep neural networks,
the problems that arise with such networks and modern methods for overcoming
them are discussed. Convolutional and recurrent neural networks
are also considered.
Taught by
Григорьев Юрий Александрович, Руднев Владимир Александрович, Андронов Иван Викторович, Яревский Евгений Александрович and Яковлев Сергей Леонидович