The course was created with the support of Sberbank
This is an unconventional course in modern Data Analysis, Machine Learning and Data Mining. Its contents are heavily influenced by the idea that data analysis should help enhance and augment knowledge of the domain, as represented by concepts and statements of the relations between them. According to this view, the two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. Here the term summarization embraces both simple summaries, such as totals and means, and more complex ones, such as the principal components of a set of features and cluster structures in a set of entities. Similarly, correlation covers both bivariate and multivariate relations between input and target features, including Bayes classifiers.
The view of data adopted here, as a subject of computational data analysis, has emerged quite recently. Typically, in the sciences and in statistics, a problem comes first, and then the investigator turns to data that might be useful in advancing towards a solution. Yet nowadays the situation is frequently reversed, especially with the advent of Big Data. Typical questions then are: Take a look at this data set: what sense can be made out of it? Is there any structure in the data set? Can these features help in predicting those? This is more reminiscent of a traveler's view of the world than that of a scientist. The scientist sits at a desk, gets reproducible signals from the universe and tries to accommodate them into a great model of the universe. The traveler deals with whatever comes their way, and this is the niche of data analysis.
A textbook by the instructor along these lines was published by Springer-London in 2011. From a review: “Core concepts in data analysis is clean and devoid of any fuzziness. The author presents his theses with a refreshing clarity seldom seen in a text of this sophistication. … To single out just one of the text’s many successes: I doubt readers will ever encounter again such a detailed and excellent treatment of correlation concepts.” (Computing Reviews of ACM, June 2011)