Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

IGNOU

MCS-226: Data Science and Big Data

IGNOU via Swayam

Overview

Data Science and Big Data – MCS -226. The course is one of the courses of the Master of Computer Applications programme of IGNOU. This course introduces the students to the concepts of data science and big data, its architecture and a programming technique R that can be used to analyse big data.1. Basics of Data ScienceIntroduction to Data Science: Probability: Conditional Probability and Bayes Theorem, Random Variables and Basic Distributions: Binomial Distribution, Probability Distribution of Continuous Random Variable & The Normal Distribution, Sampling Distribution and the Central Limit Theorem, Statistical Hypothesis Testing: Estimation of Parameters of the Population, Significance Testing of Statistical Hypothesis, Example Using Correlation and Regression & Types of Errors in Hypothesis Testing.Data Preparation for Analysis: Need for Data Preparation, Data Preprocessing: Data Cleaning, Data Integration, Data Reduction & Data Transformation, Selection and Data Extraction, Data Curation: Steps of Data Curation, Importance of Data Curation, Data Integration: Data Integration Techniques & Data Integration Approaches, Knowledge Discovery.Analysis of Simple Algorithm: Complexity analysis of Algorithms, Euclid Algorithm for GCD, Polynomial Evaluation Algorithm, Exponent Evaluation, Sorting AlgorithmData Visualization and Interpretation: Different types of plots, Histograms, Box plots, Scatter plots, Heatmap, Bubble Chart, Bar Chart, Distribution plot, Pair plot, Line graph, Pie Chart, Doughnut Chart, Area Chart.2. Big Data and its ManagementBig Architecture: Big Data and Characteristics and Big Data Applications, Big data Applications, Structured vs Semi-structured and Unstructured Data, Big Data vs Data Warehouse, Distributed File System, HDFS and Map Reduce, Apache Hadoop 1 and 2 (YARN).Programming Using MapReduce: Map Reduce Operations, Loading data into HDFS: Installing Hadoop and Loading Data, Executing the MapReduce phases: Execution of Map Phase, Shuffling and sorting, Reduce phase execution & Node Failure and MapReduce, Algorithms using MapReduce: Word counting and Matrix-Vector Multiplication.Other Big data Architectures and Tools: Apache SPARK framework, HIVE: Working of HIVE Queries, Installation of HIVE and Writing Queries in HIVE, HBase: HBase Installation & Working with HBase, Other tools .3. Big Data AnalysisMining Big Data: Finding Similar Items: Jaccard Similarity of Sets, Documents Similarity & Collaborative Filtering and Set Similarity, Finding Similar Documents: Shingles, Minhashing & Locality Sensitive Hashing, Distance Measures: Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance & Hamming Distance, Introduction to Other Techniques.Mining Data Streams: Data Streams: Model for Data Stream Processing, Data Stream Management: Queries of Data stream,Example of Data Stream & Issues and Challenges of Data Stream, Data Sampling in Data Streams: The Representation Sample, Filtering of Data Streams: Bloom Filter, Algorithm to Count Different Elements in Stream.Link Analysis: Link analysis, Page Rankin, Different Mechanisms of Finding PageRank: Finding PageRank & Web Structure and Associated Issues, Use of PageRank in Search Engines: Page Rank computation using Map-reduce, Topic sensitive PageRank, Link Spam, Hubs and Authorities.Web and Social Network Analysis: Introduction to Web Analytics, Advertising on the Web: Issues & Algorithm, Recommendation Systems: The Long Tail, The Model and Content-Based Recommendations, Mining Social Networks: Social Networks as Graphs, Varieties of Social Networks, Distance Measure of Social Network Graphs & Clustering of Social Network Graphs.4. Programming for Data AnalysisBasics of R Programming: Environment of R, Data types, Variables, Operators, Factors, Decision Making, Loops, Functions, Data Structures in R: Strings and Vectors, Lists & Matrices, Arrays and Frames.Data Interfacing and Visualisation in R: Reading Data From Files: CSV File, Excel File, Binary Files, XML Files, JSON Files, interfacing with Databases & Web Data, Data Cleaning and Pre-processing, Visualisations in R: Bar Charts, Box Plots, Histograms, Line Graphs & Scatterplots.Data Analysis and R: Chi-Square Test, Linear Regression, Multiple Regression, Logistic Regression, Time Series Analysis.Advance Analysis Using R: Decision Trees, Random Forest, Classification, Clustering, Association Rules.

Syllabus

WEEK

TOPIC

Week-1

Introduction to Data Science

Week-2

Probability and Statistics for Data Science

Week-3

Data Preparation for Analysis

Week-4

Data Visualization and Interpretation

Week-5

Big Architecture

Week-6

Programming Using MapReduce

Week-7

Other Big data Architectures and Tools

Week-8

NoSQL Database


Week-9

Mining Big Data

Mining Data Streams


Week-10

Link Analysis

Web and Social Network Analysis


Week-11

Basic of R Programming

Data Interfacing and Visualisation in R


Week-12

Data Analysis and R

Advance Analysis Using R

Taught by

Prof. Sandeep Singh Rawat

Tags

Reviews

Start your review of MCS-226: Data Science and Big Data

Never Stop Learning.

Get personalized course recommendations, track subjects and courses with reminders, and more.

Someone learning on their laptop while sitting on the floor.