NoSQL, Big Data, and Spark Foundations

IBM via Coursera Specialization

Details

Provider: Coursera Specialization
Pricing: Paid Course
Languages: English
Certificate: Certificate Available
Duration & workload: 17 weeks, 3 hours a week
Level: Beginner

Overview

Big Data engineers and professionals with NoSQL skills are highly sought after in the data management industry. This Specialization is designed for those seeking to develop fundamental skills for working with Big Data, Apache Spark, and NoSQL databases. Three information-packed courses cover popular NoSQL databases like MongoDB and Apache Cassandra, the widely used Apache Hadoop ecosystem of Big Data tools, and the Apache Spark analytics engine for large-scale data processing.

You start with an overview of the various categories of NoSQL (Not only SQL) data repositories, and then work hands-on with several of them, including IBM Cloudant, MongoDB, and Cassandra. You’ll perform various data management tasks, such as creating and replicating databases and inserting, updating, deleting, querying, indexing, aggregating, and sharding data. Next, you’ll gain fundamental knowledge of Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, and HBase, followed by a more in-depth working knowledge of Apache Spark, Spark DataFrames, Spark SQL, PySpark, the Spark Application UI, and scaling Spark with Kubernetes. In the final course, you will learn to work with Spark Structured Streaming and Spark ML to perform Extract, Transform, and Load (ETL) processing and machine learning tasks.

This Specialization is suitable for beginners in the fields of NoSQL and Big Data, whether you are, or are preparing to be, a Data Engineer, Software Developer, IT Architect, Data Scientist, or IT Manager.

Syllabus

Course 1: Introduction to NoSQL Databases
- Offered by IBM. This course will provide you with technical hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) ...

Course 2: Introduction to Big Data with Spark and Hadoop
- Offered by IBM. Bernard Marr defines Big Data as the digital trace that we are generating in this digital era. In this course, you will ...

Course 3: Data Engineering and Machine Learning using Spark
- Offered by IBM. Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to ...

Courses

Introduction to NoSQL Databases
0 reviews
18 hours 25 minutes

Get started with NoSQL databases with this beginner-friendly introductory course! This course will provide technical, hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained a lot of relevance in the database landscape. Their main advantage is the ability to handle the scalability and flexibility demands that modern applications raise.

You will start this course by learning the history and the basics of NoSQL databases and discover their key characteristics and benefits. You will learn about the four categories of NoSQL databases (document, key-value, column, and graph) and how they differ. You’ll also explore the differences between the ACID and BASE consistency models, the pros and cons of distributed systems, and when to use an RDBMS versus NoSQL. You will also learn about vector databases, an emerging class of databases popular in AI.

Next, you will explore the architecture and features of several implementations of NoSQL databases, namely MongoDB, Cassandra, and IBM Cloudant. You will learn about the common tasks that each performs and their key defining characteristics. You will then get hands-on experience using those NoSQL databases to perform standard database management tasks, such as creating and replicating databases, loading and querying data, modifying database permissions, indexing and aggregating data, and sharding (or partitioning) data.

At the end of this course, you will complete a final project in which you apply your knowledge of the course content to a specific scenario and work with several NoSQL databases. This course suits anyone wanting to expand their Data Management and Information Technology skill set.
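
To make the idea of sharding (partitioning) mentioned above more concrete, here is a minimal sketch in plain Python, assuming simple hash-based partitioning. The names `shard_for` and `partition` and the sample records are hypothetical and do not come from the course; MongoDB and Cassandra each use their own partitioning mechanisms.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(records, key_field, num_shards):
    """Distribute records across num_shards lists by hashing one field."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        index = shard_for(str(record[key_field]), num_shards)
        shards[index].append(record)
    return shards


# Illustrative data only: eight made-up user records split over three shards.
records = [{"user_id": f"user{i}", "score": i * 10} for i in range(8)]
shards = partition(records, "user_id", 3)

for index, shard in enumerate(shards):
    print(index, [r["user_id"] for r in shard])
```

Because the hash is stable, the same key always lands on the same shard, which is what lets a lookup go to one shard instead of scanning all of them.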

Data Engineering and Machine Learning using Spark
1 review
7-8 hours

NOTE: This course has been replaced by IBM Machine Learning with Apache Spark.

Further your data engineering career with this self-paced course about machine learning with Apache Spark! Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, satellite imagery, and more to identify the behaviors and preferences of prospects, clients, competitors, and others.

In this short course, you'll gain these practical skills as you learn how to work with Apache Spark for Data Engineering and Machine Learning (ML) applications. You will work hands-on with Spark MLlib, Spark Structured Streaming, and more to perform extract, transform, and load (ETL) tasks as well as regression, classification, and clustering.

In this course, you will learn about data sources, streaming output modes, and supported data destinations. You will gain insights into the advantages of Apache Spark GraphFrames and complete a number of hands-on labs to apply your knowledge.

You will then move on to learning about machine learning using SparkML, the Spark Machine Learning library. You will gain an understanding of both supervised and unsupervised machine learning, classification and regression tasks, as well as clustering.

The course ends with a final project where you will create your own Apache Spark application for performing Extract, Transform, and Load (ETL) processes.

NOTE: This course requires that you have foundational skills for working with Apache Spark and Jupyter Notebooks. The Introduction to Big Data with Spark and Hadoop course from IBM will equip you with these skills and it is recommended that you have completed that course or have skills similar to the ones learnt in that course.

Introduction to Big Data with Spark and Hadoop
0 reviews
19 hours 31 minutes

This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark.

Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases. You’ll also explore how big data uses technologies like parallel processing, scaling, and data parallelism.

Next, you will learn about Hadoop, an open-source framework that allows for the distributed processing of large data sets, and its ecosystem. You will discover important applications that go hand in hand with Hadoop, like the Hadoop Distributed File System (HDFS), MapReduce, and HBase. You will become familiar with Hive, data warehouse software that provides an SQL-like interface to efficiently query and manipulate large data sets.

You’ll then gain insights into Apache Spark, an open-source processing engine that provides users with new ways to store and use big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the components that make up Apache Spark. You’ll learn about DataFrames, perform basic DataFrame operations, and work with SparkSQL. You will explore how Spark processes and monitors the requests your application submits and how you can track work using the Spark Application UI.

This course has several hands-on labs to help you apply and practice the concepts you learn. You will complete Hadoop and Spark labs using various tools and technologies, including Docker, Kubernetes, Python, and Jupyter Notebooks.
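
The MapReduce model named in the description above can be illustrated with a small word count in plain Python. This is only a single-process sketch of the map, shuffle, and reduce steps, not Hadoop code and not material from the course; the function names and sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only.
lines = ["big data with spark", "spark and hadoop", "big data tools"]
counts = word_count(lines)
print(counts)
```

In a real cluster the map and reduce steps run in parallel on different machines, and the shuffle moves data between them over the network; here everything runs in one process to keep the structure visible.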