Overview
Big Data Engineers and professionals with NoSQL skills are highly sought after in the data management industry. This Specialization is designed for those seeking to develop fundamental skills for working with Big Data, Apache Spark, and NoSQL databases. Three information-packed courses cover popular NoSQL databases like MongoDB and Apache Cassandra, the widely used Apache Hadoop ecosystem of Big Data tools, as well as Apache Spark analytics engine for large-scale data processing.
You start with an overview of various categories of NoSQL (Not only SQL) data repositories, and then work hands-on with several of them including IBM Cloudant, MonogoDB and Cassandra. You’ll perform various data management tasks, such as creating & replicating databases, inserting, updating, deleting, querying, indexing, aggregating & sharding data. Next, you’ll gain fundamental knowledge of Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, and HBase, followed by a more in depth working knowledge of Apache Spark, Spark Dataframes, Spark SQL, PySpark, the Spark Application UI, and scaling Spark with Kubernetes. In the final course, you will learn to work with Spark Structured Streaming Spark ML - for performing Extract, Transform and Load processing (ETL) and machine learning tasks.
This specialization is suitable for beginners in the fields of NoSQL and Big Data – whether you are or preparing to be a Data Engineer, Software Developer, IT Architect, Data Scientist, or IT Manager.
Syllabus
Course 1: Introduction to NoSQL Databases
- Offered by IBM. Get started with NoSQL Databases with this beginner-friendly introductory course! This course will provide technical, ... Enroll for free.
Course 2: Introduction to Big Data with Spark and Hadoop
- Offered by IBM. This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data ... Enroll for free.
Course 3: Machine Learning with Apache Spark
- Offered by IBM. Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking ... Enroll for free.
- Offered by IBM. Get started with NoSQL Databases with this beginner-friendly introductory course! This course will provide technical, ... Enroll for free.
Course 2: Introduction to Big Data with Spark and Hadoop
- Offered by IBM. This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data ... Enroll for free.
Course 3: Machine Learning with Apache Spark
- Offered by IBM. Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking ... Enroll for free.
Courses
-
Get started with NoSQL Databases with this beginner-friendly introductory course! This course will provide technical, hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained a lot of relevance in the database landscape. Their main advantage is the ability to handle scalability and flexibility issues modern applications raise. You will start this course by learning the history and the basics of NoSQL databases (document, key-value, column, and graph) and discover their key characteristics and benefits. You will learn about the four categories of NoSQL databases and how they differ. You’ll also explore the differences between the ACID and BASE consistency models, the pros and cons of distributed systems, and when to use RDBMS and NoSQL. You will also learn about vector databases, an emerging class of databases popular in AI. Next, you will explore the architecture and features of several implementations of NoSQL databases, namely MongoDB, Cassandra, and IBM Cloudant. You will learn about the common tasks that they each perform and their key and defining characteristics. You will then get hands-on experience using those NoSQL databases to perform standard database management tasks, such as creating and replicating databases, loading and querying data, modifying database permissions, indexing and aggregating data, and sharding (or partitioning) data. At the end of this course, you will complete a final project where you will apply all your knowledge of the course content to a specific scenario and work with several NoSQL databases. This course suits anyone wanting to expand their Data Management and Information Technology skill set.
-
This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark. Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases. You’ll also explore how big data uses technologies like parallel processing, scaling, and data parallelism. Next, you will learn about Hadoop, an open-source framework that allows for the distributed processing of large data and its ecosystem. You will discover important applications that go hand in hand with Hadoop, like Distributed File System (HDFS), MapReduce, and HBase. You will become familiar with Hive, a data warehouse software that provides an SQL-like interface to efficiently query and manipulate large data sets. You’ll then gain insights into Apache Spark, an open-source processing engine that provides users with new ways to store and use big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the components that make up Apache Spark. You’ll learn about DataFrames and perform basic DataFrame operations and work with SparkSQL. Explore how Spark processes and monitors the requests your application submits and how you can track work using the Spark Application UI. This course has several hands-on labs to help you apply and practice the concepts you learn. You will complete Hadoop and Spark labs using various tools and technologies, including Docker, Kubernetes, Python, and Jupyter Notebooks.
-
Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos. Gain hands-on experience with Spark structured streaming, develop an understanding of data engineering and ML pipelines, and become proficient in evaluating ML models using SparkML. In practical labs, you'll utilize SparkML for regression, classification, and clustering, enabling you to construct prediction and classification models. Connect to Spark clusters, analyze SparkSQL datasets, perform ETL activities, and create ML models using Spark ML and sci-kit learn. Finally, demonstrate your acquired skills through a final assignment. This intermediate course is suitable for aspiring and experienced data engineers, as well as working professionals in data analysis and machine learning. Prior knowledge in Big Data, Hadoop, Spark, Python, and ETL is highly recommended for this course.
Taught by
Aije Egwaikhide, Karthik Muthuraman, Ramesh Sannareddy, Rav Ahuja, Skills Network and Steve Ryan