What you'll learn:
- Students will be able to build an end-to-end big data project using Spark, Kafka, Cassandra, Scala, and Java
- Real-time credit card fraud detection is implemented using Spark, Kafka, and Cassandra
- Spark ML Pipeline stages such as StringIndexer, OneHotEncoder, and VectorAssembler are used for pre-processing
- The machine learning model is created using the Random Forest algorithm
- Data balancing is done using the K-means algorithm
- The Spark Streaming job is integrated with Kafka and Cassandra
- Exactly-once semantics are achieved using custom offset management in Spark Streaming
- The Airflow automation framework is used to automate Spark jobs on a Spark Standalone cluster
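
To make the idea behind custom offset management more concrete, here is a minimal sketch in plain Java. It is not the course's implementation and uses no Spark, Kafka, or Cassandra APIs: in-memory maps stand in for the external store, a `double[]` stands in for one Kafka partition, and every name (`ExactlyOnceSketch`, `processBatch`, `isFraud`, the 1000.0 threshold) is hypothetical. It assumes one common way of getting exactly-once results: write each result under a key derived from the partition and offset, and save the new offset only after the writes succeed, so a replayed batch overwrites rows instead of duplicating them.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of offset tracking; the maps are stand-ins for an external store. */
final class ExactlyOnceSketch {

    // Hypothetical results table: "partition:offset" -> fraud flag.
    final Map<String, Boolean> results = new LinkedHashMap<>();

    // Hypothetical offsets table: partition -> next offset to read.
    private final Map<Integer, Long> nextOffsets = new HashMap<>();

    /** Offset to resume from after a restart (0 if nothing was committed). */
    long nextOffset(int partition) {
        return nextOffsets.getOrDefault(partition, 0L);
    }

    /** Placeholder rule standing in for a trained model (assumption). */
    static boolean isFraud(double amount) {
        return amount > 1000.0;
    }

    /**
     * Processes up to maxRecords amounts from the partition log, starting at
     * the stored offset. Returns the number of records handled.
     */
    int processBatch(int partition, double[] log, int maxRecords, boolean crashBeforeCommit) {
        long from = nextOffset(partition);
        long until = Math.min(log.length, from + maxRecords);

        // Idempotent writes: the key depends only on (partition, offset),
        // so processing the same record twice overwrites the same row.
        for (long offset = from; offset < until; offset++) {
            results.put(partition + ":" + offset, isFraud(log[(int) offset]));
        }

        // A failure here leaves results written but the offset unchanged,
        // so the next run replays the same range.
        if (crashBeforeCommit) {
            throw new IllegalStateException("simulated crash before offset commit");
        }

        // Save the offset only after the results are stored.
        nextOffsets.put(partition, until);
        return (int) (until - from);
    }

    public static void main(String[] args) {
        ExactlyOnceSketch store = new ExactlyOnceSketch();
        double[] log = {20.0, 5000.0, 75.5, 1500.0};
        try {
            store.processBatch(0, log, 3, true);   // first attempt fails
        } catch (IllegalStateException e) {
            System.out.println("restarting after: " + e.getMessage());
        }
        store.processBatch(0, log, 3, false);      // replay of the same range
        store.processBatch(0, log, 3, false);      // remaining record
        System.out.println("next offset: " + store.nextOffset(0));
        System.out.println("stored results: " + store.results.size());
    }
}
```

In this sketch the failed first attempt and its replay touch the same three keys, so each record ends up stored once even though it was processed twice. A real job would need the result store and the offset store to survive restarts, which the in-memory maps here do not.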