In this lab, you learn how to implement logistic regression using a machine learning library for Apache Spark, running on a Google Cloud Dataproc cluster, to develop a model for a multivariable dataset.
Syllabus
- GSP271
- Overview
- Setup and requirements
- Task 1. Create a Dataproc cluster
- Task 2. Set up bucket and start pyspark session
- Task 3. Read and clean up dataset
- Task 4. Develop a logistic regression model
- Task 5. Save and restore a logistic regression model
- Task 6. Predict with the logistic regression model
- Task 7. Examine model behavior
- Task 8. Evaluate the model
- Congratulations!