This is the second of five courses in the Google Cloud Data Analytics Certificate. In this course, you’ll explore how data is structured and organized. You’ll gain hands-on experience with the data lakehouse architecture and cloud components like BigQuery, Google Cloud Storage, and Dataproc to efficiently store, analyze, and process large datasets.
Overview
Syllabus
- Introduction to data management and storage in the cloud
- Introduction to Course 2
- Course 2 overview
- Eric: Data analytics skills translate across industries and roles
- Helpful resources and tips
- Lab technical tips
- Explore your Course 2 scenario: TheLook eCommerce
- Welcome to module 1
- Data storage and connections
- Gerrit: Experience with a variety of tools can help you as an analyst
- Common ways to store data
- [Supplemental] Common data storage systems
- AI-based predictive data management
- Structured, unstructured, and semi-structured data
- Overview of data lakehouse architecture
- Example of a data lakehouse
- Comparison of data warehouses and data lakehouses
- Test your knowledge: Data storage options
- Aspects of table schema
- Overview of BigQuery's schema editing abilities
- Components of BigQuery table schema
- Complex data types in BigQuery
- Introduction to nested data structure
- Guide to BigQuery
- Explore flat and nested data types in BigQuery
- Test your knowledge: Data types and organization in BigQuery
- Overview of data processing methods
- Batch versus streaming data processing
- Identify different batch and streaming data sources
- Test your knowledge: Batch and streaming data sources
- Wrap-up
- Glossary terms from module 1
- Module 1 challenge
- Key components of data organization
- Welcome to module 2
- Denormalized data
- Normalized and denormalized data
- Test your knowledge: Ways to organize data
- Data governance for effective data management
- MK: Risk management in a cloud-first world
- Components and objectives of data governance
- Introduction to master data management
- Test your knowledge: Data governance
- Introduction to data catalogs
- Data catalog components
- Technical and business metadata
- Test your knowledge: Foundations of accessible data
- Overview of data lakehouse architecture
- Components of data lakehouse architecture
- Data lakehouse implementation best practices
- Explore a lakehouse
- Test your knowledge: Data lakehouse architecture
- Wrap-up
- Glossary terms from module 2
- Module 2 challenge
- Steps to find data
- Welcome to module 3
- Ryan: Curiosity can help you understand and connect data
- How to find data using BigQuery
- Data lineage and traceability
- Dataplex's data lineage feature
- How to use the Dataplex data lineage feature
- Test your knowledge: Strategies for understanding data sources
- Introduction to Analytics Hub
- Analytics Hub enables data sharing
- How to use Analytics Hub
- Test your knowledge: Tools for sharing data
- Data discovery, curation, and unification
- Overview of Dataplex
- Benefits of using Dataplex
- How to search for data with BigQuery
- Navigate Dataplex
- Test your knowledge: Dataplex and BigQuery for accessing data
- Wrap-up
- Glossary terms from module 3
- Module 3 challenge
- Techniques to access data
- Welcome to module 4
- Methods for defining BigQuery table schemas
- Auto-detection of schemas in BigQuery
- Basic SQL commands for querying data
- [Supplemental] SQL query terms
- Compare data analytics with BigQuery and Dataproc
- Test your knowledge: Data schemas and queries in BigQuery
- Steps and models for accessing data with machine learning
- Cloud-based machine learning can train predictive models
- Introduction to machine learning with Vertex AI and BigQuery
- Overview of Google Colab
- Managed notebooks
- Test your knowledge: Integration of Google Cloud tools
- Essentials of database partitioning
- Benefits of data partitioning
- Methods for partitioning tables
- Data partitioning reduces cloud costs
- Create a partitioned table
- Test your knowledge: Overview of data partitioning
- Strategies for querying partitioned tables
- Tips for interacting with partitioned tables
- Manage a partitioned table in BigQuery
- Test your knowledge: Techniques for managing partitioned tables
- Key processes and benefits of Dataproc
- How to create a Dataproc cluster
- How to manage Dataproc clusters
- Test your knowledge: Dataproc for automation and improved data processing
- Wrap-up
- Vince and George: Interview role play
- Interview tip: Provide examples
- Glossary terms from module 4
- Module 4 challenge
- Course wrap-up
- Course 2 resources and citations
- Glossary terms from Course 2
- Your Next Steps
- Course Badge