Overview of Google Data Engineer
A data engineer enables data-driven decision making by gathering, transforming and publishing a meaningful set of data.
The objectives of this training are:
- You will be able to design, build, operationalize, secure and monitor data processing systems with an emphasis on security and compliance; scalability and efficiency; reliability and fidelity.
- You will be able to leverage, deploy, and continuously train pre-existing ML models.
- You will be able to ensure solution quality.
- You will be able to design data processing systems.
Duration
4 Days
Prerequisites for Google Data Engineer
Good to have: Hands-on work experience with Google Cloud technologies.
Must have: An understanding of how data works and what it can deliver for the organization.
Course Outline for Google Data Engineer
Introduction
- Theory, Practice, and Tests
- Lab: Setting Up A GCP Account
- Lab: Using The Cloud Shell
Compute
- About this section
- Compute Options
- Google Compute Engine (GCE)
- Lab: Creating a VM Instance
More GCE
- Lab: Editing a VM Instance
- Lab: Creating a VM Instance Using The Command Line
- Lab: Creating And Attaching A Persistent Disk
Google Container Engine – Kubernetes (GKE)
More GKE
- Lab: Creating A Kubernetes Cluster And Deploying A WordPress Container
App Engine
- Contrasting App Engine, Compute Engine, and Container Engine
- Lab: Deploy And Run An App Engine App
Compute
Storage
- Storage Options
- Quick Take
- Cloud Storage
- Lab: Working With Cloud Storage Buckets
- Lab: Bucket And Object Permissions
- Lab: Lifecycle Management On Buckets
- Fix for AccessDeniedException: 403 Insufficient Permission
- Lab: Running A Program On A VM Instance And Storing Results On Cloud Storage
Virtual Machines and Images
- Live Migration
- Machine Types and Billing
- Sustained Use and Committed Use Discounts
- Rightsizing Recommendations
- RAM Disk
- Images
- Startup Scripts And Baked Images
VPCs and Interconnecting Networks
- VPCs And Subnets
- Global VPCs, Regional Subnets
- IP Addresses
- Lab: Working with Static IP Addresses
- Routes
- Firewall Rules
- Lab: Working with Firewalls
- Lab: Working with Auto Mode and Custom Mode Networks
- Lab: Bastion Host
Cloud VPN
- Lab: Working with Cloud VPN
- Cloud Router
- Lab: Using Cloud Routers for Dynamic Routing
- Dedicated Interconnect, Direct & Carrier Peering
- Shared VPCs
- Lab: Shared VPCs
- VPC: Network Peering
- Lab: VPC Peering
- Cloud DNS & Legacy Networks
- Networking
Managed Instance Groups and Load Balancing
- Managed and Unmanaged Instance Groups
- Types of Load Balancing
- Overview of HTTP(S) Load Balancing
- Forwarding Rules, Target Proxy and URL Maps
- Backend Service & Backends
- Load Distribution & Firewall Rules
- Lab: HTTP(S) Load Balancing
- Lab: Content-Based Load Balancing
- SSL Proxy and TCP Proxy Load Balancing
- Lab: SSL Proxy Load Balancing
- Network Load Balancing
- Internal Load Balancing
- Autoscalers
- Lab: Autoscaling with Managed Instance Groups
Ops & Security
- Stackdriver
- Stackdriver Logging
- Lab: Stackdriver Resource Monitoring
- Lab: Stackdriver Error Reporting & Debugging
Cloud Deployment Manager
- Lab: Using Deployment Manager
- Lab: Deployment Manager & Stackdriver
Cloud Endpoints
- Cloud IAM: User Accounts, Service Accounts, API Credentials
- Cloud IAM: Roles, Identity-Aware Proxy, Best Practices
- Lab: Cloud IAM
Data Protection
- Operations and Security
Transfer Service
- Lab: Migrating Data Using The Transfer Service
- gcloud init
- Lab: Cloud Storage Versioning, Directory Sync
Cloud SQL, Cloud Spanner ~ OLTP ~ RDBMS
- Cloud SQL
- Lab: Creating A Cloud SQL Instance
- Lab: Running Commands On Cloud SQL Instance
- Lab: Bulk Loading Data Into Cloud SQL Tables
Cloud Spanner
- More Cloud Spanner
- Lab: Working With Cloud Spanner
BigTable ~ HBase = Columnar Store
- BigTable Intro
- Columnar Store
- Denormalised
- Column Families
- BigTable Performance
- Getting the HBase Prompt
- Lab: BigTable demo
Datastore ~ Document Database
- Datastore
- Lab: Datastore demo
BigQuery ~ Hive ~ OLAP
- BigQuery Intro
- BigQuery Advanced
- Lab: Loading CSV Data Into BigQuery
- Lab: Running Queries On BigQuery
- Lab: Loading JSON Data With Nested Tables
- Lab: Public Datasets In BigQuery
- Lab: Using BigQuery Via The Command Line
- Lab: Aggregations And Conditionals In Aggregations
- Lab: Subqueries And Joins
- Lab: Regular Expressions In Legacy SQL
- Lab: Using The WITH Statement For Subqueries
Dataflow: Apache Beam
- Dataflow Intro
- Apache Beam
- Lab: Running A Python Dataflow Program
- Lab: Running A Java Dataflow Program
- Lab: Implementing Word Count In Dataflow Java
- Lab: Executing The Word Count Dataflow
- Lab: Executing MapReduce In Dataflow In Python
- Lab: Executing MapReduce In Dataflow In Java
Dataproc: Manage Hadoop
- Dataproc
- Lab: Creating & Managing A Dataproc Cluster
- Lab: Creating A Firewall Rule To Access Dataproc
- Lab: Running A PySpark Job On Dataproc
- Lab: Running The PySpark REPL Shell And Pig Scripts On Dataproc
- Lab: Submitting A Spark Jar To Dataproc
- Lab: Working With Dataproc Using The gcloud CLI
Pub/Sub for Streaming
- Pub/Sub
- Lab: Working With Pub/Sub On The Command Line
- Lab: Working With Pub/Sub Using The Web Console
- Lab: Setting Up A Pub/Sub Publisher Using The Python Library
- Lab: Setting Up A Pub/Sub Subscriber Using The Python Library
- Lab: Publishing Streaming Data Into Pub/Sub
- Lab: Reading Streaming Data From Pub/Sub And Writing To BigQuery
- Lab: Executing A Pipeline To Read Streaming Data And Write To BigQuery
- Lab: Pub/Sub Source BigQuery Sink
Datalab ~ Jupyter
- Datalab
- Lab: Creating And Working On A Datalab Instance
- Lab: Importing And Exporting Data Using Datalab
- Lab: Using The Charting API In Datalab
Vision, Translate, NLP and Speech: Trained ML APIs
- Lab: Taxicab Prediction – Setting up the dataset
- Lab: Taxicab Prediction – Training and Running the model
- Lab: The Vision, Translate, NLP and Speech APIs
- Lab: The Vision API for Label and Landmark Detection
Additional topics, covered briefly, that are prerequisites for this course
Appendix: Hadoop Ecosystem
- Introducing the Hadoop Ecosystem
- Hadoop
- HDFS
- MapReduce
- YARN
- Hive
- Hive vs. RDBMS
- HQL vs. SQL
- OLAP in Hive
- Windowing in Hive
- Pig
- Spark
- Streams Intro
- Microbatches
- Window Types
- Hadoop Ecosystem