Google Data Engineer Training

Created by

Stalwart Learning

Duration

4 Days

Location

https://stalwartlearning.com

Course Description

Overview of Google Data Engineer

A data engineer enables data-driven decision making by gathering, transforming and publishing a meaningful set of data.

Please find below the objectives of this training:

You will be able to design, build, operationalize, secure, and monitor data processing systems, with an emphasis on security and compliance, scalability and efficiency, and reliability and fidelity.
You will be able to leverage, deploy, and continuously train pre-existing ML models.
You will be able to ensure solution quality.
You will be able to design data processing systems.

Prerequisite for Google Data Engineer

Good to have: Hands-on work experience with Google Cloud technologies.

Must-Have: Understanding of how data works and what it can deliver for the organization.

Course Outline for Google Data Engineer

Introduction

Theory, Practice, and Tests
Lab: Setting Up A GCP Account
Lab: Using The Cloud Shell

Compute

About this section
Compute Options
Google Compute Engine (GCE)
Lab: Creating a VM Instance

More GCE

Lab: Editing a VM Instance
Lab: Creating a VM Instance Using The Command Line
Lab: Creating And Attaching A Persistent Disk

Google Kubernetes Engine (GKE, formerly Container Engine)

More GKE

Lab: Creating A Kubernetes Cluster And Deploying A WordPress Container

App Engine

Contrasting App Engine, Compute Engine, and Container Engine
Lab: Deploy And Run An App Engine App

Storage

Storage Options
Quick Take
Cloud Storage
Lab: Working With Cloud Storage Buckets
Lab: Bucket And Object Permissions
Lab: Lifecycle Management On Buckets
Fix for AccessDeniedException: 403 Insufficient Permission
Lab: Running A Program On a VM Instance And Storing Results on Cloud Storage

Virtual Machines and Images

Live Migration
Machine Types and Billing
Sustained Use and Committed Use Discounts
Rightsizing Recommendations
RAM Disk
Images
Startup Scripts And Baked Images

VPCs and Interconnecting Networks

VPCs And Subnets
Global VPCs, Regional Subnets
IP Addresses
Lab: Working with Static IP Addresses
Routes
Firewall Rules
Lab: Working with Firewalls
Lab: Working with Auto Mode and Custom Mode Networks
Lab: Bastion Host

Cloud VPN

Lab: Working with Cloud VPN
Cloud Router
Lab: Using Cloud Routers for Dynamic Routing
Dedicated Interconnect, Direct Peering & Carrier Peering
Shared VPCs
Lab: Shared VPCs
VPC Network Peering
Lab: VPC Peering
Cloud DNS & Legacy Networks
Networking

Managed Instance Groups and Load Balancing

Managed and Unmanaged Instance Groups
Types of Load Balancing
Overview of HTTP(S) Load Balancing
Forwarding Rules, Target Proxy, and URL Maps
Backend Service & Backends
Load Distribution & Firewall Rules
Lab: HTTP(S) Load Balancing
Lab: Content-Based Load Balancing
SSL Proxy and TCP Proxy Load Balancing
Lab: SSL Proxy Load Balancing
Network Load Balancing
Internal Load Balancing
Autoscalers
Lab: Autoscaling with Managed Instance Groups

Ops & Security

Stackdriver
Stackdriver Logging
Lab: Stackdriver Resource Monitoring
Lab: Stackdriver Error Reporting & Debugging

Cloud Deployment Manager

Lab: Using Deployment Manager
Lab: Deployment Manager & Stackdriver

Cloud Endpoints

Cloud IAM: User accounts, Service accounts, API Credentials
Cloud IAM: Roles, Identity-Aware Proxy, Best Practices
Lab: Cloud IAM

Data Protection

Transfer Service

Lab: Migrating Data Using The Transfer Service

gcloud init

Lab: Cloud Storage Versioning, Directory Sync

Cloud SQL, Cloud Spanner ~ OLTP ~ RDBMS

Cloud SQL
Lab: Creating A Cloud SQL Instance
Lab: Running Commands On Cloud SQL Instance
Lab: Bulk Loading Data Into Cloud SQL Tables

Cloud Spanner

More Cloud Spanner
Lab: Working With Cloud Spanner

BigTable ~ HBase = Columnar Store

BigTable Intro
Columnar Store
Denormalised
Column Families
BigTable Performance
Getting the HBase Prompt
Lab: BigTable demo

Datastore ~ Document Database

Datastore
Lab: Datastore demo

BigQuery ~ Hive ~ OLAP

BigQuery Intro
BigQuery Advanced
Lab: Loading CSV Data Into BigQuery
Lab: Running Queries On BigQuery
Lab: Loading JSON Data With Nested Tables
Lab: Public Datasets In BigQuery
Lab: Using BigQuery Via The Command Line
Lab: Aggregations And Conditionals In Aggregations
Lab: Subqueries And Joins
Lab: Regular Expressions In Legacy SQL
Lab: Using The WITH Statement For Subqueries

Dataflow: Apache Beam

Dataflow Intro
Apache Beam
Lab: Running A Python Dataflow Program
Lab: Running A Java Dataflow Program
Lab: Implementing Word Count In Dataflow Java
Lab: Executing The Word Count Dataflow
Lab: Executing MapReduce In Dataflow In Python
Lab: Executing MapReduce In Dataflow In Java

Dataproc: Managed Hadoop

Dataproc
Lab: Creating & Managing A Dataproc Cluster
Lab: Creating A Firewall Rule To Access Dataproc
Lab: Running A PySpark Job On Dataproc
Lab: Running The PySpark REPL Shell And Pig Scripts On Dataproc
Lab: Submitting A Spark Jar To Dataproc
Lab: Working With Dataproc Using The gcloud CLI

Pub/Sub for Streaming

Pub/Sub
Lab: Working With Pub/Sub On The Command Line
Lab: Working With Pub/Sub Using The Web Console
Lab: Setting Up A Pub/Sub Publisher Using The Python Library
Lab: Setting Up A Pub/Sub Subscriber Using The Python Library
Lab: Publishing Streaming Data Into Pub/Sub
Lab: Reading Streaming Data From Pub/Sub And Writing To BigQuery
Lab: Executing A Pipeline To Read Streaming Data And Write To BigQuery
Lab: Pub/Sub Source BigQuery Sink

Datalab ~ Jupyter

Datalab
Lab: Creating And Working On A Datalab Instance
Lab: Importing And Exporting Data Using Datalab
Lab: Using The Charting API In Datalab

Vision, Translate, NLP and Speech: Trained ML APIs

Lab: Taxicab Prediction – Setting up the dataset
Lab: Taxicab Prediction – Training and Running the model
Lab: The Vision, Translate, NLP and Speech API
Lab: The Vision API for Label and Landmark Detection

Additional topics, covered in brief, that are prerequisites for this course

Appendix: Hadoop Ecosystem

Introducing the Hadoop Ecosystem
Hadoop
HDFS
MapReduce
YARN
Hive
Hive vs. RDBMS
HQL vs. SQL
OLAP in Hive
Windowing in Hive
Pig
Spark
Streams Intro
Microbatches
Window Types
Hadoop Ecosystem