Google Data Engineer

A data engineer enables data-driven decision making by gathering, transforming and publishing a meaningful set of data.

Please find below the objectives of this training:

  • You will be able to design, build, operationalize, secure and monitor data processing systems with an emphasis on security and compliance; scalability and efficiency; reliability and fidelity.
  • You will be able to leverage, deploy, and continuously train pre-existing ML models.
  • You will be able to ensure solution quality.
  • You will be able to design data processing systems.

Duration: 4 Days

Good to have: Hands-on work experience with Google Cloud technologies.

Must-Have: Understanding of how data works and what it can deliver for the organization.

  • Theory, Practice, and Tests
  • Lab: Setting Up A GCP Account
  • Lab: Using The Cloud Shell
  • About this section
  • Compute Options
  • Google Compute Engine (GCE)
  • Lab: Creating a VM Instance

More GCE

  • Lab: Editing a VM Instance
  • Lab: Creating a VM Instance Using The Command Line
  • Lab: Creating And Attaching A Persistent Disk

More GKE

  • Lab: Creating A Kubernetes Cluster And Deploying A WordPress Container

App Engine

  • Contrasting App Engine, Compute Engine, and Container Engine
  • Lab: Deploy And Run An App Engine App

Compute

  • Storage Options
  • Quick Take
  • Cloud Storage
  • Lab: Working With Cloud Storage Buckets
  • Lab: Bucket And Object Permissions
  • Lab: Lifecycle Management On Buckets
  • Fix for AccessDeniedException: 403 Insufficient Permission
  • Lab: Running A Program On a VM Instance And Storing Results on Cloud Storage
  • Live Migration
  • Machine Types and Billing
  • Sustained Use and Committed Use Discounts
  • Rightsizing Recommendations
  • RAM Disk
  • Images
  • Startup Scripts And Baked Images
  • VPCs And Subnets
  • Global VPCs, Regional Subnets
  • IP Addresses
  • Lab: Working with Static IP Addresses
  • Routes
  • Firewall Rules
  • Lab: Working with Firewalls
  • Lab: Working with Auto Mode and Custom Mode Networks
  • Lab: Bastion Host
  • Lab: Working with Cloud VPN
  • Cloud Router
  • Lab: Using Cloud Routers for Dynamic Routing
  • Dedicated Interconnect, Direct & Carrier Peering
  • Shared VPCs
  • Lab: Shared VPCs
  • VPC: Network Peering
  • Lab: VPC Peering
  • Cloud DNS & Legacy Networks
  • Networking
  • Managed and Unmanaged Instance Groups
  • Types of Load Balancing
  • Overview of HTTP(S) Load Balancing
  • Forwarding Rules, Target Proxy, and URL Maps
  • Backend Service & Backends
  • Load Distribution & Firewall Rules
  • Lab: HTTP(S) Load Balancing
  • Lab: Content-Based Load Balancing
  • SSL Proxy and TCP Proxy Load Balancing
  • Lab: SSL Proxy Load Balancing
  • Network Load Balancing
  • Internal Load Balancing
  • Autoscalers
  • Lab: Autoscaling with Managed Instance Groups
  • Stackdriver
  • Stackdriver Logging
  • Lab: Stackdriver Resource Monitoring
  • Lab: Stackdriver Error Reporting & Debugging
  • Lab: Using Deployment Manager
  • Lab: Deployment Manager & Stackdriver
  • Cloud IAM: User accounts, Service accounts, API Credentials
  • Cloud IAM: Roles, Identity-Aware Proxy, Best Practices
  • Lab: Cloud IAM

Operations and Security

  • Lab: Migrating Data Using The Transfer Service
  • gcloud init
  • Lab: Cloud Storage Versioning, Directory Sync
  • Cloud SQL
  • Lab: Creating A Cloud SQL Instance
  • Lab: Running Commands On Cloud SQL Instance
  • Lab: Bulk Loading Data Into Cloud SQL Tables
  • More Cloud Spanner
  • Lab: Working With Cloud Spanner
  • Bigtable Intro
  • Columnar Store
  • Denormalised
  • Column Families
  • Bigtable Performance
  • Getting the HBase Prompt
  • Lab: Bigtable demo
  • Datastore
  • Lab: Datastore demo
  • BigQuery Intro
  • BigQuery Advanced
  • Lab: Loading CSV Data Into BigQuery
  • Lab: Running Queries On BigQuery
  • Lab: Loading JSON Data With Nested Tables
  • Lab: Public Datasets In BigQuery
  • Lab: Using BigQuery Via The Command Line
  • Lab: Aggregations And Conditionals In Aggregations
  • Lab: Subqueries And Joins
  • Lab: Regular Expressions In Legacy SQL
  • Lab: Using The WITH Statement For Subqueries
  • Dataflow Intro
  • Apache Beam
  • Lab: Running A Python Dataflow Program
  • Lab: Running A Java Dataflow Program
  • Lab: Implementing Word Count In Dataflow Java
  • Lab: Executing The Word Count Dataflow
  • Lab: Executing MapReduce In Dataflow In Python
  • Lab: Executing MapReduce In Dataflow In Java
  • Dataproc
  • Lab: Creating & Managing A Dataproc Cluster
  • Lab: Creating A Firewall Rule To Access Dataproc
  • Lab: Running A PySpark Job On Dataproc
  • Lab: Running The PySpark REPL Shell And Pig Scripts On Dataproc
  • Lab: Submitting A Spark Jar To Dataproc
  • Lab: Working With Dataproc Using The gcloud CLI
  • Pub/Sub
  • Lab: Working With Pub/Sub On The Command Line
  • Lab: Working With Pub/Sub Using The Web Console
  • Lab: Setting Up A Pub/Sub Publisher Using The Python Library
  • Lab: Setting Up A Pub/Sub Subscriber Using The Python Library
  • Lab: Publishing Streaming Data Into Pub/Sub
  • Lab: Reading Streaming Data From Pub/Sub And Writing To BigQuery
  • Lab: Executing A Pipeline To Read Streaming Data And Write To BigQuery
  • Lab: Pub/Sub Source BigQuery Sink
  • Datalab
  • Lab: Creating And Working On A Datalab Instance
  • Lab: Importing And Exporting Data Using Datalab
  • Lab: Using The Charting API In Datalab
  • Lab: Taxicab Prediction – Setting up the dataset
  • Lab: Taxicab Prediction – Training and Running the model
  • Lab: The Vision, Translate, NLP and Speech API
  • Lab: The Vision API for Label and Landmark Detection

Appendix: Hadoop Ecosystem

  • Introducing the Hadoop Ecosystem
  • Hadoop
  • HDFS
  • MapReduce
  • YARN
  • Hive
  • Hive vs. RDBMS
  • HQL vs. SQL
  • OLAP in Hive
  • Windowing in Hive
  • Pig
  • Spark
  • Streams Intro
  • Microbatches
  • Window Types
  • Hadoop Ecosystem