This post is a step-by-step guide to preparing for the Google Professional Data Engineer certification.

A Professional Data Engineer enables data-driven decision making by collecting, transforming, and publishing data. A Data Engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on…


https://cloud.google.com/blog/products/application-development/get-to-know-google-cloud-workflows

Google Cloud Workflows

Orchestrate and automate Google Cloud and HTTP-based API services with serverless workflows.

You can use Workflows to create serverless workflows that link a series of serverless tasks together in an order you define. Combine the power of Google Cloud’s APIs, serverless products like Cloud Functions and Cloud Run, and calls to…
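To make that concrete, here is a minimal sketch (my own, not from the Workflows docs) of triggering an already-deployed workflow from Python, assuming the google-cloud-workflows client library; the project, region, workflow name, and payload are placeholders:

```python
# Hedged sketch: start an execution of a deployed workflow using the
# google-cloud-workflows client library. All names below are placeholders.
import json

from google.cloud.workflows import executions_v1


def run_workflow(project_id: str, location: str, workflow: str, payload: dict) -> str:
    """Starts an execution of the named workflow and returns the execution name."""
    client = executions_v1.ExecutionsClient()
    parent = f"projects/{project_id}/locations/{location}/workflows/{workflow}"
    response = client.create_execution(
        request={"parent": parent, "execution": {"argument": json.dumps(payload)}}
    )
    return response.name


# Example call with placeholder values:
# run_workflow("my-project", "us-central1", "order-pipeline", {"order_id": 42})
```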


Dataflow Streaming Data Generator

This pipeline takes a QPS parameter and a path to a schema file, and publishes fake JSON messages (sample messages used for load testing and system integration testing) matching the schema to a Pub/Sub topic at the specified QPS rate.

The JSON Data Generator library used by the pipeline allows various faker…
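The generator ships as a Google-provided Dataflow template, but to illustrate the idea (rather than reproduce the template's code), here is a rough Python sketch that publishes fake JSON messages to a Pub/Sub topic at an approximate QPS using google-cloud-pubsub; the topic name and message fields are placeholders:

```python
# Illustrative sketch only -- not the Dataflow template itself. Publishes fake
# JSON messages to a Pub/Sub topic at roughly the requested QPS.
import json
import random
import time
import uuid

from google.cloud import pubsub_v1


def publish_fake_messages(project_id: str, topic_id: str, qps: int, seconds: int) -> None:
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    interval = 1.0 / qps
    for _ in range(qps * seconds):
        # A fake record loosely matching a hypothetical schema.
        message = {
            "id": str(uuid.uuid4()),
            "event_time": time.time(),
            "amount": round(random.uniform(1, 500), 2),
        }
        publisher.publish(topic_path, json.dumps(message).encode("utf-8"))
        time.sleep(interval)  # crude pacing; the real template rate-limits properly


# publish_fake_messages("my-project", "load-test-topic", qps=100, seconds=60)
```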


  1. Multi-tab query editing
  2. 99.99% SLA
  3. Pricing recommendations to select a model (on-demand, flat-rate, flex)
  4. Native query Admin Console UI
  5. Real-time resource information with INFORMATION_SCHEMA (see the sketch after this list)
  6. Automated slot management with BigQuery slots autoscaling
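Here is that INFORMATION_SCHEMA sketch, using the google-cloud-bigquery client; the region qualifier and column choices are assumptions, so adjust them for your own project:

```python
# Minimal sketch: query BigQuery INFORMATION_SCHEMA for recent job activity.
# The region qualifier and selected columns are assumptions -- adjust as needed.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

query = """
SELECT
  user_email,
  job_id,
  total_bytes_processed,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY total_slot_ms DESC
LIMIT 10
"""

for row in client.query(query).result():
    print(row.job_id, row.user_email, row.total_bytes_processed, row.total_slot_ms)
```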

Cloud SQL

Fully managed relational database service for MySQL, PostgreSQL, and SQL Server (see the connection sketch after the feature list).

Features:

  • Ensure business continuity with reliable and secure services backed by a 24/7 SRE team
  • Reduce maintenance costs with fully managed relational databases in the cloud
  • Automate database provisioning, storage capacity management, and other time-consuming tasks
  • Easy integration with existing…
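And here is the connection sketch mentioned above, assuming the Cloud SQL Python Connector (package cloud-sql-python-connector) with the pg8000 driver; the instance connection name, credentials, and database are placeholders:

```python
# Hedged sketch: connect to a Cloud SQL for PostgreSQL instance using the
# Cloud SQL Python Connector and pg8000. All credentials below are placeholders.
from google.cloud.sql.connector import Connector

connector = Connector()
conn = connector.connect(
    "my-project:us-central1:my-instance",  # instance connection name (placeholder)
    "pg8000",
    user="postgres",
    password="change-me",
    db="inventory",
)

cur = conn.cursor()
cur.execute("SELECT version()")
print(cur.fetchone())

cur.close()
conn.close()
connector.close()
```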

If you’re new to Cloud Dataflow, I suggest starting here and reading the official docs first.

  1. Develop locally using the DirectRunner, not on Google Cloud using the DataflowRunner. The DirectRunner allows you to run your pipeline locally, without the need to pay for worker pools on GCP (see the sketch after this list).
  2. When you want to shake out a pipeline on Google Cloud using the DataflowRunner, use a subset of data…
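Here is the sketch for tip 1: a minimal Beam pipeline running on the DirectRunner. Switching the runner option to DataflowRunner (and adding project, region, and temp_location options) is all it takes to move it to Google Cloud.

```python
# Minimal sketch: the same Beam code runs locally on the DirectRunner for free,
# and on Dataflow only once you switch the runner (and add GCP options).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Create" >> beam.Create(["alpha", "beta", "alpha", "gamma"])
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "CountPerKey" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```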

If you’re new to BigQuery, I suggest starting here and reading the official docs first.

  1. Export all your audit and billing logs back to BigQuery for analysis. I don’t know how many times this pattern has saved my butt.
  2. Don’t be lazy with your SQL. Avoid SELECT * on big…
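Building on tip 2, a cheap habit is to name your columns explicitly and dry-run the query first to see how much it would scan; here is a sketch with the google-cloud-bigquery client (the table and column names are placeholders):

```python
# Sketch: select only the columns you need, and use a dry run to estimate the
# bytes a query would scan before running it. Table and columns are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT user_id, event_name, event_timestamp  -- explicit columns, not SELECT *
FROM `my-project.analytics.events`
WHERE event_date = '2023-01-01'
"""

dry_run_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
dry_run_job = client.query(query, job_config=dry_run_config)
print(f"This query would process {dry_run_job.total_bytes_processed:,} bytes.")
```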

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
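If that sounds abstract, here is a tiny, made-up example of what “labeled” data looks like in pandas:

```python
# A quick taste of pandas' labeled data structures (all values are made up).
import pandas as pd

df = pd.DataFrame(
    {
        "product": ["widget", "gadget", "widget", "gizmo"],
        "region": ["EU", "US", "US", "EU"],
        "revenue": [120.0, 340.5, 95.25, 210.0],
    }
)

# Label-based filtering and a grouped aggregation.
eu_sales = df[df["region"] == "EU"]
revenue_by_product = df.groupby("product")["revenue"].sum()
print(eu_sales)
print(revenue_by_product)
```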

Installation or Setup

Installing pandas with Anaconda

Installing…


Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. With a graphical interface and a broad open-source library of preconfigured connectors and transformations, Cloud Data Fusion shifts an organization’s focus away from code and integration to insights and…
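Most of the work in Data Fusion happens in its graphical UI, but for completeness, here is a hedged sketch, assuming the google-cloud-data-fusion client library exposes data_fusion_v1.DataFusionClient, that lists the instances in one region (the project and region are placeholders):

```python
# Hedged sketch, assuming the google-cloud-data-fusion client library
# (data_fusion_v1.DataFusionClient). Project and region are placeholders;
# pipelines themselves are built in the instance's graphical UI.
from google.cloud import data_fusion_v1

client = data_fusion_v1.DataFusionClient()
parent = "projects/my-project/locations/us-central1"

for instance in client.list_instances(request={"parent": parent}):
    print(instance.name, instance.state)
```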

Harshad Patel

7x GCP | 2x Oracle Cloud | 1x Azure Certified | Cloud Data Engineer
