Jan 26, 2021 · Member-only
GCP Data Engineer Certification Prep Part-1
This post is a step-by-step guide to preparing for the Google Professional Data Engineer certification. A Professional Data Engineer enables data-driven decision making by collecting, transforming, and publishing data. A Data Engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on…
Data Engineering · 4 min read

Jan 20, 2021 · Member-only
Data Pipeline Orchestration
Google Cloud Workflows: Orchestrate and automate Google Cloud and HTTP-based API services with serverless workflows. You can use Workflows to create serverless workflows that link a series of serverless tasks together in an order you define. Combine the power of Google Cloud’s APIs, serverless products like Cloud Functions and Cloud Run, and calls to…
Orchestration · 2 min read

Aug 16, 2020
Google Dataflow utility pipelines (File Conversion and Streaming data generation)
Dataflow Streaming Data Generator: This pipeline takes a QPS parameter and a path to a schema file, and publishes fake JSON messages matching the schema (sample messages used for load testing and system integration testing) to a Pub/Sub topic at the QPS rate. The JSON Data Generator library used by the pipeline allows various faker…
Dataflow · 2 min read

Aug 14, 2020 · Member-only
BigQuery New Features
Multi-tab query editing; 99.99% SLA; pricing recommendation to select models (on-demand, flat-rate, flex); native query Admin Console UI; real-time resource information with INFORMATION_SCHEMA; automated slot management with BigQuery Slots Autoscaling. …
Bigquery · 1 min read

Jun 26, 2020 · Member-only
Committed Use Discount on Google Cloud SQL
Fully managed relational database service for MySQL, PostgreSQL, and SQL Server. Features: ensure business continuity with reliable and secure services backed by a 24/7 SRE team; reduce maintenance cost with fully managed relational databases in the cloud; automate database provisioning, storage capacity management, and other time-consuming tasks; easy integration with existing…
Google Cloud Platform · 2 min read

Jun 19, 2020 · Member-only
Cloud Dataflow
If you’re new to Cloud Dataflow, I suggest starting here and reading the official docs first. Develop locally using the DirectRunner rather than on Google Cloud using the DataflowRunner. The DirectRunner allows you to run your pipeline locally, without the need to pay for worker pools on GCP. When you want to shake out a pipeline on Google Cloud using the DataflowRunner, use a subset of data…
Google Cloud Platform · 4 min read

Jun 12, 2020
BigQuery Fun Facts!
If you’re new to BigQuery, I suggest starting here and reading the official docs first. Export all your audit and billing logs back to BigQuery for analysis. I don’t know how many times this pattern has saved my butt. Don’t be lazy with your SQL. Avoid SELECT * on big…
Bigquery · 5 min read

Jun 12, 2020
Python: Getting started with pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Installation or Setup: Installing pandas with Anaconda; Installing…
Python · 3 min read

May 30, 2020
Cloud Data Fusion Private Instance Guide
Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. With a graphical interface and a broad open-source library of preconfigured connectors and transformations, Cloud Data Fusion shifts an organization’s focus away from code and integration to insights and…
Google Cloud Platform · 3 min read