Google Dataflow Utility Pipelines (File Conversion and Streaming Data Generation)

Dataflow Streaming Data Generator

“This pipeline takes in a QPS parameter, a path to a schema file, and generates fake JSON messages (sample messages used for load testing and system integration testing) matching the schema to a Pub/Sub topic at the QPS rate.

The JSON Data Generator library used by the pipeline allows various faker functions to be used for each schema field. See the docs for more information on the faker functions and schema format.”

Source: official Google Cloud Dataflow documentation
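To get a feel for what the template does before launching a job, its two behaviours (filling a schema with faker values, and emitting messages at a fixed QPS) can be emulated locally. This is a minimal sketch, not the template's code: the schema's placeholder syntax is modelled on the example in the template documentation (`{{integer(0,1000)}}`, `{{uuid()}}`, `{{bool()}}`), the `FAKERS` table is a hypothetical three-function subset of what the JSON Data Generator library offers, and the pacing is a simple fixed-interval loop rather than the template's own rate control. Nothing is published to Pub/Sub here.

```python
import json
import random
import re
import time
import uuid

# Schema modelled on the template docs: each {{fn(args)}} placeholder is a
# faker call evaluated once per generated message.
SCHEMA = """{
  "id": {{integer(0,1000)}},
  "name": "{{uuid()}}",
  "isInStock": {{bool()}}
}"""

# Hypothetical three-function subset for local experiments only; the real
# JSON Data Generator library supports many more functions.
FAKERS = {
    "integer": lambda lo, hi: str(random.randint(int(lo), int(hi))),
    "uuid": lambda: str(uuid.uuid4()),
    "bool": lambda: random.choice(["true", "false"]),  # JSON literals
}

# Matches {{name(arg1,arg2,...)}} and captures the name and the raw argument list.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\((.*?)\)\s*\}\}")


def render(schema: str) -> str:
    """Replace every placeholder in the schema with a freshly generated value."""
    def substitute(match):
        name, raw_args = match.group(1), match.group(2)
        args = [a.strip() for a in raw_args.split(",") if a.strip()]
        return FAKERS[name](*args)
    return PLACEHOLDER.sub(substitute, schema)


def generate(schema: str, qps: float, count: int):
    """Yield `count` rendered messages, spaced roughly 1/qps seconds apart."""
    interval = 1.0 / qps
    next_send = time.monotonic()
    for _ in range(count):
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        yield render(schema)
        next_send += interval


if __name__ == "__main__":
    # In the real pipeline each message would be published to a Pub/Sub topic.
    for message in generate(SCHEMA, qps=5, count=3):
        print(json.loads(message))
```

Scheduling against a running `next_send` deadline, instead of sleeping a fixed interval after each message, keeps the average rate close to the requested QPS even when rendering takes a little time.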

Running the Streaming Data Generator template

  1. Go to the Dataflow page in the Cloud Console.
  2. Click Create job from template.
  3. Select the Streaming Data Generator template from the Dataflow template drop-down menu, and enter a job name in the Job name field.
  4. Enter your parameter values in the provided parameter fields.
  5. Click Run job.
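The same job can also be launched from the command line. This is a minimal sketch, assuming the template is published as a Flex Template at `gs://dataflow-templates-REGION/VERSION/flex/Streaming_Data_Generator` and accepts `schemaLocation`, `topic` and `qps` parameters; check these names against the current template reference before relying on them. The function only builds the `gcloud dataflow flex-template run` argument list, and the job name, bucket, project and topic in the usage section are hypothetical placeholders.

```python
import shlex


def streaming_generator_command(job_name, region, schema_location, topic, qps,
                                version="latest"):
    """Build (but do not run) a gcloud command that launches the template.

    Template path and parameter names are assumptions based on the published
    Flex Template layout; verify them against the current template reference.
    """
    template = f"gs://dataflow-templates-{region}/{version}/flex/Streaming_Data_Generator"
    params = f"schemaLocation={schema_location},topic={topic},qps={qps}"
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--template-file-gcs-location={template}",
        f"--region={region}",
        f"--parameters={params}",
    ]


if __name__ == "__main__":
    cmd = streaming_generator_command(
        job_name="fake-orders-generator",                     # hypothetical
        region="us-central1",
        schema_location="gs://my-bucket/schemas/order.json",  # hypothetical
        topic="projects/my-project/topics/fake-orders",       # hypothetical
        qps=100,
    )
    # Print a copy-pasteable command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell-quoting problems and lets the same value be printed for review or handed straight to `subprocess.run`.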

File Format Conversion

This template creates a batch pipeline that reads files from Google Cloud Storage (GCS), converts them to the desired format, and writes the results back to a GCS bucket. The supported file conversions are:

  • CSV to Avro
  • CSV to Parquet
  • Avro to Parquet
  • Parquet to Avro
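Because only these four pairs are supported, it can help to validate a requested conversion before launching a job. A small sketch that encodes the list above as a lookup set; the helper name `is_supported` is hypothetical and not part of the template.

```python
# The four conversions listed above, as (input format, output format) pairs.
SUPPORTED_CONVERSIONS = frozenset({
    ("csv", "avro"),
    ("csv", "parquet"),
    ("avro", "parquet"),
    ("parquet", "avro"),
})


def is_supported(input_format: str, output_format: str) -> bool:
    """Return True if the template lists this conversion (case-insensitive)."""
    return (input_format.lower(), output_format.lower()) in SUPPORTED_CONVERSIONS


if __name__ == "__main__":
    print(is_supported("CSV", "Parquet"))  # True
    print(is_supported("avro", "csv"))     # False: CSV is only an input format here
```

Note that, per the list, CSV appears only as an input format: there is no conversion back to CSV.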

Pipeline Requirements

  • Input files in the GCS bucket are accessible to the Dataflow pipeline.
  • Output GCS bucket exists and is accessible to the Dataflow pipeline.

Running File Format Conversion Pipelines

Follow steps 1 and 2 from the previous section, then:

  3. Select the Convert file formats between Avro, Parquet & CSV template from the Dataflow template drop-down menu, and enter a job name in the Job name field.
  4. Enter your parameter values in the provided parameter fields.
  5. Click Run job.
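As with the data generator, this job can be launched from the command line. This is a minimal sketch, assuming the template lives at `gs://dataflow-templates-REGION/VERSION/flex/File_Format_Conversion` and takes `inputFileFormat`, `outputFileFormat`, `inputFileSpec`, `outputBucket` and `schema` (the GCS path of an Avro schema file) parameters; these names are assumptions to verify against the current template reference, and CSV inputs may need further optional parameters. All paths and the job name in the usage section are hypothetical.

```python
import shlex

# Conversions listed in the article; anything else is rejected before launch.
SUPPORTED = {("csv", "avro"), ("csv", "parquet"), ("avro", "parquet"), ("parquet", "avro")}


def file_conversion_command(job_name, region, input_format, output_format,
                            input_file_spec, output_bucket, schema, version="latest"):
    """Build (but do not run) a gcloud command for the file format conversion template.

    Template path and parameter names are assumptions; verify them against the
    current template reference. `schema` is the GCS path of an Avro schema file.
    """
    if (input_format, output_format) not in SUPPORTED:
        raise ValueError(f"unsupported conversion: {input_format} -> {output_format}")
    template = f"gs://dataflow-templates-{region}/{version}/flex/File_Format_Conversion"
    params = ",".join([
        f"inputFileFormat={input_format}",
        f"outputFileFormat={output_format}",
        f"inputFileSpec={input_file_spec}",
        f"outputBucket={output_bucket}",
        f"schema={schema}",
    ])
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--template-file-gcs-location={template}",
        f"--region={region}",
        f"--parameters={params}",
    ]


if __name__ == "__main__":
    cmd = file_conversion_command(
        job_name="csv-to-parquet",                          # hypothetical
        region="us-central1",
        input_format="csv",
        output_format="parquet",
        input_file_spec="gs://my-input-bucket/data/*.csv",  # hypothetical
        output_bucket="gs://my-output-bucket/parquet/",     # hypothetical
        schema="gs://my-input-bucket/schemas/record.avsc",  # hypothetical
    )
    print(shlex.join(cmd))
```

Checking the format pair locally fails fast on an unsupported conversion instead of waiting for a job to be submitted and rejected.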


7x GCP | 2x Oracle Cloud | 1x Azure Certified | Cloud Data Engineer

Harshad Patel
