Google Dataflow utility pipelines (File Format Conversion and Streaming Data Generation)

Dataflow Streaming Data Generator

This pipeline takes a QPS parameter and a path to a schema file, then publishes fake JSON messages matching the schema (sample messages used for load testing and system integration testing) to a Pub/Sub topic at the specified QPS rate.
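The core idea of the template can be sketched locally. This is a minimal stdlib-only illustration, not the template's actual implementation (which runs on Apache Beam and publishes to Pub/Sub): the `SCHEMA` dict, the field type names, and the `publish` callback are all hypothetical stand-ins for the GCS schema file and the Pub/Sub publisher client.

```python
import json
import random
import string
import time

# Hypothetical schema: field name -> fake-value type. The real template
# reads a JSON schema file from GCS; this dict stands in for it.
SCHEMA = {"id": "uuid", "user": "string", "score": "int"}

def fake_value(kind, rng):
    """Produce a fake value for one schema field type (illustrative only)."""
    if kind == "uuid":
        return "".join(rng.choices("0123456789abcdef", k=32))
    if kind == "string":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    if kind == "int":
        return rng.randint(0, 1000)
    raise ValueError(f"unsupported field type: {kind}")

def generate_messages(schema, qps, duration_s, rng=None,
                      publish=print, sleep=time.sleep):
    """Emit fake JSON messages matching `schema` at roughly `qps` per second.

    `publish` stands in for a Pub/Sub publisher; `sleep` paces the loop
    so the message rate approximates the requested QPS.
    """
    rng = rng or random.Random()
    interval = 1.0 / qps
    target = int(qps * duration_s)
    count = 0
    while count < target:
        msg = {name: fake_value(kind, rng) for name, kind in schema.items()}
        publish(json.dumps(msg))
        count += 1
        sleep(interval)
    return count
```

Injecting `publish` and `sleep` keeps the pacing logic separate from I/O, which is the same separation the managed template achieves by handing generated messages to a Pub/Sub sink.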

Source: official Google documentation

Running the Streaming Data Generator template

  1. Go to the Dataflow page in the Cloud Console.
  2. Click Create job from template.

File Format Conversion

This template creates a batch pipeline that reads files from Google Cloud Storage (GCS), converts them to the desired format, and stores the results back in a GCS bucket. The supported conversions are:

  • CSV to Parquet
  • Avro to Parquet
  • Parquet to Avro
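The read-convert-write pattern behind these conversions can be sketched with the standard library. This is only an illustration of the shape of the pipeline: the real template uses Apache Beam with Parquet/Avro I/O connectors, which need third-party libraries, so newline-delimited JSON stands in for the Parquet output here, and both function names are hypothetical.

```python
import csv
import io
import json

def csv_to_records(csv_text, delimiter=","):
    """Read step: parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [dict(row) for row in reader]

def records_to_json_lines(records):
    """Write step: serialize records as newline-delimited JSON.

    In the actual template this step would hand the records to a
    Parquet (or Avro) writer instead.
    """
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

Keeping the parse and serialize steps separate mirrors how the template can pair any supported reader with any supported writer.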

Pipeline Requirements

  • Input files in the GCS bucket are accessible to the Dataflow pipeline.
  • Output GCS bucket exists and is accessible to the Dataflow pipeline.

Running File Format Conversion Pipelines

Follow steps 1 and 2 from the previous section.
