Google Dataflow utility pipelines (file format conversion and streaming data generation)

Dataflow Streaming Data Generator

This pipeline takes a QPS parameter and a path to a schema file, and publishes fake JSON messages matching the schema (sample messages used for load testing and system integration testing) to a Pub/Sub topic at the specified QPS rate.

The JSON Data Generator library used by the pipeline allows various faker functions to be used for each schema field. See the docs for more information on the faker functions and schema format.
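To illustrate, a schema file might look like the following. This is a hypothetical example: the field names are made up, and the faker functions shown (uuid, timestamp, integer, bool, random) are only a small illustrative subset; check the library's documentation for the full list and exact syntax.

```json
{
  "id": "{{uuid()}}",
  "eventTime": "{{timestamp()}}",
  "score": {{integer(0, 100)}},
  "completed": {{bool()}},
  "tier": "{{random("free", "standard", "premium")}}"
}
```

Note that the file is a JSON template rather than strict JSON: each {{...}} placeholder is replaced by the generator with a freshly faked value for every message.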

Source: official Google documentation

Running the Streaming Data Generator template

3. Select the Streaming Data Generator template from the Dataflow template drop-down menu. Enter a job name in the Job Name field.

4. Enter your parameter values in the provided parameter fields.

5. Click on RUN.
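As an alternative to the console steps above, the job can also be launched from the command line. The sketch below assumes the template is published as a Flex Template at the path shown, and uses placeholder project, bucket, and topic names; verify the template location and parameter names against the current documentation.

```shell
# Launch the Streaming Data Generator from the CLI.
# Hypothetical project, bucket, and topic names; the template path and
# parameter names (schemaLocation, topic, qps) are assumptions to verify.
gcloud dataflow flex-template run streaming-data-generator-demo \
    --region=us-central1 \
    --template-file-gcs-location=gs://dataflow-templates/latest/flex/Streaming_Data_Generator \
    --parameters \
schemaLocation=gs://my-bucket/schema.json,\
topic=projects/my-project/topics/load-test-topic,\
qps=100
```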

File Format Conversion

This template creates a batch pipeline that reads files from Google Cloud Storage (GCS), converts them to the desired format, and stores them back in a GCS bucket. The supported transformations convert files between the Avro, Parquet, and CSV formats.

Pipeline Requirements

Running File Format Conversion Pipelines

Follow steps 1 and 2 from the previous section.

3. Select the Convert file formats between Avro, Parquet & CSV template from the Dataflow template drop-down menu. Enter a job name in the Job Name field.

4. Enter your parameter values in the provided parameter fields.

5. Click on RUN.
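This template, too, can be launched from the command line instead of the console. The sketch below converts CSV files to Avro; the GCS paths are placeholders, and the template location and parameter names (inputFileFormat, outputFileFormat, inputFileSpec, outputBucket, schema) are assumptions that should be checked against the template's current documentation.

```shell
# Launch the file format conversion template from the CLI.
# Hypothetical bucket and file paths; parameter names assumed.
gcloud dataflow flex-template run csv-to-avro-demo \
    --region=us-central1 \
    --template-file-gcs-location=gs://dataflow-templates/latest/flex/File_Format_Conversion \
    --parameters \
inputFileFormat=csv,\
outputFileFormat=avro,\
inputFileSpec=gs://my-bucket/input/*.csv,\
outputBucket=gs://my-bucket/output/,\
schema=gs://my-bucket/schema.avsc
```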