Google Dataflow utility pipelines (File Conversion and Streaming Data Generation)
Dataflow Streaming Data Generator
This pipeline takes a QPS parameter and a path to a schema file, and publishes fake JSON messages matching the schema (sample messages used for load testing and system integration testing) to a Pub/Sub topic at the specified QPS rate.
The JSON Data Generator library used by the pipeline allows various faker functions to be used for each schema field. See the library's docs for more information on the faker functions and the schema format.
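For illustration, a schema file might look like the sketch below. The field names are arbitrary placeholders; the embedded faker functions (such as uuid() and integer()) follow the JSON Data Generator syntax, so check the library's docs for the exact set available in your version:

```json
{
  "id": "{{uuid()}}",
  "name": "{{firstName()}}",
  "age": {{integer(18,80)}},
  "score": {{double(0.0,1.0)}}
}
```

At the configured QPS, the pipeline evaluates the faker functions for each message, so every published message carries freshly generated values.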
Running the Streaming Data Generator template
1. Go to the Dataflow page in the Cloud Console.
2. Click Create job from template.
3. Select the Streaming Data Generator template from the Dataflow template drop-down menu. Enter a job name in the Job Name field.
4. Enter your parameter values in the provided parameter fields.
5. Click RUN.
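The same launch can also be scripted instead of going through the Console. The following is a minimal sketch using the Dataflow v1b3 flexTemplates.launch method via the Google API Python client; the project, region, bucket, and topic values are placeholders, and the template GCS path and parameter names (schemaLocation, topic, qps) should be verified against the template docs for the version you run:

```python
# pip install google-api-python-client google-auth
from googleapiclient.discovery import build

PROJECT = "my-project"  # placeholder: your GCP project ID
REGION = "us-central1"  # placeholder: your Dataflow region

# Build a Dataflow API client (uses Application Default Credentials).
dataflow = build("dataflow", "v1b3")

# Launch the Streaming Data Generator Flex Template. The template path
# and parameter names below follow the public template docs; confirm
# them for the template version you are running.
request = dataflow.projects().locations().flexTemplates().launch(
    projectId=PROJECT,
    location=REGION,
    body={
        "launchParameter": {
            "jobName": "streaming-data-generator-demo",
            "containerSpecGcsPath": (
                "gs://dataflow-templates/latest/flex/Streaming_Data_Generator"
            ),
            "parameters": {
                "schemaLocation": "gs://my-bucket/schema.json",  # placeholder
                "topic": f"projects/{PROJECT}/topics/my-topic",  # placeholder
                "qps": "100",
            },
        }
    },
)
response = request.execute()
print(response["job"]["id"])  # ID of the launched Dataflow job
```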
File Format Conversion
This template creates a batch pipeline that reads files from Google Cloud Storage (GCS), converts them to the desired format, and writes them back to a GCS bucket. The supported file transformations are:
- CSV to Avro
- CSV to Parquet
- Avro to Parquet
- Parquet to Avro
Pipeline Requirements
- The input files in the GCS bucket must be accessible to the Dataflow pipeline.
- The output GCS bucket must exist and be accessible to the Dataflow pipeline (a preflight check is sketched below).
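These requirements can be checked before launching a job. The sketch below uses the google-cloud-storage client; the bucket and prefix names are placeholders:

```python
# pip install google-cloud-storage
from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials

INPUT_BUCKET = "my-input-bucket"    # placeholder
INPUT_PREFIX = "csv/"               # placeholder
OUTPUT_BUCKET = "my-output-bucket"  # placeholder

# Verify the input files are listable with the current credentials.
blobs = list(client.list_blobs(INPUT_BUCKET, prefix=INPUT_PREFIX, max_results=5))
if not blobs:
    raise SystemExit(f"No input files found under gs://{INPUT_BUCKET}/{INPUT_PREFIX}")

# Verify the output bucket exists and is reachable; this raises
# NotFound or Forbidden if the bucket is missing or inaccessible.
client.get_bucket(OUTPUT_BUCKET)
print("Input files and output bucket look accessible.")
```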
Running File Format Conversion Pipelines
Follow steps 1 and 2 from the previous section.
3. Select the Convert file formats between Avro, Parquet & CSV template from the Dataflow template drop-down menu. Enter a job name in the Job Name field.
4. Enter your parameter values in the provided parameter fields.
5. Click RUN.
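As with the Streaming Data Generator, the conversion job can be launched programmatically. Below is a minimal sketch, again via the Dataflow v1b3 flexTemplates.launch method; the parameter names (inputFileFormat, outputFileFormat, inputFileSpec, outputBucket, schema) follow the public template docs, all values are placeholders, and the template GCS path should be confirmed for your template version:

```python
# pip install google-api-python-client google-auth
from googleapiclient.discovery import build

PROJECT = "my-project"  # placeholder
REGION = "us-central1"  # placeholder

dataflow = build("dataflow", "v1b3")

# Launch the File Format Conversion Flex Template (CSV -> Parquet here).
# The conversion needs an Avro schema describing the records; its path
# is passed via the "schema" parameter.
response = dataflow.projects().locations().flexTemplates().launch(
    projectId=PROJECT,
    location=REGION,
    body={
        "launchParameter": {
            "jobName": "csv-to-parquet-demo",
            "containerSpecGcsPath": (
                "gs://dataflow-templates/latest/flex/File_Format_Conversion"
            ),
            "parameters": {
                "inputFileFormat": "csv",
                "outputFileFormat": "parquet",
                "inputFileSpec": "gs://my-input-bucket/csv/*.csv",  # placeholder
                "outputBucket": "gs://my-output-bucket/parquet/",   # placeholder
                "schema": "gs://my-input-bucket/schema.avsc",       # placeholder Avro schema
            },
        }
    },
).execute()
print(response["job"]["id"])  # ID of the launched Dataflow job
```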