Cloud Data Fusion Private Instance Guide

Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. With a graphical interface and a broad open-source library of preconfigured connectors and transformations, Cloud Data Fusion shifts an organization’s focus away from code and integration to insights and action.
Cloud Data Fusion is based on CDAP, a 100% open-source framework for building data pipelines.

Pricing

Pricing for the service is broken down into:

  • Cloud Data Fusion instance hours to operate the data integration interface

How to Create a Private Instance

Before creating a Cloud Data Fusion private instance, you need a VPC network and a private subnetwork. Cloud Data Fusion requires Private Google Access to establish a private connection with the Dataproc cluster. This in turn requires allocating an IP range. To do so, follow the steps below:

  1. Go to the VPC Network page of the network in which you want to create the private Cloud Data Fusion instance.
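The network preparation described above can also be approximated from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The resource names (cdf-vpc, cdf-subnet, cdf-ip-range), the subnet CIDR, and the /22 prefix length are illustrative assumptions, not values taken from this guide. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Sketch: create a VPC, a subnet with Private Google Access, and an
# allocated IP range. All names and ranges below are hypothetical.

DRY_RUN="${DRY_RUN:-1}"                       # 1 = print only, 0 = execute
NETWORK="${NETWORK:-cdf-vpc}"                 # hypothetical VPC name
SUBNET="${SUBNET:-cdf-subnet}"                # hypothetical subnet name
REGION="${REGION:-us-east1}"
SUBNET_RANGE="${SUBNET_RANGE:-10.10.0.0/24}"  # assumed CIDR for the subnet
IP_RANGE_NAME="${IP_RANGE_NAME:-cdf-ip-range}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_network() {
  # Custom-mode VPC so that subnets are created explicitly.
  run gcloud compute networks create "$NETWORK" --subnet-mode=custom
  # Subnet with Private Google Access enabled.
  run gcloud compute networks subnets create "$SUBNET" \
    --network="$NETWORK" --region="$REGION" --range="$SUBNET_RANGE" \
    --enable-private-ip-google-access
  # Allocated range reserved for VPC peering (prefix length is an assumption).
  run gcloud compute addresses create "$IP_RANGE_NAME" \
    --global --purpose=VPC_PEERING --prefix-length=22 --network="$NETWORK"
}

prepare_network
```

Reviewing the printed commands before running them with DRY_RUN=0 helps avoid creating resources in the wrong project or region.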

Command to create an instance

Export the following variables for ease of use. The commands below refer to them:
export PROJECT=<project-id>
export LOCATION=<region>   # Example: us-east1, asia-east1
export DATA_FUSION_API_NAME=datafusion.googleapis.com

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "https://$DATA_FUSION_API_NAME/v1beta1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=<INSTANCE_NAME>" -d '{"description": "Private CDF instance created through REST.", "type": "BASIC", "privateInstance": true, "networkConfig": {"network": "VPC_NETWORK", "ipAllocation": "IP_RANGE"}}'

The ipAllocation value passed in the call is the IP range allocated in step 4 above.

Once a private instance is created, it is listed in the Data Fusion UI. From there you can perform the same operations that you perform on public instances. For example, you can delete the private instance from the UI.
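The same REST API can also list, inspect, or delete instances without the UI. The following is a minimal sketch, assuming the v1beta1 instances collection used in the create call above. The helper names instances_url and call_api, and the default project my-project, are hypothetical. The script prints the request in dry-run mode and only calls curl when DRY_RUN=0.

```shell
# Sketch: list, get, or delete instances through the REST API.
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print only, 0 = send request
PROJECT="${PROJECT:-my-project}"         # hypothetical project id
LOCATION="${LOCATION:-us-east1}"
DATA_FUSION_API_NAME="${DATA_FUSION_API_NAME:-datafusion.googleapis.com}"

# Build the URL of the instance collection, or of one instance if $1 is given.
instances_url() {
  base="https://$DATA_FUSION_API_NAME/v1beta1/projects/$PROJECT/locations/$LOCATION/instances"
  if [ -n "${1:-}" ]; then
    echo "$base/$1"
  else
    echo "$base"
  fi
}

# $1: HTTP method (GET, DELETE), $2: URL
call_api() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$1 $2"
  else
    curl -sS -X "$1" \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" "$2"
  fi
}

# List all instances in the location.
call_api GET "$(instances_url)"
```

For example, call_api DELETE "$(instances_url my-instance)" would request deletion of an instance named my-instance (a placeholder name).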

Peering With Cloud Data Fusion Network

Cloud Data Fusion uses VPC peering to provide private instances. VPC peering must be set up on both ends (networks) independently. The peering from the Cloud Data Fusion tenant project network to your network is set up automatically. To connect to the private instance, you must set up the peering from your network to the Cloud Data Fusion network yourself.

Finding Tenant Project Id

You can retrieve the tenant project ID from the instance details. It is part of the service account name. For example:
Service Account: cloud-datafusion-management-sa@<project-id>-tp.iam.gserviceaccount.com
Tenant Project Id: <project-id>

Creating VPC Peering

Steps to create a VPC Peering with the tenant project are as follows:

  1. Go to your VPC Network
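The peering from your network to the tenant project can also be created with the gcloud CLI. The following is a minimal sketch, assuming gcloud is installed and authenticated. The function name create_peering and the example arguments are hypothetical, and the tenant network name must be taken from your own instance details because this guide does not state it. The command is printed by default and only executed when DRY_RUN=0.

```shell
# Sketch: create the peering from your VPC to the tenant project network.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute

# $1: peering name, $2: your VPC network,
# $3: tenant project id, $4: tenant network name
create_peering() {
  if [ "$#" -ne 4 ]; then
    echo "usage: create_peering NAME NETWORK TENANT_PROJECT TENANT_NETWORK" >&2
    return 1
  fi
  # Rebuild the positional parameters as the full gcloud command line.
  set -- gcloud compute networks peerings create "$1" \
    --network="$2" --peer-project="$3" --peer-network="$4"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder arguments; replace them with your own values.
create_peering cdf-peering cdf-vpc TENANT_PROJECT_ID TENANT_NETWORK
```

Once both sides of the peering exist, the peering should report an active state in the VPC Network peering list.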

Originally published at https://www.techojournal.com on May 30, 2020.
