If you’re new to Cloud Dataflow, I suggest starting here and reading the official docs first.

  1. Develop locally with a local runner rather than on Google Cloud. A local runner lets you run your pipeline on your own machine, without paying for worker pools on GCP.
  2. When you want to shake out a pipeline on Google Cloud, use a subset of data and just one small instance to begin with. There's no need to spin up massive worker pools. That's just a waste of money, silly.
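
To make the two tips above concrete, here is a minimal sketch of the flags you might pass in each case. It rests on assumptions: the post does not say which SDK or runner it has in mind, so this uses the Apache Beam Python SDK's `DirectRunner` and `DataflowRunner` along with its worker options. The helper `runner_flags`, the machine type, and all project, region, and bucket values are hypothetical.

```python
def runner_flags(target, project=None, region=None, temp_location=None):
    """Build pipeline flags for a local run or a cheap cloud smoke test.

    Assumption: flag names follow the Apache Beam Python SDK. This helper
    is hypothetical and only assembles strings; it does not import Beam.
    """
    if target == "local":
        # Tip 1: run on your own machine, no GCP workers to pay for.
        return ["--runner=DirectRunner"]

    if target == "cloud":
        required = {
            "project": project,
            "region": region,
            "temp_location": temp_location,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError("missing settings: " + ", ".join(missing))
        # Tip 2: one small worker, with autoscaling off so it stays at one.
        return [
            "--runner=DataflowRunner",
            f"--project={project}",
            f"--region={region}",
            f"--temp_location={temp_location}",
            "--num_workers=1",
            "--max_num_workers=1",
            "--autoscaling_algorithm=NONE",
            "--machine_type=n1-standard-1",  # assumed small machine type
        ]

    raise ValueError(f"unknown target: {target!r}")


if __name__ == "__main__":
    print(runner_flags("local"))
    print(runner_flags(
        "cloud",
        project="my-project",              # placeholder
        region="us-central1",              # placeholder
        temp_location="gs://my-bucket/tmp",  # placeholder
    ))
```

With Beam installed, a list like this can be handed to `PipelineOptions` from `apache_beam.options.pipeline_options` when constructing the pipeline. Pointing the cloud run at a small sample of your input is a separate step that depends on where your data lives.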