If you’re new to Cloud Dataflow, I suggest starting here and reading the official docs first.
- Develop locally using the DirectRunner rather than on Google Cloud using the DataflowRunner. The DirectRunner lets you run your pipeline locally, without paying for worker pools on GCP.
- When you want to shake out a pipeline on Google Cloud using the DataflowRunner, use a subset of data and just one small instance to begin with. There's no need to spin up massive worker pools. That's just a waste of money, silly.
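To make the two tips above concrete, here is a minimal sketch using the Beam Python SDK. The helper names `runner_flags` and `run`, the machine type `n1-standard-1`, and the project, region, and bucket values are all placeholders I made up for illustration; the flags themselves are standard Beam pipeline options. Treat it as a starting point, not the one true setup.

```python
from typing import List, Optional


def runner_flags(
    on_dataflow: bool,
    project: Optional[str] = None,
    region: Optional[str] = None,
    temp_location: Optional[str] = None,
) -> List[str]:
    """Build pipeline flags for a local run or a small Dataflow shake-out run.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not on_dataflow:
        # Local development: runs on your machine, no GCP worker cost.
        return ["--runner=DirectRunner"]

    if not (project and region and temp_location):
        raise ValueError("project, region and temp_location are required for Dataflow")

    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Shake-out run: start with one worker and cap autoscaling at one.
        "--num_workers=1",
        "--max_num_workers=1",
        # Assumption: a small machine type; pick whatever is small in your setup.
        "--machine_type=n1-standard-1",
    ]


def run(flags: List[str], input_path: str) -> None:
    """Count the lines of a (small!) input file with the chosen runner."""
    # Imported lazily so runner_flags works even without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(flags)) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Count" >> beam.combiners.Count.Globally()
            | "Print" >> beam.Map(print)
        )
```

Locally you would call something like `run(runner_flags(False), "sample.txt")`. For the first cloud run, pass `runner_flags(True, ...)` with your own project, region, and a `gs://` temp location, and point `input_path` at a small subset of the data rather than the full dataset.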