Cloud Dataflow
If you’re new to Cloud Dataflow, I suggest starting here and reading the official docs first.
- Develop locally using `DirectRunner` and not on Google Cloud using the `DataflowRunner`. The `DirectRunner` allows you to run your pipeline locally, without the need to pay for worker pools on GCP.
- When you want to shake out a pipeline on Google Cloud using the `DataflowRunner`, use a subset of data and just one small instance to begin with. There's no need to spin up massive worker pools. That's just a waste of money, silly.
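
To make the two tips above concrete, here's a minimal sketch using the Apache Beam Python SDK. It assumes `apache-beam[gcp]` is installed, and the project ID, region, bucket, and machine type are placeholders I made up, so swap in your own. The helper `build_pipeline_args` is a hypothetical name, not part of Beam; it just switches between a local `DirectRunner` run and a deliberately tiny `DataflowRunner` run capped at one worker.

```python
def build_pipeline_args(on_dataflow=False,
                        project="my-project",              # placeholder project ID
                        region="us-central1",              # placeholder region
                        temp_location="gs://my-bucket/tmp"):  # placeholder bucket
    """Return Beam pipeline flags for a local run or a small Dataflow run."""
    if not on_dataflow:
        # Local development: no GCP workers, no bill.
        return ["--runner=DirectRunner"]
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Start with a single small worker and cap autoscaling at one.
        "--num_workers=1",
        "--max_num_workers=1",
        "--machine_type=n1-standard-1",
    ]


def run(argv):
    """Run a toy word-count style pipeline with the given flags."""
    # Imported here so the flag helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (p
         | "Sample" >> beam.Create(["a", "b", "a"])  # stand-in for a data subset
         | "Count" >> beam.combiners.Count.PerElement()
         | "Print" >> beam.Map(print))


# Local run:          run(build_pipeline_args())
# Small Dataflow run: run(build_pipeline_args(on_dataflow=True))
```

The same pipeline code runs in both cases; only the flags change, which is what makes it cheap to iterate locally before paying for anything on GCP.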