Cloud Dataflow

If you’re new to Cloud Dataflow, I suggest starting here and reading the official docs first.

  1. Develop locally usingDirectRunner and not on Google Cloud using the DataflowRunner. The Direct Runner allows you to run your pipeline locally, without the need to pay for worker pools on GCP.
  2. When you want to shake-out a pipeline on a Google Cloud using the DataflowRunner, use a subset of data and just one small instance to begin with. There's no need to spin up massive worker pools. That's just a waste of money silly.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store