worker_machine_type tag not working in Google Cloud Dataflow with Python

PipelineOptions uses argparse behind the scenes to parse its arguments. For the machine type, the argument name (the attribute the parsed value is stored under) is machine_type, whereas the flag name is worker_machine_type. This works fine in the following two cases, where argparse does the parsing and is aware of the aliasing:

  1. Passing arguments on the command line, e.g. my_pipeline.py --worker_machine_type custom-1-6656
  2. Passing arguments as a list of command-line flags, e.g. flags=['--worker_machine_type', 'custom-1-6656', ...]

However, it does not work well with **kwargs. Any additional arguments passed that way are matched against the known argument names (but not the flag names), so a worker_machine_type keyword is not picked up as the machine type.

In short, using machine_type would work everywhere. I filed https://issues.apache.org/jira/browse/BEAM-4112 for this to be fixed in Beam in the future.
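
The aliasing described above can be reproduced with plain argparse. The following is only a minimal sketch, not Beam's actual code: resolve_options is a hypothetical stand-in for the way keyword arguments are substituted by argument name, and the parser definition is an assumption modelled on the behaviour described in this answer.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Two flag spellings share one destination attribute, machine_type.
    # (Assumed definition, modelled on the behaviour described above.)
    parser.add_argument(
        '--worker_machine_type', '--machine_type',
        dest='machine_type', default=None)
    return parser


def resolve_options(flags=None, **kwargs):
    """Hypothetical stand-in: parse flags, then apply keyword overrides."""
    # Pass an explicit list so argparse never falls back to sys.argv.
    known_args, _ = build_parser().parse_known_args(flags or [])
    result = vars(known_args)
    # Keyword arguments are matched against argument names (dest) only;
    # flag spellings such as worker_machine_type are never consulted.
    for name in list(result):
        if name in kwargs:
            result[name] = kwargs[name]
    return result


from_flags = resolve_options(flags=['--worker_machine_type', 'custom-1-6656'])
from_flag_name_kwarg = resolve_options(worker_machine_type='custom-1-6656')
from_arg_name_kwarg = resolve_options(machine_type='custom-1-6656')

print(from_flags)
print(from_flag_name_kwarg)
print(from_arg_name_kwarg)
```

In this sketch the flag list and the machine_type keyword both set the value, while the worker_machine_type keyword leaves machine_type at its default of None, which matches the symptom in the question.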


This can be solved by using the flag machine_type instead of worker_machine_type. The rest of the code works fine.

The documentation thus mentions the wrong field name.