worker_machine_type tag not working in Google Cloud Dataflow with Python
PipelineOptions uses argparse behind the scenes to parse its arguments. In the case of machine type, the name of the argument is machine_type, but the flag name is worker_machine_type. This works fine in the following two cases, where argparse does the parsing and is aware of this aliasing:
- Passing arguments on the command line, e.g.
my_pipeline.py --worker_machine_type custom-1-6656
- Passing arguments as a list of command-line flags, e.g.
flags=['--worker_machine_type', 'custom-1-6656', ...]
However, it does not work well with **kwargs. Any additional arguments passed that way are matched only against known argument names, not flag names, so a worker_machine_type keyword is not picked up.
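The difference between the flag path and the **kwargs path can be reproduced with plain argparse. The sketch below is a simplified model of the behaviour described above, not Beam's actual source: build_options is a hypothetical helper, and the way the option is registered (one argument name with an extra flag alias) is an assumption made for illustration.

```python
import argparse

# Simplified stand-in for an options class: one argument name (dest)
# registered together with an additional flag alias.
parser = argparse.ArgumentParser()
parser.add_argument('--machine_type', '--worker_machine_type',
                    dest='machine_type', default=None)


def build_options(flags=None, **kwargs):
    """Hypothetical helper: parse flags, then let kwargs override known names."""
    known, _unknown = parser.parse_known_args(flags or [])
    options = vars(known)
    for name, value in kwargs.items():
        # Only argument names (dest values) exist in the parsed namespace;
        # a flag alias such as 'worker_machine_type' is not a key here.
        if name in options:
            options[name] = value
    return options


from_flags = build_options(['--worker_machine_type', 'custom-1-6656'])
from_alias_kwarg = build_options(worker_machine_type='custom-1-6656')
from_name_kwarg = build_options(machine_type='custom-1-6656')

print(from_flags)        # machine_type is set via the flag alias
print(from_alias_kwarg)  # machine_type stays None, the keyword is ignored
print(from_name_kwarg)   # machine_type is set via the argument name
```

In this model argparse resolves the alias only while parsing flags; the keyword path never goes through argparse, so it only recognises machine_type.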
In short, using machine_type works everywhere. I filed https://issues.apache.org/jira/browse/BEAM-4112 to have this fixed in Beam in the future.
This can be solved by using the flag machine_type instead of worker_machine_type. The rest of the code works fine. The documentation therefore mentions the wrong field name.
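If existing code already builds its options as a dict keyed by the flag name, the rename can be done in one place before the options are passed on. This is only a sketch: FLAG_ALIASES and normalize_option_names are hypothetical names, the alias table contains only the one alias discussed here, and the project value is a placeholder.

```python
# Hypothetical alias table: flag alias -> argument name.
FLAG_ALIASES = {'worker_machine_type': 'machine_type'}


def normalize_option_names(options):
    """Return a copy of options with known flag aliases renamed."""
    return {FLAG_ALIASES.get(name, name): value
            for name, value in options.items()}


pipeline_kwargs = normalize_option_names({
    'project': 'my-project',                 # placeholder value
    'worker_machine_type': 'custom-1-6656',  # flag alias, gets renamed
})
print(pipeline_kwargs)
```

The resulting dict uses machine_type and could then be passed along as PipelineOptions(**pipeline_kwargs).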