Some puzzles about operator parallelism in Flink

Execution Environment Level

Flink programs are executed in the context of an execution environment. An execution environment defines a default parallelism for all operators, data sources, and data sinks it executes. The execution environment parallelism can be overridden by explicitly configuring the parallelism of an operator.

The default parallelism of an execution environment can be specified by calling the setParallelism() method. To execute all operators, data sources, and data sinks with a parallelism of 3, set the default parallelism of the execution environment as follows:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3);

DataStream<String> text = [...]
DataStream<Tuple2<String, Integer>> wordCounts = [...]
wordCounts.print();

env.execute("Word Count Example");

Calling setParallelism on an operator changes the parallelism of that specific operator only. Consequently, in your example, only the window operator will be executed with a parallelism of 5, while the preceding flatMap operator runs with the default parallelism.

This means you can set a different parallelism for each operator. However, be aware that operators with different parallelism cannot be chained, so the data between them is redistributed with a rebalance operation (similar to a shuffle).

If you want to set the parallelism for all operators, then you have to do it via the setParallelism call on the execution environment (StreamExecutionEnvironment#setParallelism for a streaming job such as the one above).
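To make the two rules above concrete, here is a small plain-Java model (it does not use Flink). The class ParallelismSketch, the Op type, and the operator names are hypothetical and exist only for illustration. It assumes that an operator-level setting wins over the environment default, and that consecutive operators are chained only when their effective parallelism matches. Real Flink checks further conditions before chaining (for example, a keyBy between two operators also breaks a chain), so treat this as a rough sketch:

```java
import java.util.ArrayList;
import java.util.List;

class ParallelismSketch {

    /** Hypothetical stand-in for an operator; parallelism <= 0 means "not set on the operator". */
    static final class Op {
        final String name;
        final int parallelism;

        Op(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }
    }

    /** The operator-level setting wins; otherwise the environment default applies. */
    static int effectiveParallelism(Op op, int envDefault) {
        return op.parallelism > 0 ? op.parallelism : envDefault;
    }

    /** Groups consecutive operators with equal effective parallelism (simplified chaining rule). */
    static List<List<String>> chains(List<Op> pipeline, int envDefault) {
        List<List<String>> result = new ArrayList<>();
        int previous = -1;
        for (Op op : pipeline) {
            int current = effectiveParallelism(op, envDefault);
            // A change in parallelism starts a new chain.
            if (result.isEmpty() || current != previous) {
                result.add(new ArrayList<>());
            }
            result.get(result.size() - 1).add(op.name);
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Op> pipeline = List.of(
                new Op("source", 0),
                new Op("flatMap", 0),
                new Op("filter", 5),   // explicit operator-level parallelism
                new Op("sink", 0));
        // Environment default of 3, as in the snippet above.
        System.out.println(chains(pipeline, 3));
    }
}
```

With an environment default of 3, the model keeps source and flatMap together, and puts filter (parallelism 5) and sink (back to 3) into chains of their own.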

The keyBy operation partitions the input stream into as many partitions as you have parallel operator instances. This makes sure that all elements with the same key end up in the same partition. So in your example, where you set the parallelism to 5, you would end up with 5 partitions. Each partition can contain elements with different keys.
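The partitioning behaviour described above can be sketched in plain Java, without Flink. This is a simplified model, not Flink's actual code: the class and method names are hypothetical, and the max parallelism of 128 is an assumption chosen for illustration. As far as I know, Flink first maps a key to a key group and then maps the key group to a parallel instance, and it additionally scrambles the key's hash code before doing so. That scrambling step is left out here, so the concrete partition numbers will differ from what a real job produces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyByPartitioningSketch {

    /**
     * Simplified key -> partition mapping: key -> key group -> parallel instance.
     * Assumption: uses the raw hashCode, without the extra hash scrambling Flink applies.
     */
    static int partitionFor(Object key, int parallelism, int maxParallelism) {
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
        return keyGroup * parallelism / maxParallelism;
    }

    /** Distributes the given keys over exactly `parallelism` partitions. */
    static Map<Integer, List<String>> partition(List<String> keys, int parallelism, int maxParallelism) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.put(i, new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, parallelism, maxParallelism)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "apple", "date", "elder", "fig", "banana");
        // Parallelism 5 as in the question; 128 is an assumed max parallelism.
        partition(keys, 5, 128).forEach((index, content) ->
                System.out.println("partition " + index + ": " + content));
    }
}
```

Running it shows the two properties that matter: repeated keys such as "apple" always land in the same partition, and with more distinct keys than partitions, at least one partition necessarily holds several different keys.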