How to improve performance when iterating over 130 items and uploading them to AWS S3

The parallelism parameter decides how many worker threads ForkJoinPool will use. That's why the default parallelism value is the number of available CPU cores:

Math.min(MAX_CAP, Runtime.getRuntime().availableProcessors())
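If you want to see what that expression gives on your machine, here is a minimal sketch. The class and method names are made up for illustration. It only reads the values, and the common pool (which parallel streams use by default) may report a different number than a pool created with the no-argument constructor.

```java
import java.util.concurrent.ForkJoinPool;

public class DefaultParallelism {

    // Hypothetical helper: parallelism of a pool created with the no-arg constructor.
    static int defaultPoolParallelism() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available cores: " + cores);
        System.out.println("new ForkJoinPool() parallelism: " + defaultPoolParallelism());
        // The common pool is configured separately and can be overridden with the
        // java.util.concurrent.ForkJoinPool.common.parallelism system property.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}
```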

In your case the bottleneck is most likely checking that a file exists and uploading it to S3. The time this takes depends on at least a few factors: CPU, network card and driver, operating system, and others. The S3 network operations do not appear to be CPU-bound in your case, since you observe an improvement when you create more simultaneous worker threads; perhaps the network requests are enqueued by the operating system.

The right value for parallelism varies from one workload type to another. A CPU-bound workload is better off with the default parallelism equal to the number of CPU cores, because of the negative impact of context switching. A non-CPU-bound workload like yours can be sped up with more worker threads, assuming the workload won't occupy the CPU while it waits, e.g. by busy waiting.
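To try this out without touching S3, here is a rough sketch that times the same batch of tasks under different parallelism values. Everything in it is an assumption for illustration: fakeUpload is a hypothetical stand-in that just sleeps for 20 ms instead of calling S3, and the class and method names are made up. It is not your actual upload code, so treat the numbers it prints only as a hint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class IoBoundParallelism {

    // Hypothetical stand-in for "check that the file exists, then upload it to S3".
    // It only sleeps, so the worker thread waits without using the CPU.
    static String fakeUpload(int item) throws InterruptedException {
        Thread.sleep(20);
        return "item-" + item;
    }

    // Runs `items` fake uploads on a ForkJoinPool with the given parallelism
    // and returns the elapsed wall-clock time in milliseconds.
    static long runUploads(int items, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < items; i++) {
                final int item = i;
                tasks.add(() -> fakeUpload(item));
            }
            long start = System.nanoTime();
            List<Future<String>> results = pool.invokeAll(tasks);
            for (Future<String> result : results) {
                result.get(); // rethrows any failure from the task
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("parallelism " + cores + ": " + runUploads(130, cores) + " ms");
        System.out.println("parallelism 32: " + runUploads(130, 32) + " ms");
    }
}
```

With real uploads the gain will flatten out at some point (bandwidth, S3 request limits, memory per thread), so it is worth measuring a few values rather than picking the largest one.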

There is no single ideal value for parallelism in ForkJoinPool.