How to optimize Spark SQL to run in parallel

You're not reading the DAG visualization correctly: the fact that each step is drawn as a single box does not mean that it isn't using multiple tasks (and therefore cores) to calculate that step.

You can see how many tasks are used for each step by drilling down into the stage view, which displays all tasks for that stage.

For example, here's a sample DAG visualization similar to yours:

[Screenshot: sample DAG visualization]

You can see each stage is depicted by a "single" column of steps.

But if we look at the table below, we can see the number of tasks per stage:

[Screenshot: stages table showing the number of tasks per stage]

One of them uses only 2 tasks, but the other uses 220, which means the data is split into 220 partitions and those partitions are processed in parallel, given enough available resources.
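To make "given enough available resources" concrete, here is a rough back-of-the-envelope sketch. It assumes each task occupies one core (Spark's default for `spark.task.cpus`) and that all tasks take about the same time, which real jobs rarely do. The helper name `scheduling_waves` and the core count of 16 are made up for illustration; only the task counts come from the example above.

```python
import math


def scheduling_waves(num_tasks: int, total_cores: int) -> int:
    """Rough number of sequential rounds needed to run num_tasks
    when at most total_cores tasks can run at the same time
    (assumes one core per task and similar task durations)."""
    if num_tasks < 0 or total_cores <= 0:
        raise ValueError("num_tasks must be >= 0 and total_cores must be > 0")
    return math.ceil(num_tasks / total_cores)


# Task counts taken from the stages in the example; 16 cores is hypothetical.
print(scheduling_waves(220, 16))  # 14 rounds of up to 16 parallel tasks
print(scheduling_waves(2, 16))    # 1 round, but only 2 of the 16 cores are busy
```

So the 220-task stage can keep every core busy, while the 2-task stage cannot use more than 2 cores no matter how many are available.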

If you drill down into the 220-task stage, you can see again that it used 220 tasks, along with the details for each task.

[Screenshot: stage detail view listing the tasks]

Only tasks reading data from disk are shown in the graph with these "multiple dots", which help you understand how many files were read.

So, as Rashid's answer suggests, check the number of tasks for each stage.
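If you'd rather check this from a script than click through the UI, Spark's monitoring REST API exposes the same stage data as JSON. The following is a minimal sketch, not something from the original question: the base URL and application ID you pass to `fetch_stages` are placeholders for your own driver, and the sample payload is invented and trimmed to the two fields used here (`stageId` and `numTasks`).

```python
import json
from urllib.request import urlopen


def fetch_stages(base_url: str, app_id: str) -> list:
    """Fetch the stage list from a running application's REST API.
    base_url is typically the driver UI address, e.g. http://localhost:4040
    (an assumption; use whatever address your Spark UI is served on)."""
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/stages") as resp:
        return json.load(resp)


def tasks_per_stage(stages: list) -> dict:
    """Map each stage ID to its number of tasks."""
    return {stage["stageId"]: stage["numTasks"] for stage in stages}


# Invented sample payload mirroring the two stages discussed above.
sample = json.loads('[{"stageId": 0, "numTasks": 2}, {"stageId": 1, "numTasks": 220}]')
print(tasks_per_stage(sample))  # {0: 2, 1: 220}
```

Against a live application you would call `tasks_per_stage(fetch_stages(base_url, app_id))` instead of using the sample list.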