Why does a processor get hot?

A transistor (FET, in modern ICs) never switches instantly from full OFF to full ON. There is a period while it's turning on or off where the FET acts like a resistor (even when fully ON it still has a resistance).

As you know, passing a current through a resistor generates heat (\$P=I^2R\$ or \$P=\frac{V^2}{R}\$).

The more often the transistors switch, the more time they spend in that resistive state, and the more heat they generate. So the heat generated scales roughly with the number of transistors, but it also depends on which transistors are switching and when, and that depends on what the chip is being instructed to do.

Yes, manufacturers may position specific blocks of their design (not individual transistors, but blocks that form a complete function) in certain areas depending on the heat that block could generate, either to place it somewhere with a better thermal path out of the die, or to keep it away from another block that also generates heat. They also have to take power distribution within the chip into account, so blocks cannot always be placed wherever would be thermally ideal, and the final layout is a compromise.

All current flow in anything that isn't a superconductor generates heat. In chips, it mostly flows in the "metal" interconnect layers: aluminium in older processes, copper in most modern ones (copper diffuses into the silicon and degrades the transistors, so it has to be sealed in behind barrier layers).

What causes current to flow? Every time a logic gate changes state, the event can be modeled as a capacitor (the FET gates of the driven logic plus parasitic wire capacitance) charging or discharging through the wire and the output FET of the driving gate. This is "switching" or "dynamic" power. It's proportional to the switching frequency and to the square of the supply voltage; hence the drive from 5 V to 3.3 V to 1.8 V for better efficiency.
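To put rough numbers on that voltage scaling, here is a minimal sketch using the usual first-order CMOS model \$P \approx \alpha C V^2 f\$, where \$\alpha\$ is the fraction of the capacitance that switches each cycle. The function name and all of the component values are hypothetical, picked only to make the arithmetic easy to follow; they are not from any real chip.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order switching power in watts: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical figures, chosen only for illustration.
ACTIVITY = 0.1        # fraction of the capacitance switched per clock cycle
CAPACITANCE = 1e-9    # total switchable capacitance, in farads
FREQ = 100e6          # clock frequency, in hertz

baseline = dynamic_power(ACTIVITY, CAPACITANCE, 5.0, FREQ)

for vdd in (5.0, 3.3, 1.8):
    power = dynamic_power(ACTIVITY, CAPACITANCE, vdd, FREQ)
    # Only the supply voltage changes, so the ratio is (vdd / 5.0) ** 2.
    print(f"{vdd} V: {power:.4f} W ({power / baseline:.1%} of the 5 V figure)")
```

With everything else held constant, going from 5 V to 1.8 V leaves about 13% of the original switching power, because \$(1.8/5)^2 \approx 0.13\$.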

The insulators are not perfect, and in some places they are very thin. Transistors may not be fully "off". If a FET has an off resistance of a megohm, and you put a million of them in parallel, the result looks like a 1 ohm resistor. This is "leakage" power. It's proportional to the number of transistors.
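The parallel-resistor picture is easy to check numerically. This is only a sketch of that simplification: it assumes every "off" transistor is an identical resistor sitting across the full supply, which real leakage is not (it varies strongly with temperature, voltage and threshold). The function names and the 1.8 V supply are assumptions for illustration.

```python
def parallel_resistance(r_each_ohm, count):
    """Equivalent resistance of `count` identical resistors in parallel."""
    return r_each_ohm / count


def leakage_power(vdd, r_off_ohm, n_transistors):
    """Power dissipated in the equivalent resistor: P = V^2 / R."""
    return vdd ** 2 / parallel_resistance(r_off_ohm, n_transistors)


R_OFF = 1e6   # off resistance of one FET in ohms (the megohm from the text)
VDD = 1.8     # assumed supply voltage in volts

for n in (1_000_000, 2_000_000, 4_000_000):
    r_total = parallel_resistance(R_OFF, n)
    power = leakage_power(VDD, R_OFF, n)
    print(f"{n:>9} transistors: {r_total:.2f} ohm equivalent, {power:.2f} W leaked")
```

Doubling the transistor count halves the equivalent resistance and doubles the leaked power, which is the "proportional to the number of transistors" behaviour, and it happens even when nothing is switching.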

I spent a decade working at a startup on power optimisation. :) There are a lot of techniques: speed/leakage tradeoffs ("high-k metal gate"), turning off parts of the circuit entirely, clock gating, reducing the clock frequency, and transistor sizing and placement.