Advantages and disadvantages of loop unrolling in VHDL

From the abstract of your second link:

Loop unrolling is the main compiler technique that allows reconfigurable architectures [to] achieve large degrees of parallelism. However, loop unrolling increases the area and can potentially have a negative impact on clock cycle time. In most embedded applications, the critical parameter is the throughput. Loop unrolling can therefore have contradictory effects on the throughput. As a consequence there exists, in general, a degree of unrolling that maximizes the throughput per unit area.

To me, this directly answers your question. The idea is basically this: if you have a process which must iterate over many cycles, you can employ loop unrolling to parallelize the algorithm to reduce the number of required clock cycles.

But this comes at the cost of a larger fabric footprint (more FPGA area), which in turn complicates clock routing and makes timing closure more difficult.

If timing closure can’t be obtained, the system is forced to use a slower clock frequency, which conflicts with the goal of unrolling.

Therefore, you have to find a balance between the two to maximize the gain in performance.
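As a rough way to see that balance, here is a simplified model. The symbols are assumptions introduced for illustration, not taken from the paper: a loop of N iterations is unrolled by a factor U that divides N, the achievable clock frequency is f_clk(U), and the area is A(U).

```latex
% Assumed model: one result needs N/U clock cycles after unrolling by U.
\[
  T(U) = \frac{f_{\mathrm{clk}}(U)\, U}{N}
  \qquad\text{(results per second)}
\]
% Figure of merit from the abstract: throughput per unit area.
\[
  \frac{T(U)}{A(U)} = \frac{f_{\mathrm{clk}}(U)\, U}{N \, A(U)}
\]
```

Under this model, raising U helps only while the factor U grows faster than f_clk(U) falls and A(U) rises. That is why an intermediate degree of unrolling can give the best throughput per unit area.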

Loop unrolling is the only way a synthesis tool can implement an entire loop, either in a single clock cycle (in a clocked process) or combinationally.

So advantages and disadvantages over other (non-existent) methods of translating a loop are a moot point.
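To make this concrete, here is a minimal sketch of a loop inside a clocked process. The entity name popcount_unrolled and its ports are made up for illustration. Because the loop bounds are static, a synthesis tool would be expected to replicate the loop body once per bit, so the whole count is produced in one clock cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: counts the '1' bits in an 8-bit word.
entity popcount_unrolled is
  port (
    clk   : in  std_logic;
    data  : in  std_logic_vector(7 downto 0);
    count : out unsigned(3 downto 0)   -- 4 bits hold the maximum value 8
  );
end entity popcount_unrolled;

architecture rtl of popcount_unrolled is
begin
  process (clk)
    variable acc : unsigned(3 downto 0);
  begin
    if rising_edge(clk) then
      acc := (others => '0');
      -- Static bounds: synthesis unrolls this into eight copies of the
      -- body, i.e. a chain of eight conditional increments evaluated
      -- within a single clock cycle.
      for i in data'range loop
        if data(i) = '1' then
          acc := acc + 1;
        end if;
      end loop;
      count <= acc;   -- registered result, valid one clock after data
    end if;
  end process;
end architecture rtl;
```

The logic grows with the width of data, which is the area cost discussed here.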

Unrolling can, as already stated, generate rather large hardware. If that's a problem, you need to find another coding approach, for example replacing the loop with a state machine that executes one iteration per clock cycle, for smaller (but slower) hardware.
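The state-machine alternative could look roughly like the sketch below. It uses a hypothetical bit-counting entity that examines one bit of an 8-bit word per clock. The names popcount_serial, start and done are assumptions, and the reset is left out for brevity, relying on signal initial values (which FPGA tools generally honour).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: counts '1' bits, one bit per clock cycle.
entity popcount_serial is
  port (
    clk   : in  std_logic;
    start : in  std_logic;                      -- pulse to begin a count
    data  : in  std_logic_vector(7 downto 0);
    count : out unsigned(3 downto 0);
    done  : out std_logic                       -- high for one clock when count is valid
  );
end entity popcount_serial;

architecture rtl of popcount_serial is
  type state_t is (idle, busy);
  signal state : state_t := idle;
  signal shreg : std_logic_vector(7 downto 0) := (others => '0');
  signal acc   : unsigned(3 downto 0) := (others => '0');
  signal idx   : natural range 0 to 7 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      case state is
        when idle =>
          if start = '1' then
            shreg <= data;               -- capture the input word
            acc   <= (others => '0');
            idx   <= 0;
            state <= busy;
          end if;
        when busy =>
          -- One loop iteration per clock: test the LSB, then shift right.
          if shreg(0) = '1' then
            acc <= acc + 1;
          end if;
          shreg <= '0' & shreg(7 downto 1);
          if idx = 7 then
            done  <= '1';                -- last bit handled on this edge
            state <= idle;
          else
            idx <= idx + 1;
          end if;
      end case;
    end if;
  end process;

  count <= acc;
end architecture rtl;
```

A single incrementer is reused for every bit, so the logic stays small, but each result takes eight clocks in the busy state instead of one.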

'Loop Unrolling' is a systematic method of achieving parallelism that can be automated.

In the bad old days, we wrote machine code, then assembler, then simple compiled languages, then rich languages with useful libraries. This allowed us to write at a progressively higher level, and let an automatic process take care of the translation to low level.

In the bad old days, we'd write VHDL, then manually put several blocks in parallel to get the throughput, and manually schedule their operation and pipeline the data to get them to work. Expressing our intention as a high-level loop and then letting an automatic process generate the low-level timing and dependency ordering is simply applying the same automation principle to hardware design.

Advantages - speed, accuracy, and humans like to think at a high level.

Disadvantages - humans have this nagging feeling that because 'this bit here' looks inefficient, they could do it better. That's often correct with early-generation tools. It takes time for the interface to the tools to become easy to use, for the tools to become trusted, and for their performance to improve to the point that no corners remain where human tweaking might still be warranted.