When to use cudaHostRegister() and cudaHostAlloc()? What is the meaning of "Pinned or page-locked" memory? Which are the equivalent in OpenCL?

What is the meaning of "pin the memory"?

It means make the memory page locked. That is telling the operating system virtual memory manager that the memory pages must stay in physical ram so that they can be directly accessed by the GPU across the PCI-express bus.

Why is it so fast? 

In one word, DMA. When the memory is page locked, the GPU DMA engine can directly run the transfer without requiring the host CPU, which reduces overall latency and decreases net transfer times.

Are the PCIe transaction managed entirely from the CPU?

No. See above.

Is there a manager of a bus that takes care of this?

No. The GPU manages the transfers. In this context there is no such thing as a bus master


EDIT: Seems like CUDA treats pinned and page-locked as the same as per the Pinned Host Memory section in this blog written by Mark Harris. This means by answer is moot and the best answer should be taken as is.

I bumped into this question while looking for something else. For all future users, I think @talonmies answers the question perfectly, but I'd like to bring to notice a slight difference between locking and pinning pages - the former ensures that the memory is not pageable but the kernel is free to move it around and the latter ensures that it stays in memory (i.e. non-pageable) but also is mapped to the same address. Here's a reference to the same.