Is there a performance penalty in accessing the Windows API through Delphi?

Read the source. A call to CreateWindowEx is defined in the Windows.pas unit as a direct call to the CreateWindowExW function in User32.DLL (from XE5's source - similar definitions are found in all versions of Delphi for the supported OS versions):

function CreateWindowEx(dwExStyle: DWORD; lpClassName: LPCWSTR;
  lpWindowName: LPCWSTR; dwStyle: DWORD; X, Y, nWidth, nHeight: Integer;
  hWndParent: HWND; hMenu: HMENU; hInstance: HINST; lpParam: Pointer): HWND;
  stdcall; external user32 name 'CreateWindowExW';

So the answer to your specific question is no. There is no performance penalty. A call to a WinAPI function in Delphi does not incur a performance hit.
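To see that there is nothing special about the declarations in Windows.pas, you can write the same kind of import yourself. This is only a minimal sketch: the name MyGetTickCount is made up for illustration, and on XE2 and later you may need Winapi.Windows instead of Windows, depending on your unit scope settings.

```delphi
program OwnImport;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical name chosen for this example. The declaration binds
// straight to the kernel32 export, in the same way Windows.pas does.
function MyGetTickCount: DWORD; stdcall;
  external 'kernel32.dll' name 'GetTickCount';

begin
  // Both calls end up in the same kernel32 function.
  Writeln(MyGetTickCount);
  Writeln(GetTickCount);
end.
```

Both lines should print tick counts that are equal or very close, since the two declarations resolve to the same exported function.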

Does a call to the Delphi Windows API function immediately call the Windows API function?

No, it does not.

For the sake of argument, consider a call to CloseHandle. This is declared in the Windows unit and implemented using external. When you call it, you do in fact call a function named CloseHandle in the Windows unit. So in pseudo assembler it looks like this:

.... prepare parameters
CALL     Windows.CloseHandle

Then, Windows.CloseHandle is implemented like this:

JMP      kernel32.CloseHandle

So, compared to a direct call, there is a call to a thunk function, and then a jump into the Win32 DLL. This is known as a trampoline.
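If you want to see the thunk for yourself, a rough sketch like the following compares the address your code calls with the address exported by kernel32. It assumes that taking @CloseHandle yields the thunk inside your own module, which is what I would expect from the mechanism described above, and the unit names may need the Winapi and System prefixes on XE2 and later.

```delphi
program ThunkPeek;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  ThunkAddr, RealAddr: Pointer;
  Bytes: PByteArray;
begin
  // Address that compiled calls to CloseHandle target (assumed to be
  // the import thunk inside this executable).
  ThunkAddr := @CloseHandle;

  // Address of the function as exported by kernel32.dll.
  RealAddr := GetProcAddress(GetModuleHandle(kernel32), 'CloseHandle');

  Writeln(Format('thunk in module: %p', [ThunkAddr]));
  Writeln(Format('kernel32 export: %p', [RealAddr]));

  // Dump the first two bytes of the thunk to see the jump opcode.
  Bytes := PByteArray(ThunkAddr);
  Writeln(Format('first thunk bytes: %.2x %.2x', [Bytes^[0], Bytes^[1]]));
end.
```

The two addresses should differ, and on x86 I would expect the thunk to begin with the bytes of an indirect JMP, but check the output on your own compiler version rather than taking that on trust.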

It could be implemented differently. The compiler could emit code to call directly into the Win32 DLL. And some compilers will do this. For instance, the equivalent asm for this call as emitted by MSVC would be:

CALL     DWORD PTR [__imp__CloseHandle@4]

Here, __imp__CloseHandle@4 is the address of a location in memory which contains the address of CloseHandle in the Windows DLL. The loader writes the actual address of CloseHandle into __imp__CloseHandle@4 at load time.

Which is more efficient? Impossible to say for sure without profiling. But I'm confident that any difference will matter in only a vanishingly small number of cases.
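If you do want to profile it, something along these lines is a starting point. It is a sketch, not a rigorous benchmark: it times a cheap API function called through the normal import (via the thunk) against the same function called through a pointer obtained from GetProcAddress, which skips the thunk. The iteration count and the choice of GetTickCount are arbitrary assumptions of mine.

```delphi
program ThunkTiming;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  TGetTickCount = function: DWORD; stdcall;

const
  Iterations = 50000000; // arbitrary, chosen for illustration only

var
  ViaPointer: TGetTickCount;
  Freq, T0, T1, T2: Int64;
  I: Integer;
  Sink: DWORD;
begin
  // Resolve the real address once, so calls through it bypass the thunk.
  @ViaPointer := GetProcAddress(GetModuleHandle(kernel32), 'GetTickCount');
  QueryPerformanceFrequency(Freq);

  Sink := 0;
  QueryPerformanceCounter(T0);
  for I := 1 to Iterations do
    Sink := Sink xor GetTickCount;   // CALL to the thunk, then JMP
  QueryPerformanceCounter(T1);
  for I := 1 to Iterations do
    Sink := Sink xor ViaPointer();   // indirect CALL through a pointer
  QueryPerformanceCounter(T2);

  Writeln(Format('via thunk  : %.1f ms', [(T1 - T0) * 1000 / Freq]));
  Writeln(Format('via pointer: %.1f ms', [(T2 - T1) * 1000 / Freq]));
  Writeln(Sink); // use the result so the loops are not discarded
end.
```

Run it several times and swap the order of the loops before drawing any conclusion. For any API function that does real work, the cost of the call mechanism is likely to be lost in the noise.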

Of course, it is also possible to generate code that calls directly with no indirection. That would require the loader to patch every call to the function. That is probably a bad idea, however, because it would lead to a large number of load-time fixups, which would hurt startup performance. That said, it would be much the same as a DLL that needs to be relocated at load time. In any case, I know of no tool chain that adopts this policy.

Perhaps what you are concerned about is whether these functions are the real Win32 functions, or whether there is a layer around them that changes their meaning. These are the real Win32 functions. There is no Delphi documentation for them because they are Win32 functions; the Win32 documentation on MSDN is the authoritative source.

As many people have said, the Win32 functions are called with no intervening layer. They are called directly in the sense that your parameters are passed unmodified to the API functions, but the calling mechanism is indirect in the sense that it uses a trampoline. Semantically there is no difference.