Set STM32 GPIO clock and data pins as fast as possible

Setting or clearing any combination of bits in an I/O port (even setting some and clearing others) should require at most three instructions; when such operations are repeated, each should take a single instruction. Unfortunately, many vendors (ST among them) tend to define I/O libraries that generate subroutine calls even for operations as common as this.

I would suggest defining your own methods to set and clear I/O bits and including the code for those methods, marked with an __inline qualifier, in your .h file [as opposed to merely including a prototype in the .h and the method code in a separate .c file].

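If, within a .h file, you write:

/* Assumes the STM32F2/F4 Standard Peripheral Library header, where BSRRL and BSRRH are the 16-bit set and reset halves of BSRR; other headers may expose a single 32-bit BSRR instead. */
__inline void SetPorts(GPIO_TypeDef* Addr, uint16_t data) { Addr->BSRRL = data; }
__inline void ClearPorts(GPIO_TypeDef* Addr, uint16_t data) { Addr->BSRRH = data; }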

Then the code:

SetPorts(GPIOA, 1);
ClearPorts(GPIOA, 2);
ClearPorts(GPIOA, 1);

will likely generate something like (I'm not positive about some details):

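    ldr  r0,=GPIOA
    mov  r1,#1
    strh r1,[r0,#24]   ; low half of BSRR (offset 0x18 on F2/F4): set bit 0
    mov  r2,#2
    strh r2,[r0,#26]   ; high half of BSRR (offset 0x1A): clear bit 1
    strh r1,[r0,#26]   ; high half of BSRR: clear bit 0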

By contrast, if one uses GPIO_WriteBit, the code would end up being more like:

    ldr  r0,=GPIOA
    mov  r1,#1
    mov  r2,#1
    bl   _GPIO_WriteBit
    ldr  r0,=GPIOA
    mov  r1,#2
    mov  r2,#0
    bl   _GPIO_WriteBit
    ldr  r0,=GPIOA
    mov  r1,#1
    mov  r2,#0
    bl   _GPIO_WriteBit

_GPIO_WriteBit: ; Code below assumes pretty good compiler optimization
    cmp    r2,#0
    ite    ne
    strhne r1,[r0,#24]   ; nonzero value: write the set half of BSRR
    strheq r1,[r0,#26]   ; zero value: write the reset half of BSRR
    bx     lr

The first example executes six instructions total for all three operations. The second executes twelve instructions in the main code, plus about four in GPIO_WriteBit on each of its three calls, for a total of about 24 [skipped instructions sometimes add to execution time and sometimes not]. Normally the point of calling a subroutine is to save code space at some cost in execution speed, but in this case the subroutine call is a disaster from the standpoints of both space and time. The sole useful work done by the function is a single store operation; the function will likely leave registers r0-r2 undisturbed, but the calling code has no way of knowing that. Consequently, all three parameters must be explicitly set before each call.

I don't know why chip vendors define methods which write GPIO bits but don't bother to make them inline, but I would suggest that in most cases one should avoid using chip-vendor-supplied functions to write GPIO ports if one cares anything about efficiency.
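
For what it's worth, here is a minimal sketch of such header-only helpers, assuming an STM32F2/F4-style port in which one 32-bit write to BSRR sets the pins named in its low 16 bits and clears those named in its high 16 bits. `GpioRegs` and the `gpio_*` names are hypothetical stand-ins (not ST's API) so that the sketch compiles off-target; on real hardware you would use the `GPIO_TypeDef` from the vendor header and confirm the register layout in the reference manual.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vendor's GPIO_TypeDef (F2/F4-style layout assumed). */
typedef struct {
    volatile uint32_t MODER;   /* 0x00: pin mode */
    volatile uint32_t OTYPER;  /* 0x04: output type */
    volatile uint32_t OSPEEDR; /* 0x08: output speed */
    volatile uint32_t PUPDR;   /* 0x0C: pull-up/pull-down */
    volatile uint32_t IDR;     /* 0x10: input data */
    volatile uint32_t ODR;     /* 0x14: output data */
    volatile uint32_t BSRR;    /* 0x18: low half sets pins, high half clears pins */
} GpioRegs;

/* Set every pin whose bit is 1 in mask; other pins are untouched. */
static inline void gpio_set(GpioRegs *port, uint16_t mask)
{
    port->BSRR = mask;
}

/* Clear every pin whose bit is 1 in mask; other pins are untouched. */
static inline void gpio_clear(GpioRegs *port, uint16_t mask)
{
    port->BSRR = (uint32_t)mask << 16;
}

/* Set some pins and clear others with a single store. */
static inline void gpio_set_and_clear(GpioRegs *port, uint16_t set, uint16_t clear)
{
    port->BSRR = ((uint32_t)clear << 16) | set;
}
```

With these in a .h file, a call such as `gpio_set_and_clear(port, 0x0001, 0x0002)` should reduce to loading a constant and doing one store, which is the "set some bits and clear others" case mentioned at the top. It is worth checking the disassembly from your own compiler to confirm.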

Incidentally, I tend to be skeptical of vendor-supplied I/O functions in general. While they can sometimes present a programmer with a higher level of abstraction than the raw hardware, which is useful, in many cases they make code harder to write, harder to read, and less efficient. They can also have unwanted side effects that cause problems elsewhere. For example, if a peripheral needs a clock to be set to one of two modes, and generally works better with one of them, a vendor-supplied library for the peripheral might configure the clock to the "better" mode even if that clock is shared with another peripheral that needs it to be in the other one. When peripherals share resources (as they often do), it may be impossible to use the libraries properly without reading and understanding all the code therein. If the whole purpose of the library was to save a couple of register writes, the consequences of a library call may be a lot harder to figure out than the consequences of the code it replaces.


I answered your related question about why you could only toggle at 4 MHz when you expected 100 MHz:

If the rate is only roughly 4 MHz, and not exactly 100 MHz/25, then the problem is probably with the C function GPIO_WriteBit.
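
To make the arithmetic behind that figure concrete, here is a small sketch. It assumes that 25 is the number of core clock cycles spent per output period; the function names are made up for illustration.

```c
#include <stdint.h>

/* Output frequency if each full output period costs a fixed number of
   core cycles. cycles_per_period must be nonzero. */
static inline uint32_t toggle_rate_hz(uint32_t core_hz, uint32_t cycles_per_period)
{
    return core_hz / cycles_per_period;
}

/* Inverse: estimate how many core cycles each output period costs from
   a measured output frequency. observed_hz must be nonzero. */
static inline uint32_t cycles_per_period(uint32_t core_hz, uint32_t observed_hz)
{
    return core_hz / observed_hz;
}
```

For a 100 MHz core, `toggle_rate_hz(100000000u, 25u)` gives 4000000, i.e. 4 MHz. A measured rate that is not an exact integer fraction of the core clock points at software overhead rather than a clock divider.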

For high-speed operations and operations that require accurate timing, you are better off coding in assembly than in C. If you look at the assembly code generated for GPIO_WriteBit, it may be half a page long, depending on what features the function has and how much the compiler's optimizer can do with it.

You don't say which development toolchain you're using, but most C compilers support inline assembly.
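
As a rough sketch of what that can look like, assuming a GCC-compatible compiler targeting 32-bit ARM (other toolchains use different inline assembly syntax), a single-store register write might be wrapped like this. The name `reg_write32` is made up for illustration, and the plain C branch is only there so the code also builds on other targets.

```c
#include <stdint.h>

/* Store one 32-bit value to a memory-mapped register. */
static inline void reg_write32(volatile uint32_t *reg, uint32_t value)
{
#if defined(__GNUC__) && defined(__arm__)
    /* GCC extended asm: a single STR instruction. The "memory" clobber
       keeps the compiler from moving other memory accesses across it. */
    __asm__ volatile ("str %1, [%0]" : : "r"(reg), "r"(value) : "memory");
#else
    *reg = value; /* fallback for other compilers and host builds */
#endif
}
```

Whether this actually beats a plain volatile store depends on the compiler and optimization level, so compare the generated assembly before committing to it.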

So, write the functions in assembly. An operation like

 GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;

should take no more than 2 instructions in assembly, while the compiled C code may take 20 times as many, or more.
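
For comparison, here is a sketch of doing the same configuration with a direct register access in C. It assumes the F2/F4-style MODER register, which holds a 2-bit mode field per pin with 01 meaning general-purpose output; `pin_mode_output` is a hypothetical name, and the register pointer is passed in so the sketch does not depend on any vendor header.

```c
#include <stdint.h>

/* Configure one pin (0..15) as a general-purpose output by rewriting
   only its 2-bit field in the mode register. */
static inline void pin_mode_output(volatile uint32_t *moder, unsigned pin)
{
    uint32_t shift = 2u * pin;

    /* Clear the pin's field, then write 01 (output, assumed encoding). */
    *moder = (*moder & ~(3u << shift)) | (1u << shift);
}
```

Because this is a read-modify-write, it costs a few more instructions than a single store, but there is no call overhead when it is inlined with a constant pin number.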


The speed setting only controls the slew rate of the pin: the higher the setting, the faster the rising and falling edges. It does not directly determine how fast you can toggle the port.

The feature allows you to appropriately interface with other devices that require specific rise/fall times.
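
To illustrate that the speed is just a per-pin configuration field, here is a sketch assuming the F2/F4-style OSPEEDR register, which holds a 2-bit slew-rate code per pin (0 being the slowest setting and 3 the fastest). `pin_set_speed` is a hypothetical name, and the register pointer is passed in so the sketch does not depend on any vendor header.

```c
#include <stdint.h>

/* Select the output slew-rate code (0..3) for one pin (0..15).
   This changes edge steepness only; it does not change how often
   software writes to the pin. */
static inline void pin_set_speed(volatile uint32_t *ospeedr, unsigned pin, unsigned speed)
{
    uint32_t shift = 2u * pin;

    *ospeedr = (*ospeedr & ~(3u << shift)) | ((uint32_t)(speed & 3u) << shift);
}
```

The toggle rate is still set by how quickly the code (or a timer or DMA channel) writes to the port, so raising this value alone will not turn a 4 MHz square wave into a faster one.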
