What is actually sent/loaded to a microcontroller / STM32?

That is a lot of questions...

So, here is a simple, technically functional STM32 program:

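.thumb

.globl _start
_start:
    .word 0x20000100    @ value loaded into the stack pointer at reset
    .word reset         @ reset vector: address of the reset handler
    .word 0x12345678    @ marker value, easy to find in a hex dump
    .word 0xAABBCCDD    @ marker value, easy to find in a hex dump
.thumb_func             @ mark the next label as Thumb code
reset:
    nop
    nop
    nop
    b reset             @ loop forever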

Build it and then see what we see:

$ arm-none-eabi-as so.s -o so.o
$ arm-none-eabi-ld -Ttext=0x08000000 so.o -o so.elf
$ arm-none-eabi-objcopy so.elf -O srec so.srec
$ arm-none-eabi-objcopy so.elf -O ihex so.hex
$ arm-none-eabi-objdump -D so.elf > so.list

Actually, start with the disassembly of the object file:

$ arm-none-eabi-objdump -D so.o 

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <_start>:
   0:   20000100    andcs   r0, r0, r0, lsl #2
   4:   00000000    andeq   r0, r0, r0
   8:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000
   c:   aabbccdd    bge feef3388 <reset+0xfeef3378>

00000010 <reset>:
  10:   46c0        nop         ; (mov r8, r8)
  12:   46c0        nop         ; (mov r8, r8)
  14:   46c0        nop         ; (mov r8, r8)
  16:   e7fb        b.n 10 <reset>

The choice of the ELF file format is somewhat arbitrary: there are other file formats that could be used, or you could invent a new one, but ELF is quite useful across many architectures. The GNU tools, at least for ARM, default to ELF. The object files are in ELF format as well, as we see above.

The assembler has converted the assembly language into machine code as best it can. The .word reset line is not an instruction; it is me asking for the address of the label reset to be placed there, since that is part of the vector table you need to boot the processor. In the object file that word is still 0x00000000 because the final address is not known yet. The linker fills in the externs and other gaps that the assembler doesn't know at assembly time. So, linked, we can see the output in the so.list file I created:

Disassembly of section .text:

08000000 <_start>:
 8000000:   20000100    andcs   r0, r0, r0, lsl #2
 8000004:   08000011    stmdaeq r0, {r0, r4}
 8000008:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000
 800000c:   aabbccdd    bge 6ef3388 <_stack+0x6e73388>

08000010 <reset>:
 8000010:   46c0        nop         ; (mov r8, r8)
 8000012:   46c0        nop         ; (mov r8, r8)
 8000014:   46c0        nop         ; (mov r8, r8)
 8000016:   e7fb        b.n 8000010 <reset>

When you read the documentation from ST and ARM you find that the STM32 family is, so far, based on ARM Cortex-M cores, using pretty much all of the Cortex-M flavors, and so far they all boot the same way. The 32-bit value at address zero in the processor's memory space is a value the hardware loads into the stack pointer for you, a nice feature, but I won't spend more time on it. The 32-bit word at address 0x00000004 is the reset vector: the address of the reset handler, the code to run when the processor comes out of reset. The handler itself is instructions, so machine code; the vector table is just vectors (addresses). For reasons I won't go into, the LSB of a vector has to be a one, so for the handler at address 0x08000010 the vector is 0x08000011.
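
As a rough host-side illustration of those first two vector table entries (a sketch only, nothing the chip or ST's tools run), the first eight bytes of the linked program can be decoded like this. The function name decode_vectors is made up for this example.

```python
import struct

def decode_vectors(image: bytes):
    """Return (initial_sp, reset_vector, reset_address) from the start of a Cortex-M image."""
    # Two little-endian 32-bit words: initial stack pointer, then the reset vector.
    initial_sp, reset_vector = struct.unpack_from("<II", image, 0)
    # The vector has its LSB set; the handler code starts at the even address.
    reset_address = reset_vector & ~1
    return initial_sp, reset_vector, reset_address

# First 8 bytes of the linked example program, in the order they sit in flash.
image = bytes.fromhex("0001002011000008")
sp, vector, address = decode_vectors(image)
print(hex(sp), hex(vector), hex(address))
```

The vector read from the table is 0x08000011, while the first nop actually sits at 0x08000010.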

And you can see that the toolchain has done what I asked and put the machine code for those no-ops and the branch in there.

Now for the processor to do what we want we have to have all of these bytes in a place where the processor gets them when it fetches/reads those addresses.

When you read the documentation you find that, for bootloader and perhaps other reasons, the chip can remap what is presented to the processor as memory at address 0x00000000. In the normal operating mode the flash at address 0x08000000 is mirrored at address 0x00000000. So if I put 0x08000011 at address 0x08000004, it is also visible at 0x00000004. After reset the processor sees that value in the vector table and fetches instructions from 0x08000010, and if we do everything right the processor will find our instructions and run them.

When you read the documentation for the chip you find that the flash is in the part and there are a couple-three ways to program the flash, to write bytes to it. One is a serial/UART deal: you wire the boot pin(s) high or low, reset the part, and it goes into the built-in bootloader that ST puts in there (not ours). You then talk to the part using formatted packets, and with that protocol you can ask it to write data to certain addresses in that flash. So the vector table and the machine code are what we would need to write:

 8000000:   20000100
 8000004:   08000011
 8000008:   12345678
 800000c:   aabbccdd
 8000010:   46c0    
 8000012:   46c0    
 8000014:   46c0    
 8000016:   e7fb    

These are hex numbers, the address on the left, data on the right. THAT is what we need to transfer.
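
To make "what we need to transfer" concrete, here is a small sketch that lays the listing out as the actual byte stream, assuming little-endian storage (which is how these values show up in the dumps of this program). The names listing and to_bytes are only for this illustration.

```python
# (address, value, size in bytes) taken from the listing above.
listing = [
    (0x08000000, 0x20000100, 4),
    (0x08000004, 0x08000011, 4),
    (0x08000008, 0x12345678, 4),
    (0x0800000C, 0xAABBCCDD, 4),
    (0x08000010, 0x46C0, 2),
    (0x08000012, 0x46C0, 2),
    (0x08000014, 0x46C0, 2),
    (0x08000016, 0xE7FB, 2),
]

def to_bytes(entries):
    """Return (base_address, bytes) with every value stored little-endian."""
    base = entries[0][0]
    out = bytearray()
    for address, value, size in entries:
        # This sketch only handles a contiguous listing.
        if address != base + len(out):
            raise ValueError(f"gap or overlap at {address:#010x}")
        out += value.to_bytes(size, "little")
    return base, bytes(out)

base, flash_bytes = to_bytes(listing)
print(hex(base), flash_bytes.hex(" "))
```

So 0x20000100 goes into flash as the bytes 00 01 00 20, and the whole program is 24 bytes starting at 0x08000000.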

Some of the STM32s have a USB bootloader, and all have a JTAG-like interface, SWD, which is what you get with ST-LINK and such. On a lot of development boards you are actually talking to another microcontroller, and that microcontroller is the one that uses SWD to talk to the MCU you are developing for.

Then you can wrap all kinds of host development IDE software around these interfaces for this part or other on board features, etc.

There are MANY files that qualify as "binary" files.

All of the ones I created above contain/describe the bytes and addresses we need to go into the flash. But as you saw above when I disassembled the program in the ELF file, the labels _start and reset are in the output. How is that possible if they are not in any way used by the processor? Because these types of binaries carry information like that for debugging, disassembling or other similar reasons.

$ hexdump -C so.elf
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 28 00 01 00 00 00  00 00 00 08 34 00 00 00  |..(.........4...|
00000020  d0 01 01 00 00 02 00 05  34 00 20 00 01 00 28 00  |........4. ...(.|
00000030  06 00 05 00 01 00 00 00  00 00 01 00 00 00 00 08  |................|
00000040  00 00 00 08 18 00 00 00  18 00 00 00 05 00 00 00  |................|
00000050  00 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00010000  00 01 00 20 11 00 00 08  78 56 34 12 dd cc bb aa  |... ....xV4.....|
00010010  c0 46 c0 46 c0 46 fb e7  41 13 00 00 00 61 65 61  |.F.F.F..A....aea|
00010020  62 69 00 01 09 00 00 00  06 02 09 01 00 00 00 00  |bi..............|
00010030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00010040  00 00 00 08 00 00 00 00  03 00 01 00 00 00 00 00  |................|
00010050  00 00 00 00 00 00 00 00  03 00 02 00 01 00 00 00  |................|
00010060  00 00 00 00 00 00 00 00  04 00 f1 ff 06 00 00 00  |................|
00010070  11 00 00 08 00 00 00 00  02 00 01 00 0c 00 00 00  |................|
00010080  00 00 00 08 00 00 00 00  00 00 01 00 0f 00 00 00  |................|
00010090  10 00 00 08 00 00 00 00  00 00 01 00 21 00 00 00  |............!...|
000100a0  18 00 01 08 00 00 00 00  10 00 01 00 12 00 00 00  |................|
000100b0  18 00 01 08 00 00 00 00  10 00 01 00 20 00 00 00  |............ ...|
000100c0  18 00 01 08 00 00 00 00  10 00 01 00 59 00 00 00  |............Y...|
000100d0  00 00 00 08 00 00 00 00  10 00 01 00 2c 00 00 00  |............,...|
000100e0  18 00 01 08 00 00 00 00  10 00 01 00 38 00 00 00  |............8...|
000100f0  18 00 01 08 00 00 00 00  10 00 01 00 40 00 00 00  |............@...|
00010100  18 00 01 08 00 00 00 00  10 00 01 00 47 00 00 00  |............G...|
00010110  18 00 01 08 00 00 00 00  10 00 01 00 4c 00 00 00  |............L...|
00010120  00 00 08 00 00 00 00 00  10 00 01 00 53 00 00 00  |............S...|
00010130  18 00 01 08 00 00 00 00  10 00 01 00 00 73 6f 2e  |.............so.|
00010140  6f 00 72 65 73 65 74 00  24 64 00 24 74 00 5f 5f  |o.reset.$d.$t.__|
00010150  62 73 73 5f 73 74 61 72  74 5f 5f 00 5f 5f 62 73  |bss_start__.__bs|
00010160  73 5f 65 6e 64 5f 5f 00  5f 5f 62 73 73 5f 73 74  |s_end__.__bss_st|
00010170  61 72 74 00 5f 5f 65 6e  64 5f 5f 00 5f 65 64 61  |art.__end__._eda|
00010180  74 61 00 5f 65 6e 64 00  5f 73 74 61 63 6b 00 5f  |ta._end._stack._|
00010190  5f 64 61 74 61 5f 73 74  61 72 74 00 00 2e 73 79  |_data_start...sy|
000101a0  6d 74 61 62 00 2e 73 74  72 74 61 62 00 2e 73 68  |mtab..strtab..sh|
000101b0  73 74 72 74 61 62 00 2e  74 65 78 74 00 2e 41 52  |strtab..text..AR|
000101c0  4d 2e 61 74 74 72 69 62  75 74 65 73 00 00 00 00  |M.attributes....|
000101d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000101f0  00 00 00 00 00 00 00 00  1b 00 00 00 01 00 00 00  |................|
00010200  06 00 00 00 00 00 00 08  00 00 01 00 18 00 00 00  |................|
00010210  00 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|
00010220  21 00 00 00 03 00 00 70  00 00 00 00 00 00 00 00  |!......p........|
00010230  18 00 01 00 14 00 00 00  00 00 00 00 00 00 00 00  |................|
00010240  01 00 00 00 00 00 00 00  01 00 00 00 02 00 00 00  |................|
00010250  00 00 00 00 00 00 00 00  2c 00 01 00 10 01 00 00  |........,.......|
00010260  04 00 00 00 07 00 00 00  04 00 00 00 10 00 00 00  |................|
00010270  09 00 00 00 03 00 00 00  00 00 00 00 00 00 00 00  |................|
00010280  3c 01 01 00 60 00 00 00  00 00 00 00 00 00 00 00  |<...`...........|
00010290  01 00 00 00 00 00 00 00  11 00 00 00 03 00 00 00  |................|
000102a0  00 00 00 00 00 00 00 00  9c 01 01 00 31 00 00 00  |............1...|
000102b0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|

You can see the strings _start and reset in the file. And if you look, you can see the 0x12345678 and 0xAABBCCDD values in there (as little-endian bytes), with the 0x08000011 vector before them and the machine code after. I put those values in there to make this super easy to find.

00010000  00 01 00 20 11 00 00 08  78 56 34 12 dd cc bb aa  |... ....xV4.....|

Intel Hex and Motorola S-record are/were competing formats from back in the day that are still used by some tools; you could carry these files around and a ROM programmer would burn their contents into the ROM. They are ASCII files like these:

$ cat so.hex
:020000040800F2
:10000000000100201100000878563412DDCCBBAA94
:08001000C046C046C046FBE7F4
:0400000508000000EF
:00000001FF
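
As a sketch of how a tool might pull addresses and bytes back out of that Intel Hex text (assuming only the record types that appear in this file: 00 data, 01 end of file, 04 extended linear address, 05 start linear address), something like the following works. The name parse_ihex is hypothetical, not from any library.

```python
def parse_ihex(text: str) -> dict:
    """Return {absolute_address: byte} from Intel Hex text (types 00, 01, 04, 05 only)."""
    memory = {}
    upper = 0  # upper 16 address bits, set by type-04 records
    for line in text.split():
        if not line.startswith(":"):
            raise ValueError(f"not an Intel Hex record: {line!r}")
        raw = bytes.fromhex(line[1:])
        count = raw[0]
        if len(raw) != count + 5:
            raise ValueError(f"bad length: {line!r}")
        # All bytes, including the trailing checksum, sum to 0 modulo 256.
        if sum(raw) & 0xFF:
            raise ValueError(f"bad checksum: {line!r}")
        offset = int.from_bytes(raw[1:3], "big")
        rectype = raw[3]
        data = raw[4:4 + count]
        if rectype == 0x00:      # data record
            for i, value in enumerate(data):
                memory[(upper << 16) + offset + i] = value
        elif rectype == 0x04:    # extended linear address
            upper = int.from_bytes(data, "big")
        elif rectype == 0x01:    # end of file
            break
        elif rectype == 0x05:    # start address, not needed to fill the flash
            pass
        else:
            raise ValueError(f"record type {rectype:#04x} not handled in this sketch")
    return memory

SO_HEX = """\
:020000040800F2
:10000000000100201100000878563412DDCCBBAA94
:08001000C046C046C046FBE7F4
:0400000508000000EF
:00000001FF
"""

memory = parse_ihex(SO_HEX)
for address in sorted(memory)[:4]:
    print(f"{address:08x}: {memory[address]:02x}")
```

The type-04 record is what supplies the 0x0800 upper half of the address; the data records themselves only carry 16-bit offsets.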

$ cat so.srec
S00A0000736F2E7372656338
S31508000000000100201100000878563412DDCCBBAA86
S30D08000010C046C046C046FBE7E6
S70508000000F2

With the S-record, the S3 lines describe the addresses and data that need to go into the flash for the program to run. The S0 and S7 lines are additional information that the processor does not need.
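
A comparable sketch for the S-record file, keeping only the data records (S1/S2/S3) and skipping the rest. The name parse_srec is made up for this example.

```python
ADDRESS_BYTES = {"1": 2, "2": 3, "3": 4}  # address width of S1/S2/S3 data records

def parse_srec(text: str) -> dict:
    """Return {address: byte} from the data records of Motorola S-record text."""
    memory = {}
    for line in text.split():
        if not line.startswith("S"):
            raise ValueError(f"not an S-record: {line!r}")
        rectype = line[1]
        raw = bytes.fromhex(line[2:])  # count, address, data, checksum
        if raw[0] != len(raw) - 1:
            raise ValueError(f"bad count: {line!r}")
        # The checksum is the ones' complement of the sum, so the total ends in 0xFF.
        if (sum(raw) & 0xFF) != 0xFF:
            raise ValueError(f"bad checksum: {line!r}")
        if rectype not in ADDRESS_BYTES:
            continue  # S0 header, S7 start address, etc.: nothing to put in flash
        width = ADDRESS_BYTES[rectype]
        address = int.from_bytes(raw[1:1 + width], "big")
        for i, value in enumerate(raw[1 + width:-1]):
            memory[address + i] = value
    return memory

SO_SREC = """\
S00A0000736F2E7372656338
S31508000000000100201100000878563412DDCCBBAA86
S30D08000010C046C046C046FBE7E6
S70508000000F2
"""

srec_memory = parse_srec(SO_SREC)
print(len(srec_memory), "bytes starting at", hex(min(srec_memory)))
```

The result is the same 24 bytes at 0x08000000 that the other formats describe.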

$ arm-none-eabi-objcopy so.elf -O binary so.bin
$ hexdump -C so.bin
00000000  00 01 00 20 11 00 00 08  78 56 34 12 dd cc bb aa  |... ....xV4.....|
00000010  c0 46 c0 46 c0 46 fb e7                           |.F.F.F..|
00000018

This form of binary file, with this toolchain/tools, is a memory image that needs to go into the processor's memory byte for byte. But there is no address, debugging or other information in it, so the user has to know where this data goes.
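
To illustrate that the address has to come from somewhere else, here is a sketch that takes the raw image plus a base address supplied by the user and produces S3 records. The function name bin_to_srec and the 16-bytes-per-record choice are assumptions for this example; 16 simply matches the so.srec listing above.

```python
def bin_to_srec(image: bytes, base: int, chunk: int = 16):
    """Yield S3 (32-bit address) data records for a raw image loaded at base."""
    for offset in range(0, len(image), chunk):
        data = image[offset:offset + chunk]
        address = base + offset
        # The count byte covers the address (4), the data and the checksum (1).
        body = bytes([4 + len(data) + 1]) + address.to_bytes(4, "big") + data
        checksum = ~sum(body) & 0xFF  # ones' complement of the low byte of the sum
        yield "S3" + body.hex().upper() + f"{checksum:02X}"

# The 24 bytes of so.bin from the hexdump above.
SO_BIN = bytes.fromhex("000100201100000878563412ddccbbaac046c046c046fbe7")

for record in bin_to_srec(SO_BIN, 0x08000000):
    print(record)
```

With the base 0x08000000 this gives the same two S3 lines as so.srec; with any other base the same bytes would be described as living somewhere else.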

These kinds of things are true for all processors. The details are specific to each chip/core: how it starts, whether it has a vector table, the machine code itself, and, if the memory is in the part, how to get the code in there. If the memory is outside the part, the documentation describes the busses so that you can interface the part to some memory or your own logic, so that when it reads/fetches instructions at some address you provide the bytes for that address and it works.

A very flexible file format like ELF lets you carry around the "binary", the program, plus debug and other info, and then use tools like objcopy to convert it to file formats for other tools that don't support ELF. You can google the ELF file format; it is pretty simple, and you don't even need a library, just read the file.
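
As a sketch of "just read the file", the following parses the ELF header and program headers with nothing but struct, assuming a 32-bit little-endian ELF like so.elf. The bytes are the first 0x54 bytes from the hexdump of so.elf above; the name loadable_segments is made up for this example.

```python
import struct

def loadable_segments(elf: bytes):
    """Return (entry, [(load_address, file_offset, size), ...]) for a 32-bit LE ELF."""
    fields = struct.unpack_from("<16sHHIIIIIHHHHHH", elf, 0)
    ident, e_entry, e_phoff = fields[0], fields[4], fields[5]
    e_phentsize, e_phnum = fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("this sketch handles only 32-bit little-endian ELF")
    segments = []
    for i in range(e_phnum):
        (p_type, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(
            "<8I", elf, e_phoff + i * e_phentsize)
        if p_type == 1:  # PT_LOAD: bytes that have to end up in memory
            segments.append((p_paddr, p_offset, p_filesz))
    return e_entry, segments

# First 0x54 bytes of so.elf (ELF header plus the one program header).
ELF_HEAD = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "02002800010000000000000834000000"
    "d0010100000200053400200001002800"
    "06000500010000000000010000000008"
    "00000008180000001800000005000000"
    "00000100"
)

entry, segments = loadable_segments(ELF_HEAD)
for address, offset, size in segments:
    print(f"{size:#x} bytes at file offset {offset:#x} go to address {address:#010x}")
```

For so.elf this points at 0x18 bytes at file offset 0x10000, which is where the vector table and machine code sit in the hexdump, destined for 0x08000000.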

Not all MCUs have a bootloader built in; some only have one way in. Some folks develop their own bootloader that uses other interfaces or the same ones in different ways; one can create, and people have created, a bootloader for the STM32 that works over the UART using a different protocol than the built-in one.

Because of the newer concern with security ("secure boot"), the newer STM32s do not let you use the built-in bootloader as-is even though it's in the part; you have to unlock it first, using another way in.

The STM32s only have a couple-three ways in, so all the tools you find are coming in through one of those interfaces, and/or the product has a bootloader and you are coming in through that (think AVRs on Arduinos).