Multithreading on AVR

To build a workable multi-threading system, you need to perform a 'context switch' (see the Wikipedia article on 'context switch' for an explanation).

The code needs to make the context switch invisible to each thread. Otherwise a thread cannot be restarted reliably, which defeats the purpose of doing it.

To make a context switch invisible to a running thread, all of that thread's state needs to be saved. It isn't sufficient to just save the SREG (status register) and flip to another stack. The code needs to save every register that could possibly be in use, and the obvious set is the entire AVR register file of 32 registers. These must be saved, and the register file for the thread about to be resumed must be loaded, so that everything is restored to exactly the state it was in when that thread was previously interrupted.
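As a rough illustration of what "all of its state" means, here is a minimal host-side model in C. It is not AVR code and not a real scheduler; the names `cpu_state_t`, `save_context`, `restore_context` and `context_switch` are made up for this sketch. On a real AVR the same steps would be done in assembler, and the program counter would already be on the interrupted thread's stack because the hardware pushes it on interrupt entry.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the CPU state a context switch must preserve.
 * The program counter is not listed: on AVR it is pushed onto the
 * interrupted thread's stack by the hardware when the interrupt fires. */
typedef struct {
    uint8_t  regs[32];  /* general-purpose registers r0..r31 */
    uint8_t  sreg;      /* status register */
    uint16_t sp;        /* stack pointer */
} cpu_state_t;

/* Copy the live CPU state into the outgoing thread's save area. */
static void save_context(const cpu_state_t *cpu, cpu_state_t *saved)
{
    memcpy(saved, cpu, sizeof *saved);
}

/* Load a previously saved state back into the CPU. */
static void restore_context(cpu_state_t *cpu, const cpu_state_t *saved)
{
    memcpy(cpu, saved, sizeof *cpu);
}

/* Suspend thread 'from' and resume thread 'to'. */
static void context_switch(cpu_state_t *cpu, cpu_state_t *from,
                           const cpu_state_t *to)
{
    save_context(cpu, from);
    restore_context(cpu, to);
}
```

If only `sreg` and `sp` were copied, the resumed thread would see whatever register values the other thread left behind, which is exactly the unreliable restart described above.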

Summary: yes, an AVR multi-threaded system could be built using a timer; however, your code needs to save and restore a lot more state.

As I said in my comment, an AVR running at its highest speed of 20MHz needs one or more cycles to complete each instruction. It would take about 32 instructions, and hence about 64 cycles (a push or pop takes two cycles), to store all 32 registers into memory, and about 64 cycles more to load the registers for a different thread, plus a little more for ISR entry and exit.

So, you should estimate about 140 cycles for the most basic multi-threading context switch. That is 7µs at a 20MHz clock. I'd suggest you keep context switching well under 10% of the available CPU cycles so that the CPU can actually get some useful work done. So make the interrupt period longer than 70µs.
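The arithmetic behind that estimate can be written down as a small helper, shown here as a sketch in plain C that runs on a PC. The function names and the idea of passing the overhead budget as a fraction are my own choices for illustration; the 140-cycle figure is only the rough estimate from above.

```c
#include <stdint.h>

/* Duration of one context switch in microseconds, given its cost in
 * CPU cycles and the CPU clock in Hz. */
static double switch_time_us(uint32_t cycles, uint32_t f_cpu_hz)
{
    return (double)cycles * 1e6 / (double)f_cpu_hz;
}

/* Shortest tick period (in microseconds) that keeps context switching
 * at or below max_overhead, a fraction such as 0.10 for 10%. */
static double min_tick_period_us(uint32_t cycles, uint32_t f_cpu_hz,
                                 double max_overhead)
{
    return switch_time_us(cycles, f_cpu_hz) / max_overhead;
}
```

Calling `switch_time_us(140, 20000000)` gives 7 µs, and `min_tick_period_us(140, 20000000, 0.10)` gives roughly 70 µs. Plugging in a measured cycle count for your own switch code would give a better figure than the estimate.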

A good way to understand the scope of the issues is to look at an existing simple OS, for example FreeRTOS. You will then understand more quickly all of the other pieces that are likely to be needed.

This 'Arduino OS' forum page may help too.

A lot of the context switch code will need to be in assembler because you can't reliably get at the registers from C. However, because the AVR's instruction set is relatively simple, that shouldn't be too hard.

Each thread is usually represented by several different chunks of memory (RAM):

  1. The register values
  2. The stack

Atmel's AVRs don't have a lot of RAM, so one awkward part is allocating enough stack for each thread without using so much memory that there isn't room for enough threads.
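To get a feel for that trade-off, here is a small budgeting sketch in C. It assumes each thread needs a 35-byte save area (32 registers, SREG and a 2-byte stack pointer) plus its own stack, and that some RAM is reserved for globals and the main stack; the function names and all example numbers are assumptions for illustration only.

```c
#include <stddef.h>

/* Assumed save area per thread: 32 registers + SREG + 2-byte SP. */
#define SAVED_CONTEXT_BYTES 35u

/* RAM consumed by one thread with the given stack size. */
static size_t bytes_per_thread(size_t stack_bytes)
{
    return SAVED_CONTEXT_BYTES + stack_bytes;
}

/* How many threads fit once reserved_bytes (globals, main stack, ...)
 * have been set aside out of ram_bytes. */
static size_t max_threads(size_t ram_bytes, size_t reserved_bytes,
                          size_t stack_bytes)
{
    if (reserved_bytes >= ram_bytes) {
        return 0;
    }
    return (ram_bytes - reserved_bytes) / bytes_per_thread(stack_bytes);
}
```

For example, with 2048 bytes of RAM, 512 bytes reserved and 128-byte stacks, `max_threads(2048, 512, 128)` comes out at 9. Doubling the stack size to 256 bytes drops that to 5, which shows how quickly the stack size dominates.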

The last little wrinkle that I remember is that an exit from an ISR (RETI) isn't exactly the same as a return from a subroutine (RET), so you'll need to think that through too.


Yes, with qualifications.

Simply changing the stack pointer won't be sufficient to switch threads; you will also need to store the values of all 32 registers at the time your ISR is called from a given thread, and restore the original values when switching back to that thread. Between this and the need to store and modify the stack pointer, this will all mean that the ISR must be written in assembly.

You'll also need to allocate a separate stack for each thread. The AVR default stack starts at the top of memory and extends downward from there; you will need to allocate a separate stack somewhere else in memory for any secondary threads.

Between all of this, you will find that writing (and, worse, debugging!) this functionality will be quite difficult. I'd strongly advise you to find another way to structure your program if at all possible.


When the question was originally posted, the interrupt rate was specified as 100 nsec. So I responded:

A 100 nsec interrupt period corresponds to a frequency of 10 MHz. If you had an AVR with a 100 MHz clock rate (and that is way higher than most parts in Atmel's AVR family), there would be only 10 clocks available to respond to and handle each interrupt. Keeping in mind that most AVRs are slower than 100 MHz, even fewer clocks are available to process each interrupt.

Five or ten clock cycles will never be enough to process any kind of real-world interrupt service routine. For this type of MCU I strongly suggest that you back the period of your periodic interrupt down to something more on the order of 1 to 10 milliseconds.

Now that the OP has changed to a 100 usec rate, here is my response:

A 100 usec interrupt period corresponds to a frequency of 10 kHz. If your AVR is one with a typical operating frequency of 20 MHz, this corresponds to:

20 MHz / 10 kHz = 2000 clock cycles per interrupt. At this rate it is possible to support interrupt service routines that have a carefully managed amount of code within the interrupt context.

As to whether it is plausible to create an RTOS with a full context switch using an interrupt rate like this, that is something you will have to calculate. For example, if you write code to do the context switch at each interrupt and find that the code takes 200 clock cycles to complete, then it is easy to see that 10% of the MCU's compute bandwidth is consumed just performing overhead work before a given task can run.
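That calculation is simple enough to sketch in C and run on a PC with your own numbers. The function names are invented for this example, and the 200-cycle switch cost is just the illustrative figure used above, not a measurement.

```c
#include <stdint.h>

/* CPU clock cycles available between two periodic interrupts. */
static uint32_t cycles_per_tick(uint32_t f_cpu_hz, uint32_t tick_hz)
{
    return f_cpu_hz / tick_hz;
}

/* Percentage of the CPU spent on context switching when every tick
 * performs one switch costing switch_cycles. */
static double overhead_percent(uint32_t switch_cycles, uint32_t f_cpu_hz,
                               uint32_t tick_hz)
{
    return 100.0 * (double)switch_cycles
                 / (double)cycles_per_tick(f_cpu_hz, tick_hz);
}
```

With a 20 MHz clock and a 10 kHz tick, `cycles_per_tick` returns 2000 and a 200-cycle switch works out to about 10% overhead. The original 100 nsec case, even on a hypothetical 100 MHz part, leaves `cycles_per_tick(100000000, 10000000)`, which is 10 cycles.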

I have personally coded dozens of embedded MCU applications that have never even come close to needing/requiring an RTOS. Some of these embedded applications were projects that had the appearance of 30, 40 or even 50 things all happening at the same time. Here are some of the reasons I stay away from RTOS applications:

  1. Context switching overhead is expensive in terms of overall overhead if you try to code for sub millisecond task latency.
  2. In limited resource MCUs like AVRs there is only so much RAM available. Using up a good chunk of it to store task contexts and task control blocks may be a penalty that is too high for many applications.
  3. An RTOS task environment also requires memory pool management between tasks. If memory is managed as a heap with dynamic allocation, there is additional processing overhead to perform the heap management. On the other hand, if you select the simpler approach and decide on a fixed allocation of memory per task, then you place arbitrary limits on the maximum memory available to a given task, and some tasks with a small memory footprint will be sitting on unused memory.
  4. RTOS usage almost always pushes a given application up at least one or two notches of processor family capabilities and resources which increases BOM cost.
  5. For "roll your own" RTOS usage there is always going to be an extra burden placed on your project to maintain the code, fix bugs and fine tune performance issues that detract from building the actual product that should get 100% attention.
  6. If you use a third party RTOS there is extra work added to the product development to learn how to use it, there may be extra costs involved if it is a commercial product and you may still be faced with bugs and performance issues.

I often think that the only folks that really make money or reap tangible benefits from an RTOS are those that make the RTOS products and convince other engineers that they are needed in order to get real time work out of an MCU.