How to determine the right microcontroller for your project?

What you call not being "up to speed" is usually called missing deadlines. For some systems, called hard real-time systems, those deadlines are critical: if you miss one, someone might die or the system might be completely broken.

In your case (as I recall) your system was not even able to execute the sequence of instructions

$$ Y_i = 0.1441\,U_i + 0.2281\,U_{i-1} + 0.1441\,U_{i-2} + 0.6777\,Y_{i-1} - 0.254\,Y_{i-2} $$

once before the next sample arrived from the ADC (at a rate of 76 kHz).

First, determine how fast each of your tasks must run. In the case of that filter, does it have to run at 76 kHz? If so, write the best ("smallest in assembly") implementation that does the job and check which assembly instructions it uses.
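A minimal C sketch of that difference equation, assuming single-precision floats; the function name and state variables are illustrative, not from the original project:

```c
/* Second-order IIR filter from the question, written so the generated
 * assembly is easy to inspect. Coefficients are the ones quoted above. */
static float u_1, u_2;   /* U(i-1), U(i-2) */
static float y_1, y_2;   /* Y(i-1), Y(i-2) */

float filter_step(float u)
{
    float y = 0.1441f * u + 0.2281f * u_1 + 0.1441f * u_2
            + 0.6777f * y_1 - 0.254f * y_2;

    /* shift the delay line for the next sample */
    u_2 = u_1;  u_1 = u;
    y_2 = y_1;  y_1 = y;

    return y;
}
```

Compiling this with optimization enabled and looking at the generated assembly gives you the instruction list discussed next.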

For each of those instructions \$i\$ you can find the typical time it takes to run (the average \$t_i\$) and the worst case (never more than \$T_i\$). Whether your system is critical or not decides which of the two matters. Counting the occurrences \$N_i\$ of each instruction and summing their times, you get

$$ t_{total} = \sum_i N_i t_i,$$

$$ T_{total} = \sum_i N_i T_i.$$
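If you want to automate that bookkeeping, here is a small self-contained C sketch; the instruction mix and cycle figures are placeholders, to be replaced with counts from your disassembly and timings from the device datasheet:

```c
#include <stdio.h>

/* One row per instruction type in the task: its count N_i, average
 * cycles t_i, and worst-case cycles T_i. Placeholder figures only. */
struct instr { const char *name; int n; double t_avg; double t_worst; };

int main(void)
{
    const struct instr mix[] = {
        { "fmul", 5, 1.0, 1.0 },
        { "fadd", 4, 1.0, 1.0 },
        { "load", 6, 2.0, 3.0 },
    };
    const int rows = sizeof mix / sizeof mix[0];
    double t_total = 0.0, T_total = 0.0;

    for (int i = 0; i < rows; i++) {
        t_total += mix[i].n * mix[i].t_avg;   /* sum of N_i * t_i */
        T_total += mix[i].n * mix[i].t_worst; /* sum of N_i * T_i */
    }
    printf("average: %.0f cycles, worst case: %.0f cycles\n",
           t_total, T_total);
    return 0;
}
```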

Now, if you only had that task running, it would be enough that

$$ \frac{1}{T_{total}} > f_{task}, $$

or, if your system allows a few missed deadlines,

$$ \frac{1}{t_{total}} > f_{task}. $$
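As a quick sanity check with the 76 kHz rate from the question (plain arithmetic, no new assumptions):

$$ f_{task} = 76\ \text{kHz} \quad\Rightarrow\quad \frac{1}{f_{task}} \approx 13.2\ \mu\text{s}, $$

so the worst-case execution time \$T_{total}\$ of the filter has to stay below roughly 13 µs.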

But in reality this approach is simplistic: for a system with many tasks you have to add up the time needed by all of them and make sure they fit in your system, and even then, depending on how they are scheduled, you might still miss a deadline by a few milliseconds or microseconds.

Rule of thumb

For a small project, estimate \$T_{total}\$ for the fastest (executed the most times per second) and most computationally demanding task, probably that filter of yours with its floating-point multiplications. Once you have that, find a microcontroller that can execute the task in less than \$\frac{1}{10}\$ of the time between calls to it, meaning you will have \$\frac{9}{10}\$ of your time free to run other tasks.
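For the 76 kHz filter that budget works out to

$$ \frac{1}{10} \cdot \frac{1}{76\ \text{kHz}} \approx 1.3\ \mu\text{s} $$

per call (plain arithmetic on the numbers already given).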

Usually a ballpark figure for the number of instructions is enough to find a microcontroller that can run your system. And in your case, since you use floating-point numbers, look for a microcontroller with an FPU.


Approach 1 (very coarse) - estimate the number of cycles your algorithm will take. I see that you have 5 floating-point multiplies and 4 additions. On a Cortex-M4F each of these operations takes a single clock cycle. You will also need some cycles for overhead, interrupt entry/exit, etc. Let's assume 10 cycles for the computation and 30 cycles for the overhead, so you need 40 cycles. Let's also assume a clock frequency of 20 MHz (quite low for an M4F). You have 20'000'000 clock cycles per second and 76'000 samples per second. Dividing these numbers gives ~263 cycles per sample. You need 40, so 263 is definitely enough. The system load will be around 15%, which is going to be okay. It's best if you can look at the disassembly to count the instructions, as shown below.
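If you build with the GNU Arm toolchain, one way to obtain that disassembly (firmware.elf is a placeholder for your own output file):

```
arm-none-eabi-objdump -d firmware.elf
```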

Side note: AVRs are not exactly the best MCUs for heavy number crunching.

Approach 2 - Needs a hardware timer. Develop your algorithm, start a timer, run the algorithm once (without the ADC and interrupts), stop the timer. If the timer is clocked at the same rate as the CPU, you will get the approximate number of cycles the algorithm takes; this is easier than approach 1 when the algorithm gets more complex. Cortex-M3 and M4 have cycle counters for exactly that purpose. If you now see that the algorithm takes 300 cycles while you get a new sample every 263 cycles, then you have a problem and can't process the data in real time.
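A minimal sketch of that measurement using the CMSIS cycle counter on a Cortex-M3/M4; the device header and run_algorithm_once() are stand-ins for your own project files:

```c
#include <stdint.h>
#include "stm32f4xx.h"   /* any CMSIS device header for your chip */

extern void run_algorithm_once(void);  /* stand-in for your filter routine */

uint32_t measure_cycles(void)
{
    /* Enable the DWT cycle counter (available on Cortex-M3/M4). */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;

    uint32_t start = DWT->CYCCNT;
    run_algorithm_once();              /* no ADC, no interrupts running */
    return DWT->CYCCNT - start;        /* elapsed CPU clock cycles */
}
```

Call it once from main() with interrupts disabled and compare the result against your per-sample cycle budget (the ~263 cycles above).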

How do you figure out which exact MCU (or CPU architecture) to choose? Write your algorithm in pure C and run it on some eval boards. I keep a couple of the cheapest Cortex-M0, M3, and M4F boards around for that purpose, to try before I commit to a particular chip.