As technology advances, there are two basic processing platforms for implementing embedded systems. The first is the Microcontroller Unit (MCU). These devices have varying amounts of integrated Flash (<= 2MB) and RAM (<= 1MB), and are designed to run bare-metal code or a real-time operating system (RTOS), like FreeRTOS. The second is the Linux-capable Microprocessor Unit (MPU). Most Arduinos are examples of MCU-based systems, and the Raspberry Pi is an example of an MPU-based system. An MPU typically does not have embedded Flash and RAM, at least not on the same die. The fundamental difference between MCU/RTOS and MPU/Linux systems is the memory architecture and the amount of memory in the system.
There are other differences as well, summarized in the table below.
| Feature | MCU | MPU |
| --- | --- | --- |
| Power supplies | 3.3V | VCORE, VIO, VDDR, etc. |
| USB Host | limited | full support, many drivers |
| Networking | limited | options for Gigabit Ethernet and multiple interfaces |
| Cost | lower, $1-$20 | higher, starts at $10 (CPU + power/memory) |
| Programming languages | C/C++, MicroPython, Rust | C/C++, Go, Python, Node.js, Java, Erlang, Rust, about anything |
| Real-time | hard | soft |
| Data processing | limited | excellent |
| Processing power | less | more |
| Display size | small | any size |
| Startup time | fast | slow |
| Expandable over time | less so | more so |
| On-chip peripherals | more | less |
| PCB real estate | less | more (external flash/RAM) |
Performance is related to memory architecture. In most MCUs, the memory architecture is fairly simple. Code is executed directly from flash, and on-chip SRAM is accessed directly. Programs do not need to be loaded from flash into RAM before running them. This simplicity leads to predictable real-time response and code execution. It takes a consistent amount of time to fetch code from flash into the processor for execution, so the timing for every instruction is fairly predictable. However, with this architecture, clock rate is typically limited to around 180MHz (for example, the STM32F4). Some of the newer STM32 parts, such as the STM32F7 and STM32H7, add an L1 cache, which allows the CPU to run at 216MHz and 400MHz or more respectively.
MPUs have a more complex memory architecture in that they page code from Flash into SDRAM (both external to the MPU), and then from SDRAM into two or more levels of cache memory located on the MPU. The local cache memory is very fast, allowing these processors to run at high clock rates (1GHz or more in some cases). A memory management unit (MMU) implements a virtual memory system in which physical pages from RAM are mapped into a virtual address space. This is very efficient, as physical pages are only mapped in as needed and can be discarded if RAM is needed elsewhere. But this memory management introduces delays that are too long for some hard real-time systems. The first access to a block of code that has not been run recently takes time, as code needs to be paged from a file in flash into SDRAM and then loaded into the respective caches. But subsequent accesses from cache are blazing fast. An MMU also protects against one user space process corrupting the memory of another process, or corrupting kernel memory. For complex systems (server/desktop) running multiple processes, this protection increases system reliability. For many embedded systems, there is only one main application, and protecting processes from each other is less of a concern. However, this memory protection is still useful in that a bug in the application does not crash the system, and recovery mechanisms can be built in to restart the app, or debug it during development.
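To make the "slow first access, fast afterwards" behavior concrete, here is a toy Python model of demand paging. It is only a sketch under loud assumptions: the class name, the cost constants, and the least-recently-used eviction policy are all made up for illustration, and the model ignores the cache levels entirely. A real kernel's page replacement is far more involved.

```python
from collections import OrderedDict

# Hypothetical latencies in arbitrary time units, chosen only to show the
# gap between a page fault and a hit; they are not measured values.
FAULT_COST = 1000  # page must be read from flash into RAM
HIT_COST = 1       # page is already resident in RAM


class PagedMemory:
    """Toy model of demand paging with least-recently-used eviction."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page number -> True, oldest first
        self.faults = 0

    def access(self, page):
        """Touch a page and return the simulated cost of the access."""
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return HIT_COST
        self.faults += 1
        if len(self.resident) >= self.num_frames:
            self.resident.popitem(last=False)  # discard the oldest page
        self.resident[page] = True
        return FAULT_COST


def run_trace(memory, pages):
    """Return the cost of each access in the trace, in order."""
    return [memory.access(p) for p in pages]
```

With two frames, `run_trace(PagedMemory(num_frames=2), [0, 0, 1, 2, 0])` returns `[1000, 1, 1000, 1000, 1000]`: the repeat access to page 0 is cheap, but once page 0 has been discarded to make room, touching it again pays the full cost. That occasional long access is exactly what hurts hard real-time deadlines.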
In an MCU, program flow is controlled through simple loops and state machines (no operating system) or a real-time operating system (RTOS). In an MPU system, a fairly complex operating system like Linux is typically required to manage the multiple levels of memory and storage, schedule the multiple processes that are running, and provide drivers for the complex hardware systems found in an MPU (USB, large displays, networking, etc.). There is a great gulf between these two systems. Even though an RTOS may run on an MPU, there are much longer delays in execution due to the memory architecture. You still have to load code from flash to SDRAM and then through several layers of cache memory before the CPU can execute it. Once it is in cache, the code executes much faster and can do more work overall, but the occasional delays are still there. You might be able to play tricks by locking lines of code in cache or by using the small amount of on-chip SRAM that may be present, but by the time you do this, you are back to the smaller memory sizes of an MCU, and may as well just use an MCU.
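The "simple loops and state machines" style can be sketched in a few lines. The example below is written in Python for readability (on a real MCU it would more likely be C or MicroPython), and the states, tick counts, and function names are hypothetical. The point is the shape: a non-blocking `step` function called once per pass of a main loop.

```python
# Hypothetical states for an actuator that runs for a fixed time, then rests.
IDLE, RUNNING, COOLDOWN = "IDLE", "RUNNING", "COOLDOWN"

RUN_TICKS = 3       # assumed run duration, in loop passes
COOLDOWN_TICKS = 2  # assumed rest duration, in loop passes


def step(state, ticks_in_state, start_requested):
    """Advance the state machine by one tick.

    Returns the new (state, ticks_in_state) pair. The function never
    blocks, so every pass through the main loop takes a bounded time.
    """
    if state == IDLE:
        if start_requested:
            return RUNNING, 0
        return IDLE, ticks_in_state + 1
    if state == RUNNING:
        if ticks_in_state + 1 >= RUN_TICKS:
            return COOLDOWN, 0
        return RUNNING, ticks_in_state + 1
    # Remaining case: COOLDOWN
    if ticks_in_state + 1 >= COOLDOWN_TICKS:
        return IDLE, 0
    return COOLDOWN, ticks_in_state + 1


def main_loop(start_requests):
    """Run one loop pass per input and return the state after each pass."""
    state, ticks = IDLE, 0
    history = []
    for start_requested in start_requests:  # firmware would loop forever
        state, ticks = step(state, ticks, start_requested)
        history.append(state)
    return history
```

Because each pass does a small, fixed amount of work, the worst-case loop time is easy to reason about, which is where the predictable response of a bare-metal MCU comes from.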
After understanding the differences in memory architectures, we can understand that MCUs and MPUs are optimized for different things. An MCU is optimized for simplicity, cost, and predictable (real-time) response times. An MPU is optimized for getting the maximum amount of work done over time. Cost and real-time response are secondary concerns. There is not really a lot of middle ground between an MCU and MPU, as evidenced by the large gap in memory sizes. You can’t fit standard Linux in an MCU, as there is only so much NOR Flash and SRAM that will fit on an MCU die. Once you move to denser memory technologies such as NAND Flash and SDRAM, these must be implemented as separate dies and typically separate integrated circuit packages. These technologies are so efficient at storing large amounts of data that you see a huge jump in memory capacity once you switch to the off-chip memories.
MCUs and MPUs also have different origins. Today’s 32-bit MCUs have descended from simpler 8-bit MCUs and are scaled-up embedded controller technologies. MPUs have descended from desktop and server computer systems and are scaled-down technologies. The differing priorities of these two paradigms reflect their different uses.
Keeping these differences in mind, the most fundamental question to ask in selecting an MCU or an MPU is whether hard real-time performance and reliability are most critical (control-centric application), or whether data processing performance and connectivity are most important (data-centric application). If both of these are important in your application, you should consider having both an MCU and MPU in the system, saving yourself untold pain.
Reliability is an interesting topic. I have seen MPU/Linux systems operate very reliably, and have also experienced issues that were very difficult to solve. MPU systems are many times more complex and have more unknowns, which translates into more risk. There are many more physical components and solder connections required to implement the system, all of which can fail over time with environmental stress. There are many millions of lines of code in an MPU system that you did not write, but you are still responsible for all of it, because it all has to work for the system to function. Although an MPU system can be made fairly reliable, simple statistics tell us that MCU systems will generally be more reliable than MPU systems because there is less hardware that can fail and fewer lines of code running that may contain bugs.
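The "simple statistics" here is essentially the series reliability rule: if a system only works when every part works, and failures are independent, the system reliability is the product of the part reliabilities. The sketch below assumes exactly that (independence is a simplification), and the part counts and per-part figures in the usage note are hypothetical, not data from any real board.

```python
import math


def series_reliability(component_reliabilities):
    """Probability that every component works.

    Assumes failures are independent and that any single failure takes
    the whole system down (a pure series model).
    """
    return math.prod(component_reliabilities)


def uniform_system(part_count, per_part_reliability):
    """Reliability of a system built from identical, independent parts."""
    return series_reliability([per_part_reliability] * part_count)
```

For illustration only: with a made-up per-part reliability of 0.999, `uniform_system(20, 0.999)` comes out at roughly 0.98, while `uniform_system(200, 0.999)` drops to roughly 0.82. Nothing about the individual parts got worse; there are simply more of them that all have to work.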
One example of a problem we experienced when implementing control in an MPU system came from a product that collected data and transferred it back to a cloud server over a cellular modem. Unfortunately, the cellular modem in the system was not 100% reliable, and the only way to recover in some instances was to reboot the system when we detected a network failure, which cycled power to the modem (this modem module itself ran Linux). At one point, we added some basic control functionality to the system to control plant blowers based on a schedule. The control worked fine until the system had to restart due to modem connectivity issues; the blower control was then inactive during the reboot cycle (perhaps 30s or so). This was not a fatal problem in this application, but also not ideal. The obvious solution is to get a more reliable modem or develop a better recovery method, but with units in the field and other development priorities, it is not always so simple. This is a classic example of the tension between complex data/connectivity systems and reliable control.
In an MCU system, the hardware is relatively simple, and you are using a relatively small RTOS, or none at all. You write a greater percentage of the code in the system yourself, so if you have a simple task to do, and write reliable code, there is a potential to have a more reliable system than a comparable MPU system. However, if you are trying to do complex data processing and connectivity tasks on an MCU (such as writing your own database, network, or USB stack), chances are there will be bugs in your code and you will have a less reliable system than if you had chosen an MPU using proven technologies. If your application is large and complex, then writing it in a safer language like Go or Rust on an MPU may provide a more reliable and maintainable product than trying to implement the same functionality in C++. Again, if you need both reliable real-time control and advanced data processing, then put both an MPU and MCU in the system.
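One way to picture the MCU-plus-MPU split is a toy simulation in which the MCU owns the schedule and the output, and the MPU only delivers schedule updates when it happens to be up. This is a sketch of the general idea, not the design of any product described here; the class, method names, and time values are hypothetical, and the link between the two processors is not modeled at all.

```python
class McuBlowerController:
    """Toy model of the MCU side: it owns the schedule and the output."""

    def __init__(self):
        # List of (start_s, stop_s) windows, in seconds on a local clock.
        self.schedule = []

    def update_schedule(self, windows):
        """Called whenever the MPU delivers a new schedule over the link."""
        self.schedule = list(windows)

    def blower_on(self, now_s):
        """Decide the output from local state only; the MPU is not consulted."""
        return any(start <= now_s < stop for start, stop in self.schedule)


def simulate(controller, times_s, mpu_up):
    """Return (time, MPU available, blower output) for each sample time."""
    return [(t, mpu_up(t), controller.blower_on(t)) for t in times_s]
```

If the schedule is `[(100, 200)]` and the MPU is down from t=120 to t=150, sampling at t=130 still reports the blower on, because the decision never depended on the MPU being alive. The MPU can reboot, lose its modem, or be updated without interrupting control.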
Another advantage of MPU systems is that they tend to be more general purpose, so more functionality can be added over time. There is little danger of running out of code space, which often happens in MCU environments. With interfaces like USB, additional peripherals can be added in the future as requirements change, and with Linux, drivers for a vast array of hardware are available.
In some cases, the lines between MCUs and MPUs are blurred by technologies like the Linux real-time (RT) extensions or uClinux, but you must still keep the above principles in mind when selecting the building blocks for your product. MCUs and MPUs are very different devices designed to handle different tasks. Before choosing, you should understand what you are trying to do, and what is important for your product.
Reference: