Code Optimization in Microcontrollers
A microcontroller's C code may need optimization in demanding applications. Optimization aims to reduce two crucial things:
Code Size.
Microcontrollers can store only a limited amount of data and instructions because their RAM and program memory are small. The code therefore needs to be optimized so that the available instruction and data memory is used as efficiently as possible.
Code Execution Times.
Microcontrollers are sequential devices that execute one instruction at a time, and each assembly instruction takes a certain number of clock cycles to execute. The code should therefore perform the required task in the fewest possible clock cycles. The fewer clock cycles the code uses, the faster it runs, and the faster the application can respond.
This article presents tips and tricks that may be used to reduce the size and execution time of microcontroller code.
Microchip's MPLAB X IDE will be used to demonstrate examples where appropriate.
How to Measure Code Execution Time Experimentally
To find out how long your code actually takes to execute, you need to measure it experimentally. A logic analyzer is a convenient tool for this; readers interested in the procedure can ask me by e-mail. Besides this:
- Some compilers can count the clock cycles a piece of code will consume.
- Some debuggers, for example the ICD 3 from Microchip, can measure execution time directly with a stopwatch feature.
1. Know the Processing Power and Memory Size of your Microcontroller
The clock frequency (MHz) does not always give a true picture of a microcontroller's processing speed. A more realistic measure is MIPS (million instructions per second), the number of instructions the MCU can execute in one second.
MCUs typically range from about 20 MIPS for 8-bit AVRs up to 60-70 MIPS in the high-end category. A high-MIPS microcontroller is likely to be more expensive than a low-end device, so there is a trade-off between cost and processing speed.
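To relate MIPS to execution time, the arithmetic can be sketched as below. The helper name `exec_time_us` is hypothetical, and the sketch assumes the core completes one instruction per instruction cycle and that `mips` is greater than zero.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds needed to run `cycles` instruction
 * cycles on a core rated at `mips` million instructions per second.
 * cycles / (mips * 1e6) seconds equals cycles / mips microseconds.
 * Integer division truncates any fraction of a microsecond. */
static inline uint32_t exec_time_us(uint32_t cycles, uint32_t mips)
{
    return cycles / mips;
}
```

For example, a routine that takes 1,600 instruction cycles would need about 100 microseconds at 16 MIPS but only about 22 microseconds at 70 MIPS.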
Microcontrollers have separate memories for storing data and program code. The size of each can be found in the datasheet. You may need an MCU with more memory if your code is substantially large.
2. Choice of Variables for Optimization in Code Size
Microcontrollers have limited data memory, usually ranging from 1 to 4 Kbytes. It is therefore wise to choose the most appropriate variable type according to the expected range of the data being stored. The table below summarizes these variables:
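| Data Type | Size in Bytes | Range |
|---|---|---|
| bit | 1 bit | 0 or 1 only |
| char | 1 | -128 to 127 |
| int | 2 | -32,768 to 32,767 |
| unsigned int | 2 | 0 to 65,535 |
| long | 4 | -2,147,483,648 to 2,147,483,647 |
| float | 4 | Precise up to 6 decimal places |
| double | 8 | Precise up to 15 decimal places |
| long double | 10 | Precise up to 19 decimal places |

The type names and sizes shown are the ones typical of 8-bit and 16-bit MCU compilers that match these ranges; check your compiler manual for the exact values.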
- If two variables X and Y are to be added and the result stored in Z, and Z is expected to exceed 65,535 while X and Y themselves stay within 0 to 65,535 and never go negative, then Z may be declared as a long and X and Y as unsigned int. This saves 4 bytes of data memory compared with declaring all three variables as long.
- If two variables X and Y hold whole numbers but their quotient may have a fractional part, X and Y may be declared as int and the result as float or double, depending on the precision required. Cast one operand to the floating-point type before dividing, because dividing two ints discards the fraction.
The choice of data type becomes especially important when declaring arrays with a large number of elements.
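The two cases above can be sketched as follows. The function names are hypothetical, and the fixed-width types from `<stdint.h>` stand in for unsigned int (16 bits) and long (32 bits) as they would be on a compiler with a 16-bit int.

```c
#include <stdint.h>

/* Sum of two 16-bit unsigned values, returned in a 32-bit result.
 * The cast widens x BEFORE the addition. Without it, a compiler whose
 * int is 16 bits would add in 16 bits and wrap around past 65,535. */
uint32_t add_wide(uint16_t x, uint16_t y)
{
    return (uint32_t)x + y;
}

/* Quotient of two integers as a float. The casts force a floating-point
 * division; x / y on its own would be an integer division. */
float divide_frac(int16_t x, int16_t y)
{
    return (float)x / (float)y;
}
```

Only the result is widened here, so the operands still occupy 2 bytes each in data memory.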
3. Choice of Variables for Optimization in Code Execution Time
- Floating-point calculations take longer than fixed-point (integer) calculations. Do not use a floating-point variable where a fractional value is not required. Work with unsigned integers wherever possible.
- Local variables are preferred to global variables. If a variable is used in only one function, declare it inside that function, because accessing global variables is often slower than accessing local ones.
- An 8-bit MCU accesses a single-byte variable fastest, and a 16-bit MCU accesses a 2-byte variable most efficiently, because these sizes match the native word length of the device.
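One common way to avoid floating point is to keep a quantity in scaled integer units. The sketch below converts an ADC reading to millivolts using integers only. The 10-bit ADC, the 5,000 mV reference and the function name are assumptions made for illustration and are not taken from any particular device.

```c
#include <stdint.h>

#define ADC_MAX_COUNTS 1023u   /* assumed 10-bit ADC: readings 0..1023 */
#define VREF_MILLIVOLT 5000u   /* assumed 5 V reference, in millivolts */

/* Convert a raw ADC reading to millivolts without using float.
 * The multiplication is done in 32 bits because 1023 * 5000 does not
 * fit in 16 bits. The result is truncated to a whole millivolt. */
uint16_t adc_to_millivolts(uint16_t adc_counts)
{
    return (uint16_t)(((uint32_t)adc_counts * VREF_MILLIVOLT) / ADC_MAX_COUNTS);
}
```

Storing 2,502 for 2.502 V keeps the value in a 2-byte unsigned integer and leaves the float library out of the calculation.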
4. Optimizing Arithmetic Operations
Arithmetic operations can be optimized in the following ways.
- Use look-up tables of pre-calculated values instead of evaluating a sine, another trigonometric function, or any other operation whose result can be known beforehand.
- If a sine look-up table is already stored in memory, a cosine can be evaluated by advancing the array index by the equivalent of 90 degrees.
- Among the four arithmetic operations, division and multiplication take the most processing time. With floating-point values they can take on the order of hundreds of microseconds.
- Use bit-shift instructions instead of division and multiplication by powers of two. A right shift >>3 divides by 2^3 (8), whereas a left shift <<1 multiplies by 2^1 (2).
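A minimal sketch of the look-up table and bit-shift ideas above. The table size (16 entries per full circle, 22.5 degrees per step), the scaling to the range -127..127 and all names are assumptions chosen to keep the example short; a real table would usually be finer.

```c
#include <stdint.h>

#define SINE_STEPS 16u                /* entries per full circle (assumed) */
#define QUARTER    (SINE_STEPS / 4u)  /* entries covering 90 degrees */

/* Pre-calculated round(127 * sin(step * 22.5 degrees)). */
static const int8_t sine_table[SINE_STEPS] = {
       0,   49,   90,  117,  127,  117,   90,   49,
       0,  -49,  -90, -117, -127, -117,  -90,  -49
};

/* Sine by table look-up; the mask wraps the index around the circle. */
int8_t sine_lookup(uint8_t step)
{
    return sine_table[step & (SINE_STEPS - 1u)];
}

/* Cosine from the same table: cos(x) = sin(x + 90 degrees). */
int8_t cosine_lookup(uint8_t step)
{
    return sine_table[(step + QUARTER) & (SINE_STEPS - 1u)];
}

/* Shifts in place of division and multiplication by powers of two.
 * Unsigned operands are used because right-shifting a negative signed
 * value is implementation-defined in C. */
uint16_t divide_by_8(uint16_t v)   { return v >> 3; }
uint16_t multiply_by_2(uint16_t v) { return (uint16_t)(v << 1); }
```

Most optimizing compilers already turn a division by a constant power of two into a shift, so writing the shift by hand matters mainly when optimization is turned off.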
5. Use a DSP Capable Microcontroller for Intensive Calculations
Some microcontrollers have a DSP engine built into their architecture in addition to the conventional ALU. This DSP engine is designed to perform arithmetic calculations in very few clock cycles (one in most cases), many times faster than the ALU.
Instructions a DSP engine can carry out faster than an ALU are:
- Bit shift and rotate instructions.
- Multiplications, divisions and other arithmetic operations.
- Evaluating Sines and other trigonometric functions.
- All DSP operations such as FFT, DFT, convolution and FIR filtering.
Using the DSP engine of a microcontroller requires that:
- Separate DSP libraries are incorporated into the project.
- Function names differ from those in the standard C math library. Documentation for these libraries and functions can be obtained from the respective manufacturer's website.
- The DSP engine uses a different variable type, 'fractional'. Learn how to use fractional-type variables before proceeding with the DSP library functions.
Note that standard math library functions will not invoke the DSP engine because they get translated into ALU assembly instructions.
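To give a feel for what a fractional type is, the sketch below implements the widely used Q15 format (1 sign bit, 15 fraction bits, range -1 to just under +1) in plain C. This is a generic illustration under that assumption: the names `q15_t`, `q15_from_float`, `q15_to_float` and `q15_mul` are hypothetical and do not come from any vendor's DSP library, and the code runs on the ALU rather than a DSP engine.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* hypothetical name for a Q15 fractional value */

/* Convert a float in [-1, 1) to Q15, truncating toward zero. */
q15_t q15_from_float(float x)
{
    return (q15_t)(x * 32768.0f);
}

/* Convert a Q15 value back to float. */
float q15_to_float(q15_t q)
{
    return (float)q / 32768.0f;
}

/* Multiply two Q15 values. The 32-bit product carries 30 fraction bits,
 * so it is scaled back down by 2^15. No saturation is applied: the one
 * overflowing case, (-1) * (-1), is not handled in this sketch. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (q15_t)(product / 32768);
}
```

With this representation 0.5 is stored as 16384, and 0.5 times 0.5 gives 8192, which is 0.25.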
6. Work with Interrupts
Use interrupts for performing specific functions such as:
- Reading ADC values.
- Sending and receiving from UART.
- Updating PWM duty cycle registers.
- CAN or I2C communication.
Interrupts service these tasks more quickly than performing them in the main body through a function call or inline code.
Interrupts also trigger only when required, whereas code placed in the main body executes on every iteration of the while(1) loop.
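A common pattern is to keep the interrupt routine short and hand the data to the main loop through a flag. The sketch below is hardware-independent so that it can be run on a PC: `adc_isr` stands in for a real interrupt service routine and takes the ADC result as a parameter, whereas on a real MCU it would be declared with the compiler's own interrupt syntax and read the result register itself. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the interrupt routine and the main loop, so volatile. */
static volatile bool     adc_ready  = false;
static volatile uint16_t adc_sample = 0;

/* Stand-in for the ADC interrupt service routine: store the new
 * conversion result and signal the main loop. */
void adc_isr(uint16_t result_register)
{
    adc_sample = result_register;
    adc_ready  = true;
}

/* Called from the while(1) loop. Returns true and writes the sample to
 * *out only when the interrupt has delivered a new value. On an 8-bit
 * MCU the 16-bit copy is not atomic, so the interrupt would normally be
 * disabled briefly around it. */
bool adc_poll(uint16_t *out)
{
    if (!adc_ready) {
        return false;
    }
    *out = adc_sample;
    adc_ready = false;
    return true;
}
```

The main loop then does work on the ADC value only when `adc_poll` returns true, instead of starting and waiting for a conversion on every pass.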
7. Use the Best Available Compilers
If properly configured, compilers can automatically apply some of the optimizations discussed above while translating the code from C to assembly language. Look for the optimization options in your compiler and, if possible, upgrade to the professional version, because professional versions typically offer more powerful optimization.
8. Use Conditional Statements Intelligently
- When using a series of if-else statements, put the most probable condition first. This way the MCU does not have to evaluate the remaining conditions once it finds the true one.
- A switch-case statement is usually faster than an if-else chain.
- Use nested if-else statements in place of a long series of conditions. An if-else block with many conditions may be divided into smaller sub-branches to reduce the number of comparisons in the worst case (the last condition).
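The nesting idea can be sketched with a function that sorts a value into one of four bands. The thresholds (10, 50 and 200) and the function names are made up for illustration.

```c
#include <stdint.h>

/* Linear chain: a value of 200 or more is reached only after three
 * comparisons have failed. */
uint8_t band_chain(uint8_t v)
{
    if (v < 10)       return 0;
    else if (v < 50)  return 1;
    else if (v < 200) return 2;
    else              return 3;
}

/* Nested form: the first test splits the range in two, so every value
 * is classified after exactly two comparisons. */
uint8_t band_nested(uint8_t v)
{
    if (v < 50) {
        if (v < 10) return 0;
        else        return 1;
    } else {
        if (v < 200) return 2;
        else         return 3;
    }
}
```

Both functions return the same result. The chain is better when small values dominate, while the nested form improves the worst case.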
9. Use Inline Functions
Functions that are used only once in the code may be declared static. This tells the compiler that the function is not used outside its source file, which allows an optimizing compiler to inline it at the call site so that no assembly code is generated for the function call.
- A function can be suggested for inlining with the keyword 'inline', usually combined with 'static'. The compiler makes the final decision.
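A short sketch of a helper declared `static inline` and used from a single call site. The PWM scenario and both function names are hypothetical. Whether the call is actually inlined depends on the compiler and its optimization level.

```c
#include <stdint.h>

/* Small helper with internal linkage. An optimizing compiler is free
 * to paste its body into the caller and emit no separate function. */
static inline uint8_t clamp_duty(int16_t percent)
{
    if (percent < 0)   return 0;
    if (percent > 100) return 100;
    return (uint8_t)percent;
}

/* The only caller: convert a duty cycle in percent into a compare
 * value for a timer with the given period. */
uint16_t duty_to_compare(int16_t percent, uint16_t period)
{
    return (uint16_t)(((uint32_t)clamp_duty(percent) * period) / 100u);
}
```

Checking the generated assembly listing is the reliable way to confirm that the call instruction has disappeared.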
10. Use Decremented Loops
A decrementing loop generally produces less assembly code than an incrementing loop.
This is because an incrementing loop needs a compare instruction on every pass to check whether the loop index has reached its maximum value. A decrementing loop does not need this comparison, because decrementing the index sets the zero flag in the status register (SREG on AVR) when it reaches zero.
If the loop iterates a hundred times, removing one instruction from the loop saves a hundred instruction executions, so the benefit grows with the number of iterations.
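The two loop forms can be compared side by side as below. The function names are hypothetical, and the actual saving depends on the compiler and the target, so it is worth comparing the generated assembly for both versions.

```c
#include <stdint.h>

/* Incrementing loop: the index is compared against n on every pass. */
uint16_t sum_up(const uint8_t *buf, uint8_t n)
{
    uint16_t total = 0;
    for (uint8_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}

/* Decrementing loop: the loop ends when the index reaches zero, which
 * the decrement itself can signal through the zero flag. */
uint16_t sum_down(const uint8_t *buf, uint8_t n)
{
    uint16_t total = 0;
    for (uint8_t i = n; i != 0; i--) {
        total += buf[i - 1];
    }
    return total;
}
```

Both functions return the same sum. The decrementing version simply walks the buffer from the last element to the first.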
These tips may be helpful, but how well they work depends on the skill of the programmer and their command of the code. Remember that the size of a program does not always determine its execution time: some instructions consume more clock cycles than others, so once again the skill of the programmer must play its part.
© 2017 StormsHalted