Low level code optimisation
Martin McBride, 2022-03-16
This article looks at low-level code optimisation techniques. These techniques often involve changing individual lines of code to do the same thing in a slightly different way, with the aim of making the code run faster.
This type of optimisation should not be taken lightly, and should only be applied when strictly necessary. These techniques have several downsides:
- It takes longer to write highly optimised code.
- It often makes the code more difficult to understand.
- They need to be carefully maintained. A developer who doesn't understand the optimisations might accidentally undo them in a naive attempt to improve the code.
- When applied in the wrong situation, they might well make the code slower.
These optimisations should not be applied in every situation, nor should they be applied on a hunch that they might help performance. At best you might be wasting your time, at worst you will be making the code slower.
They should only be used when absolutely necessary.
1 Profile and experiment
If code is running slower than expected or required, it is important to find out which parts of the code are taking all the time. If the code is taking an hour to process some data, there is no point optimising a line of code that only gets called once right at the start and takes a few microseconds of CPU time.
There are several ways to identify the areas of code that might be problematic.
It is important to perform these steps using realistic data, and preferably multiple sets of data. Using unrepresentative data can sometimes give misleading results.
The first is to simply observe where the program gets stuck. This might be something very obvious, for example the program starts to run slowly when it processes a particular type of file, or the program takes a long time to start up but runs fine after that. It might be something more subtle. These observations might help you identify the areas to concentrate on.
The next stage might be to add logging statements at key points, for example the entry points to the main sections of code, and at the start of any loops you suspect might be slow. If you are not using a logger, then print statements are a good alternative. This is often an iterative process, starting with the top-level code then going into more detail in the parts of code that seem to be taking the most time.
Bear in mind that if some part of the code is creating lots of log messages (hundreds per second or more) this alone might slow the program down.
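As a rough illustration of timing the main sections of a program with log messages, here is a minimal sketch. The helper timed_stage and the two stages are hypothetical names invented for this example. They stand in for whatever sections your own program has.

```python
import logging
import time

# Include a timestamp in every message so the log itself shows where time goes.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("timing")


def timed_stage(name, func, *args):
    """Run func(*args), log how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    log.info("%s took %.4f s", name, elapsed)
    return result, elapsed


# Hypothetical stages standing in for the main sections of a real program.
data, build_time = timed_stage("build data", lambda: list(range(100_000)))
total, sum_time = timed_stage("sum data", sum, data)
```

Logging once per stage, rather than once per loop iteration, keeps the message rate low enough that the logging does not distort the timings.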
In many cases, you will find that the cause of the slowdown is either:
- A loop that executes many, many times and takes too long. You can try the techniques here, or if that fails it might require a more structural solution such as using a different algorithm or finding a library that performs the calculation in C code.
- The code spends time waiting for something, for example waiting for data from the network. You can investigate a non-blocking solution, or using multi-threading to allow the program to continue working until the data arrives.
The rest of this article is mainly about the first case, where your code is stuck in a loop and the solution is to make the loop run faster.
At this point, you might want to attempt some profiling. This allows you to find out exactly how slow your code is, so you can try different techniques and check which methods give the greatest improvement. This isn't always necessary, for example if the loop has an obvious problem then you can just go ahead and fix it.
If you do need to go down the profiling route, there are several methods:
- Adding a timestamp to the logging messages you already have is a quick and dirty method, but be aware that it can distort the timings if you have a high rate of messages.
- The timeit module is useful if you need to speed up a block of code that is fairly isolated and can be run independently of the main application. timeit will execute and time a code fragment.
- The cProfile module allows you to profile an entire program and will tell you how long the program spent in each function call.
Using these techniques, you should be able to identify the slow part of your code and evaluate different improvements to see which works best.
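The two standard library modules can be used along these lines. This is only a sketch. The function slow_sum is a made-up workload, and in practice you would time or profile your own code instead.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum(n):
    """Hypothetical workload: sum of squares using an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# timeit: run an isolated fragment many times and return the total time in seconds.
elapsed = timeit.timeit(lambda: slow_sum(1000), number=1000)
print(f"1000 calls took {elapsed:.4f} s")

# cProfile: record how much time is spent in each function call.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Write the statistics, sorted by cumulative time, into a string and print it.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

To profile a whole program without changing its code, cProfile can also be run from the command line with python -m cProfile followed by the script name.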
2 Exit early
Consider this code:
for x in k:
    stmt1
    stmt2
    if cond:
        break
    stmt3
    stmt4
In this code, we loop over a list k. In each cycle:
- We execute some initial code, stmt1 and stmt2.
- We check the condition cond and leave the loop early if it is true (using break).
- We then execute stmt3 and stmt4, only if cond is false.
For this code to be optimal, we need to ensure that the initial code only contains the things that have to be run every time. If there are things in the initial code that don't need to be run if cond is true, then we are wasting time executing them. They should be placed after the if statement.
This might seem obvious, but it can easily be missed in complex code. In particular, if the initial code makes calls to other functions, you should check that those functions aren't doing unnecessary work.
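The effect can be seen in a small sketch. Note that it uses continue to skip individual items, which is a variation on the break shown above, because the saving then applies to every skipped item. The function expensive and the call counter are hypothetical and exist only to make the wasted work visible.

```python
call_count = 0


def expensive(x):
    """Hypothetical slow calculation; counts how often it is called."""
    global call_count
    call_count += 1
    return x * x


def sum_squares_wasteful(items):
    total = 0
    for x in items:
        y = expensive(x)  # runs for every item, even the ones we skip
        if x < 0:
            continue
        total += y
    return total


def sum_squares_early_exit(items):
    total = 0
    for x in items:
        if x < 0:
            continue  # skip before doing any expensive work
        total += expensive(x)
    return total


items = [3, -1, 4, -1, 5, -9]

call_count = 0
wasteful_total = sum_squares_wasteful(items)
wasteful_calls = call_count

call_count = 0
early_total = sum_squares_early_exit(items)
early_calls = call_count

print(wasteful_total, wasteful_calls)  # 50 6
print(early_total, early_calls)        # 50 3
```

Both versions give the same answer, but the second one only calls expensive for the items that are actually used.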
You should also check the if statement if it uses compound conditions:
if cond1 and cond2:
    ...
In this case, Python first evaluates cond1. If cond1 is false, Python won't evaluate cond2 at all. It doesn't need to, because if the first condition is false the entire condition must be false. This is called short circuit evaluation.
This means that swapping cond1 and cond2 will potentially make a difference to how fast the code runs, and if the conditions involve calling a slow function it might make a big difference.
As a rule of thumb:
- If one condition is quick to evaluate, and one is slower, do the quick one first, because then you might not need to do the slow one at all.
- If one condition is almost always false, do that one first, because then you won't usually need to evaluate the other one.
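A small sketch can make the ordering effect concrete. The two test functions here are invented for illustration: is_cheap_match is quick and almost always false, while is_slow_match stands in for a slow check and counts its own calls.

```python
slow_calls = 0


def is_cheap_match(n):
    """Quick test that is false for 99 numbers out of 100."""
    return n % 100 == 0


def is_slow_match(n):
    """Hypothetical slow test (even digit sum); counts how often it runs."""
    global slow_calls
    slow_calls += 1
    return sum(int(digit) for digit in str(n)) % 2 == 0


def count_slow_first(numbers):
    return sum(1 for n in numbers if is_slow_match(n) and is_cheap_match(n))


def count_cheap_first(numbers):
    return sum(1 for n in numbers if is_cheap_match(n) and is_slow_match(n))


numbers = range(1000)

slow_calls = 0
slow_first_result = count_slow_first(numbers)
slow_first_calls = slow_calls

slow_calls = 0
cheap_first_result = count_cheap_first(numbers)
cheap_first_calls = slow_calls

print(slow_first_result, slow_first_calls)    # 5 1000
print(cheap_first_result, cheap_first_calls)  # 5 10
```

The result is the same either way, but putting the cheap, usually false condition first means the slow test runs 10 times instead of 1000.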
A similar situation exists for or conditions, except that cond2 is not evaluated if cond1 is true. As ever, it is worth testing to be sure the optimisation is worthwhile.
This optimisation is potentially fragile, in that a future developer might change the code without understanding the importance of the order of operations.
3 Use iterators rather than indexes for looping
When looping over a list or other sequence, we should always try to use the iterator form rather than the index form:
# Don't do this
for i in range(len(k)):
    print(k[i])

# Do this
for x in k:
    print(x)
Apart from being shorter and more pythonic, the second form is faster. But there are a few situations where it might appear that we need to use indexing. We usually don't - Python has some built-in functions to help us.
The first is where we need to use the index within the loop for any reason. In this example, we will print the index along with the value:
# Don't do this
for i in range(len(k)):
    print(i, k[i])

# Do this
for i, x in enumerate(k):
    print(i, x)
The enumerate function returns an iterator of tuples, each containing the index and value of the next element. These values are unpacked into the variables i and x.
The other situation is where we need to loop over two (or more) lists together. In this example, we will print corresponding values from lists p and q:
# Don't do this
for i in range(len(p)):
    print(p[i], q[i])

# Do this
for a, b in zip(p, q):
    print(a, b)
The zip function returns an iterator of tuples, each containing the next value from p and the next value from q. These values are unpacked into the variables a and b. Note that zip stops when the shorter of the two lists runs out.
These will handle most cases, with no real disadvantages. There may still be cases where you have to loop over indexes, but they should be quite rare.
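If you want to check the difference on your own machine, a comparison along these lines is one way to do it. The two functions are hypothetical examples, and the timings will vary between machines and Python versions, so treat the numbers as a guide only.

```python
import timeit

k = list(range(10_000))


def sum_by_index(values):
    """Index form: looks up values[i] on every pass."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_by_iteration(values):
    """Iterator form: receives each element directly."""
    total = 0
    for x in values:
        total += x
    return total


index_time = timeit.timeit(lambda: sum_by_index(k), number=200)
iteration_time = timeit.timeit(lambda: sum_by_iteration(k), number=200)

print(f"index form:    {index_time:.4f} s")
print(f"iterator form: {iteration_time:.4f} s")
```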
4 Reduce the number of function calls
Calling a function takes time. We don't (and shouldn't) usually worry about it, because it doesn't take that long, and using functions can make our code much more readable, maintainable and reusable. In general code, functions are all good.
However, in some types of loops, we need to consider the time taken to call a function. The time to execute a loop can be expressed, roughly, as:
total_time = number_of_iterations * time_per_iteration
Sometimes we will see loops that don't do much (the time_per_iteration is very small), but if the number_of_iterations is huge the loop can still take a long time. If the body of the loop makes function calls, that time might be significant.
A special case of this is if we are looping over a sequence, calling a function once per element:
def f1(element):
    do_stuff_with_element  # placeholder for the real work

for x in long_list:
    f1(x)
We can reduce the number of function calls by putting the for loop inside the function:
def f1(list_of_elements):
    for element in list_of_elements:
        do_stuff_with_element  # placeholder for the real work

f1(long_list)
The general rule here is to put loops inside functions rather than putting functions inside loops. Or put another way, we should try to pass aggregate data (lists etc) into functions.
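Here is a runnable sketch of the same idea. The functions scale and scale_all are made up for this example. The first is called once per element and the second once for the whole list.

```python
import timeit


def scale(value):
    """Called once per element: one function call for every item."""
    return value * 2


def scale_all(values):
    """Called once per list: the loop runs inside the function."""
    return [value * 2 for value in values]


long_list = list(range(10_000))

per_element = [scale(x) for x in long_list]
aggregate = scale_all(long_list)

per_element_time = timeit.timeit(lambda: [scale(x) for x in long_list], number=100)
aggregate_time = timeit.timeit(lambda: scale_all(long_list), number=100)

print(f"function inside loop: {per_element_time:.4f} s")
print(f"loop inside function: {aggregate_time:.4f} s")
```

Both approaches produce the same list. The second makes one function call per timed run where the first makes 10,000, and the timings show how much that matters on your system.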
A second technique is more general but also a little more drastic. Consider this code:
def f2():
    do_stuff2

def f3():
    do_stuff3

while cond:
    f2()
    f3()
We can remove the function calls by putting the code directly in the loop:
while cond:
    do_stuff2
    do_stuff3
Of course, this means our loop has a blob of unstructured code rather than some nice clean function calls, but if it makes the difference between the code being fast enough or not, then it might be the only option.
As with all these examples, measure the resulting performance and decide if it is worth it, then add comments or other documentation to explain why the code is the way it is.
5 Cache attributes locally
There are various types of variables in Python, for example:
- Local variables, defined and used within a function.
- Global variables, defined outside of a function, and accessible from anywhere, even from other modules.
- Instance variables, defined in a class and accessible via objects of that class.
Accessing local variables is usually slightly faster, so if you have code that makes frequent access to other types of variables, it can be worth making local copies outside of the loop.
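global_var = 0

def access_global():
    global global_var
    for i in range(1000):
        global_var += i  # stands in for any calculation using global_var

def cache_global():
    global global_var  # needed because we assign to global_var below
    x = global_var     # copy the global into a local variable
    for i in range(1000):
        x += i         # the same calculation, using the local copy
    global_var = x     # copy the result back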
The first function, access_global, does its work directly with global_var.
The second function, cache_global, copies global_var to a local variable x, does the same work with x, then copies the result back to global_var. This will be slightly faster than access_global.
We can do something similar with methods of objects:
k = [1, 2, 3, 4]
ki = k.insert
ki(2, 'a')  # Same as k.insert(2, 'a')
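Here we have created ki, a function that is equivalent to the insert method of the object k. This means that when we call the ki function we don't need to specify that we wish to operate on the k object, because the ki function is already bound to the k object. The time saving comes from the attribute lookup: k.insert has to find the insert method every time it is called, whereas ki has already done that lookup once.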
These techniques can shave time off critical code, but they should only be used when necessary. Caching attributes creates code that can be confusing to read. And when aliasing variables, we must remember to copy the result back to the original variable at the end.
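The technique is most often seen inside a loop, where the lookup would otherwise be repeated on every pass. The following is a minimal sketch with two hypothetical functions that compute square roots, one with and one without local caching.

```python
import math


def roots_plain(values):
    """Looks up result.append and math.sqrt on every pass of the loop."""
    result = []
    for v in values:
        result.append(math.sqrt(v))
    return result


def roots_cached(values):
    """Looks up the bound method and the function once, before the loop."""
    result = []
    append = result.append  # bound method of this particular list
    sqrt = math.sqrt        # local name for the module-level function
    for v in values:
        append(sqrt(v))
    return result


print(roots_plain([1, 4, 9]))   # [1.0, 2.0, 3.0]
print(roots_cached([1, 4, 9]))  # [1.0, 2.0, 3.0]
```

The cached version only pays off when the loop runs many times, and the names append and sqrt are harder to follow than the originals, so it is worth a comment explaining why they are there.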
6 Avoid excessive abstraction in time-critical code
This is really a generalisation of the previous two items.
Deep levels of abstraction include:
- Deeply nested functions - functions that call functions that call functions.
- Deeply nested objects - objects that are members of other objects that are members of other objects.
- A combination of the above.
Abstraction is good. It is a key part of writing clean, maintainable code. Every level of abstraction will make the code slightly slower, but most of the time this doesn't really matter. If the code is waiting for the user to press a key, or for a byte to be read from disk, then taking a few nanoseconds longer to process that event will make no difference.
But in time-critical areas of the code, it may well be worth limiting the number of levels of abstraction, so that we don't need to do too many of the hacks we discussed earlier.
7 Multiple assignments
Python allows multiple assignments in one statement. For example, here are two different ways to assign values to a, b and c:
# Individual assignments
a = 1
b = 2
c = 3

# Equivalent multiple assignment
a, b, c = 1, 2, 3
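In most cases, this is a matter of style. However, the second method might be slightly faster, so it could be considered if the code needs to be optimised. Any difference is likely to be small and may vary between Python versions, so measure before relying on it.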
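One way to measure this is with timeit, passing each form as a string. This is only a sketch, and the numbers will differ between machines and Python versions.

```python
import timeit

# Each statement is executed one million times and the total time is returned.
separate = timeit.timeit("a = 1; b = 2; c = 3", number=1_000_000)
combined = timeit.timeit("a, b, c = 1, 2, 3", number=1_000_000)

print(f"individual assignments: {separate:.4f} s")
print(f"multiple assignment:    {combined:.4f} s")
```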
8 Lazy importing
This final item is more about latency than actual speed.
We normally import all the modules we need at the top of our Python file:
import numpy
import matplotlib

# Use numpy
# Use matplotlib
This is normally good practice, as it lets us see at a glance which modules are being used.
However, you don't strictly need to import a module until the point where you actually use it:
import numpy
# Use numpy

import matplotlib
# Use matplotlib
This won't make the code run any faster, but it might allow it to start up slightly faster because it doesn't need to load matplotlib until it is required.
Incidentally, don't worry about importing the same module more than once. Python keeps loaded modules in a cache (sys.modules), so an import of a module that has already been loaded simply reuses it rather than loading it again.
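A common variation is to put the import inside the function that needs it, so the module is only loaded if that function is ever called. The sketch below uses the standard json module and a made-up function export_report so that it runs anywhere. The same pattern applies to heavier modules such as matplotlib.

```python
import sys


def export_report(data):
    """Serialise data to JSON, importing json only when it is first needed."""
    import json  # deferred import: loaded on the first call, cached afterwards
    return json.dumps(data, sort_keys=True)


text = export_report({"b": 1, "a": 2})
print(text)                   # {"a": 2, "b": 1}
print("json" in sys.modules)  # True
```

The trade-off is that the first call to the function pays the import cost instead of program start-up, and the dependency is no longer visible at the top of the file.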
We have looked at some low-level coding changes that can help speed up Python code in certain circumstances. They are worth trying if you have identified particular areas of your code that are running too slow. But all of these methods have downsides in terms of code clarity and maintainability, and they might even be counterproductive in some cases. They should be used carefully.