Profile your Python script at the transaction level with minimal overhead

Tian Gao
5 min read · Nov 8, 2020

Profiling is a critical step when we try to optimize our code. There are a lot of profiling tools on the market. However, most of them share a common shortcoming: they provide too much information.

Don’t get me wrong, I’m not saying information is a bad thing. Sometimes we do need the details to figure out why our code is slow, but for larger, production-level projects, we are often more interested in “transaction level” profiling.

So, what is “transaction level” profiling? We should discuss “function level” profiling first, which is a more familiar concept for programmers. Take Python’s built-in profiler, cProfile, as an example. Its profiling unit is the function: cProfile generates call counts and time consumed for every function. This is useful when we try to locate the bottleneck of a CPU-bound script. However, it suffers from relatively large overhead, which skews the result, and the output is hard to digest because there is simply too much to read.
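For example, a minimal function-level profile with cProfile might look like this (the function here is made up just for illustration):

import cProfile

def slow_square_sum(n):
    # deliberately slow: sum i*i in pure Python
    return sum(i * i for i in range(n))

# Print call counts and cumulative time for every function involved,
# sorted by cumulative time
cProfile.run("slow_square_sum(1_000_000)", sort="cumtime")

Even for this tiny script, the report lists a handful of internal functions; on a real codebase the output quickly grows to hundreds of entries.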

It would be nice if we could focus only on the “important” functions, for instance the API function handling HTTP requests, or the entry function of a critical feature. That gives us a big picture of the time consumed by each “job”, or “transaction”. The obvious advantage is that we avoid all the overhead of logging the “insignificant” functions (if implemented correctly, of course), which makes the result reflect reality better. The other benefit is simply a clearer result, especially if we can visualize it.

VizTracer now has a way to do this.

By default, VizTracer logs every single function and visualizes the result with trace-viewer.

result of viztracer

This is the detailed information of an MCTS run. You can zoom in to see the execution details of any single function. As I said before, this is very helpful for profiling and debugging in a lot of cases.
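For reference, a trace like the one above can be produced from the command line (viztracer your_script.py) or programmatically. The snippet below is a minimal sketch using the VizTracer context manager; the workload and output path are just placeholders:

from viztracer import VizTracer

def run_iteration():
    # placeholder for the real workload, e.g. one MCTS iteration
    return sum(i * i for i in range(10000))

# Log every function executed inside the with-block and save the report
with VizTracer(output_file="result.json"):
    for _ in range(20):
        run_iteration()

The saved report can then be loaded into the trace viewer for inspection.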

However, sometimes this method does not work perfectly, for example when we are using a multiprocessing pool. The built-in library has to take care of the underlying logic, so we get a lot of extra information that is not related to our own work. Let’s grab a piece of code from the official Python documentation as an example:

from multiprocessing import Process, Pool
import os
import time

def f(x):
    return x*x

if __name__ == "__main__":
    process_num = 5
    with Pool(processes=process_num) as pool:
        # print "[0, 1, 4,..., 81]"
        print(pool.map(f, range(10)))

        # print same numbers in arbitrary order
        for i in pool.imap_unordered(f, range(10)):
            print(i)

        # evaluate "f(20)" asynchronously
        res = pool.apply_async(f, (20,))  # runs in *only* one process
        print(res.get(timeout=1))  # prints "400"

        # evaluate "os.getpid()" asynchronously
        res = pool.apply_async(os.getpid, ())  # runs in *only* one process
        print(res.get(timeout=1))  # prints the PID of that process

        # launching multiple evaluations asynchronously *may* use more processes
        multiple_results = [pool.apply_async(os.getpid, ()) for i in range(process_num)]
        print([res.get(timeout=1) for res in multiple_results])

        print("For the moment, the pool remains available for more work")

    # exiting the 'with'-block has stopped the pool
    print("Now the pool is closed and no longer available")
multiprocessing result, with every thread collapsed to show the whole picture

The only function I’m interested in is f; of course, in real projects that would be a much more complicated function. However, it’s not easy to locate where it is executed because we have too much information.

So, we can use the log sparse feature of VizTracer. We need to add the @log_sparse decorator to the function we are interested in, and run VizTracer with --log_sparse.

from viztracer import log_sparse
from multiprocessing import Process, Pool
import os
import time

@log_sparse
def f(x):
    return x*x

if __name__ == "__main__":
    process_num = 5
    with Pool(processes=process_num) as pool:
        # print "[0, 1, 4,..., 81]"
        print(pool.map(f, range(10)))

        # print same numbers in arbitrary order
        for i in pool.imap_unordered(f, range(10)):
            print(i)

        # evaluate "f(20)" asynchronously
        res = pool.apply_async(f, (20,))  # runs in *only* one process
        print(res.get(timeout=1))  # prints "400"

        # evaluate "os.getpid()" asynchronously
        res = pool.apply_async(os.getpid, ())  # runs in *only* one process
        print(res.get(timeout=1))  # prints the PID of that process

        # launching multiple evaluations asynchronously *may* use more processes
        multiple_results = [pool.apply_async(os.getpid, ()) for i in range(process_num)]
        print([res.get(timeout=1) for res in multiple_results])

        print("For the moment, the pool remains available for more work")

    # exiting the 'with'-block has stopped the pool
    print("Now the pool is closed and no longer available")
viztracer result with log sparse

Now we only have the records of f(). As expected, the actual execution of f() is insignificant, because it’s a very simple function and the overhead of a multiprocessing pool is not small. We can change f() to a more time-consuming function:

def f(x):
    ret = 0
    for i in range(x*400):
        ret += i * x
    return ret

And the result will be more interesting:

viztracer result with large f

You can easily observe the concurrency of a multiprocessing pool.

A multiprocessing pool is not the only case where this sparse log feature comes in handy. When we run functions in asyncio, in web applications, or in any other program where our code runs on top of an existing, complicated infrastructure, VizTracer with --log_sparse lets you focus on the code you are interested in, instead of wasting time filtering out the information you don’t care about. A rough sketch of the asyncio case is shown below.
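In this sketch all the names are made up, and the script is assumed to be run with viztracer --log_sparse; only the decorated function would show up in the trace:

import asyncio
from viztracer import log_sparse

@log_sparse
def parse_payload(data):
    # the "transaction" we actually care about
    return sum(data)

async def fetch_one(i):
    await asyncio.sleep(0.01)  # stands in for real I/O handled by the framework
    return parse_payload(range(i * 100))

async def main():
    results = await asyncio.gather(*(fetch_one(i) for i in range(5)))
    print(results)

if __name__ == "__main__":
    asyncio.run(main())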

If you are interested in VizTracer, be aware that it has far more features and abilities than sparse logging. Check out https://github.com/gaogaotiantian/viztracer and start using it in your project!
