Saturday, 11 April 2020

Project Stage - 2

This blog is about the stage 2 of our project ( Project Stage 1 - https://asharma206-spo600.blogspot.com/2020/04/project-stage-1.html). So after we were done setting the benchmarks, next step was to do the profiling of the software. In other words we had to figure out what part of the software was taking the maximum execution time.

We were introduced to two ways on how to figure out what function in our software was taking the maximum execution time. The two concepts were of sampling and instrumentation.

1. Sampling: In this way of profiling, the program is interrupted thousands of times per second, thus collecting thousands of samples per second, and we can get information about what the program counter was doing during the interruption and thus we can figure out what called what and hence what was getting executed for the majority of the time.

2.  Instrumentation: Another way of profiling is instrumentation. In this, some code is added during building of the software, so that all the function calls can be recorded and thus we can figure out what part takes the maximum execution.

Two tools that can be used to do the profiling. gprof and perf. I will demonstrate how to work with each of these and then the results I got from each of them.

gprof:
In order to use this tool I had to setup the build to include a -pg (profile generation) option. Next step was to build the software. Next we would run the software as usual. We would see a new executable called gmon.out generated. It would be a binary file. Next step was the usage of gprof command:
filename: (name of the executable originally used to run the software (not gmon.out))
gprof filename | gprof2dot | dot -T x11

Now gprof filename would originally would give the status of different functions and the execution times and other related info. But in order to get a more readable and helpful info we would feed the result to gprof2dot which would further be fed to a program called dot which would give a textual form of graph for the execution tree but for a more graphical version we give -T x11 options which would launch a new window with the visual version of the tree.

result:


Since the original tree was much more complex, this is just a screenshot of the original tree containing the function that takes the maximum execution time.
As it is clear from this graph the function CompressFragment was taking more than 74% of the execution time and though the control was getting transferred to multiple functions, the time taken by them was not that significant. Just a note, this tool uses both instrumentation and sampling



perf:
Though I already knew by now what function was taking the maximum time, but since it was suggested by our professor to use both the tools, I decided to work with perf as well. Now, to use this tool we do not need to use any special options and so I just used the -O2 optimization level and rebuild the software.
Note: This tool only uses the sampling and not the instrumentation.
In order to use this tool, we would just issue the program:
perf record ( program execution)
This would record the samples and would create a file called perf.data.
Then we would issue a command: perf report
Since this tool does not use instrumentation we can't produce an execution tree, we would just see a status of different functions and their execution times.

result:



As expected it is also clear that the most execution time in this program is spent on function CompressFragment (72%). Though the execution time is a bit different from the one in gprof but it is understandable since both of them use different approaches.


So by this time I was clear it was CompressFragment function in this software that I had to work on and try to optimize and since it is taking more than 70%, even a small change in its execution time would mean a good amount of difference in its overall execution time. I would be spending the next few days figuring out what aspect in the function I can work on and I hope to make some good progress on this project overall.

No comments:

Post a Comment