mprof-report(1) DragonFly General Commands Manual mprof-report(1)
The Mono log profiler
The Mono log profiler can be used to collect a lot of information about
a program running in the Mono runtime. This data can be used (both
while the process is running and later) to analyze the program
behaviour, determine resource usage, investigate performance issues or
even look for particular execution patterns.
This is accomplished by logging the events provided by the Mono runtime
through the profiling interface and periodically writing them to a file
which can be later inspected with the command line mprof-report program
or with a GUI (not developed yet).
The events collected include (among others):
o method enter and leave
o object allocation
o garbage collection
o JIT compilation
o metadata loading
o lock contention
o exceptions
In addition, the profiler can periodically collect info about all the
objects present in the heap at the end of a garbage collection (this is
called a heap shot and is currently implemented only for the sgen
garbage collector). Another available profiler mode is the sampling or
statistical mode: the program is sampled periodically and information
about what the program was busy with is saved. This makes it possible
to get information about the program behaviour without degrading its
performance too much (usually less than 10%).
Basic profiler usage
The simplest way to use the profiler is the following:
mono --profile=log program.exe
At the end of the execution the file output.mlpd will be found in the
current directory. A summary report of the data can be printed by
running:
mprof-report output.mlpd
With this invocation a huge amount of data is collected about the
program execution, and collecting and saving this data can
significantly slow the program down. If saving the profiling data is
not needed, a report can be generated directly with:
mono --profile=log:report program.exe
If the information about allocations is not of interest, it can be
excluded:
mono --profile=log:noalloc program.exe
On the other hand, if method call timing is not important, while
allocations are, the needed info can be gathered with:
mono --profile=log:nocalls program.exe
You will still be able to inspect information about the sequence of
calls that leads to each allocation, because a stack trace is collected
at each object allocation when full enter/leave information is not
available.
To periodically collect heap shots (and exclude method and allocation
events) use the following options (making sure you run with the sgen
garbage collector):
mono --gc=sgen --profile=log:heapshot program.exe
To perform a sampling profiler run, use the sample option:
mono --profile=log:sample program.exe
Profiler option documentation
By default the log profiler will gather all the events provided by the
Mono runtime and write them to a file named output.mlpd. When no
option is specified, it is equivalent to using:
--profile=log:calls,alloc,output=output.mlpd,maxframes=8,calldepth=100
The following options can be used to modify this default behaviour.
Each option is separated from the next by a , character, with no
spaces, and all the options follow the log: profile module specifier.
o help: display concise help info about each available option
o [no]alloc: noalloc disables collecting object allocation info, alloc
enables it if it was disabled by another option like heapshot.
o [no]calls: nocalls disables collecting method enter and leave events.
  When this option is used, a stack trace is collected by default at
  each object allocation and at some other events (like lock contention
  and exception throws). See the maxframes option to control this
  behaviour. calls enables method enter/leave events if they were
  disabled by another option like heapshot.
o heapshot[=MODE]: collect heap shot data at each major collection.
The frequency of the heap shots can be changed with the MODE
parameter. When this option is used allocation events and method
enter/leave events are not recorded by default: if they are needed,
they need to be enabled explicitly. The optional parameter MODE can
modify the default heap shot frequency. heapshot can be used
multiple times with different modes: in that case a heap shot is
taken if any of the conditions is met. MODE can be one of:
o NUMms: perform a heap shot if at least NUM milliseconds passed
since the last one.
o NUMgc: perform a heap shot every NUM major garbage collections
o ondemand: perform a heap shot when such a command is sent to the
control port
o sample[=TYPE[/FREQ]]: collect statistical samples of the program
  behaviour. The default is to sample the instruction pointer 1000
  times per second, which is equivalent to the value "cycles/1000" for
  TYPE. On some systems, like with recent Linux kernels, it is possible
  to make the sampling happen for other events provided by the
  performance counters of the CPU. In this case, TYPE can be one of:
o cycles: processor cycles
o instr: executed instructions
o cacherefs: cache references
o cachemiss: cache misses
o branches: executed branches
o branchmiss: mispredicted branches
o time=TIMER: use the TIMER timestamp mode. TIMER can have the
following values:
o fast: a usually faster but possibly less accurate timer
o maxframes=NUM: when a stack trace needs to be performed, collect NUM
frames at the most. The default is 8.
o calldepth=NUM: ignore method enter/leave events when the call chain
depth is bigger than NUM.
o zip: automatically compress the output data in gzip format.
o output=OUTSPEC: instead of writing the profiling data to the
output.mlpd file, substitute %p in OUTSPEC with the current process
id and %t with the current date and time, then act according to
OUTSPEC:
o if OUTSPEC begins with a | character, execute the rest as a program
and feed the data to its standard input
o if OUTSPEC begins with a - character, use the rest of OUTSPEC as
  the filename, overwriting any existing file by that name
o otherwise write the data to the named file: note that if a file by
  that name already exists, a warning is issued and profiling is
  disabled.
o report: the profiling data is sent to mprof-report, which will print
  a summary report. This is equivalent to the option:
  output=|mprof-report -. If the output option is specified as well,
  the report will be written to the output file instead of the console.
o port=PORT: specify the TCP/IP port to use for the listening command
  server (see the example after this list). Currently not available on
  Windows. This server is started for example when heapshot=ondemand
  is used: it will read commands line by line. The following commands
  are available:
o heapshot: perform a heapshot as soon as possible
o counters: sample counter values every second. This is a really
  lightweight way to gain insight into some of the runtime's key
  metrics. Counters displayed in non-verbose mode are: Methods from
  AOT, Methods JITted using mono JIT, Methods JITted using LLVM, Total
  time spent JITting (sec), User Time, System Time, Total Time, Working
  Set, Private Bytes, Virtual Bytes, Page Faults and CPU Load Average
  (1min, 5min and 15min).
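For example, to take heap shots only on demand through the control
port, the profiler could be started as follows (the port number 8888 is
just an illustrative choice and any free port can be used):
mono --gc=sgen --profile=log:heapshot=ondemand,port=8888 program.exe
A heap shot can then be requested from another shell by sending the
heapshot command to that port, for example with nc(1):
echo heapshot | nc 127.0.0.1 8888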
Analyzing the profile data
Currently there is a command line program (mprof-report) to analyze the
data produced by the profiler. This is run automatically when the
report profiler option is used. Simply run:
mprof-report output.mlpd
to see a summary report of the data included in the file.
Trace information for events
For some events, like allocations, lock contention and exception
throws, it is often important to know where they happened. Or we may
want to see what sequence of calls leads to a particular method
invocation. To see this info invoke mprof-report as follows:
mprof-report --traces output.mlpd
The maximum number of methods in each stack trace can be specified with
the --maxframes=NUM option:
mprof-report --traces --maxframes=4 output.mlpd
The stack trace info will be available if method enter/leave events
have been recorded or if stack trace collection wasn't explicitly
disabled with the maxframes=0 profiler option. Note that the profiler
will collect up to 8 frames by default at specific events when the
nocalls option is used, so in that case, if more stack frames are
required in mprof-report, a bigger value must also be given to the
maxframes profiler option when profiling.
The --traces option also controls the reverse reference feature in the
heapshot report: for each class it reports how many references to
objects of that class come from other classes.
Sort order for methods and allocations
When a list of methods is printed the default sort order is based on
the total time spent in the method. This time is wall clock time (that
is, it includes the time spent, for example, in a sleep call, even if
actual CPU time would be basically 0). Also, if the method has been
run on different threads, the time will be a sum of the time used in
each thread.
To change the sort order, use the option:
--method-sort=MODE
where MODE can be:
o self: amount of time spent in the method itself and not in its
callees
o calls: the number of method invocations
o total: the total time spent in the method.
Object allocation lists are sorted by default depending on the total
amount of bytes used by each type.
To change the sort order of object allocations, use the option:
--alloc-sort=MODE
where MODE can be:
o count: the number of allocated objects of the given type
o bytes: the total number of bytes used by objects of the given type
To change the sort order of counters, use the option:
--counters-sort=MODE
where MODE can be:
o time: sort values by time then category
o category: sort values by category then time
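For example, to sort the method list by self time and the allocation
list by the number of allocated objects:
mprof-report --method-sort=self --alloc-sort=count output.mlpd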
Selecting what data to report
The profiler by default collects data about many runtime subsystems and
mprof-report prints a summary of all the subsystems that are found in
the data file. It is possible to tell mprof-report to only show
information about some of them with the following option:
--reports=R1[,R2...]
where the report names R1, R2 etc. can be:
o header: information about program startup and profiler version
o jit: JIT compiler information
o sample: statistical sampling information
o gc: garbage collection information
o alloc: object allocation information
o call: method profiling information
o metadata: metadata events like image loads
o exception: exception throw and handling information
o monitor: lock contention information
o thread: thread information
o heapshot: live heap usage at heap shots
o counters: counters samples
It is possible to limit some of the data displayed to a timeframe of
the program execution with the option:
--time=FROM-TO
where FROM and TO are seconds since application startup (they can be
floating point numbers).
Another possibility is to consider only events happening on a
particular thread, using the following option:
--thread=THREADID
where THREADID is one of the numbers listed in the thread summary
report (or a thread name when present).
By default long lists of methods or other information like object
allocations are limited to the most important data. To increase the
amount of information printed you can use the option:
--verbose
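For example, the following invocation (with illustrative values) limits
the output to the garbage collection and allocation reports, considers
only the first 60 seconds of execution and prints the longer lists:
mprof-report --reports=gc,alloc --time=0-60 --verbose output.mlpd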
Track individual objects
Instead of printing the usual reports from the profiler data, it is
possible to track some interesting information about some specific
object addresses. The objects are selected based on their address with
the --track option as follows:
--track=0xaddr1[,0xaddr2,...]
The reported info (if available in the data file) will be class name,
size, creation time, stack trace of creation (with the --traces
option), etc. If heapshot data is available it will be possible to
also track what other objects reference one of the listed addresses.
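A possible invocation looks like the following (the addresses are
purely illustrative: real ones come from a profiler report or from the
running application):
mprof-report --track=0x7f43a0100,0x7f43a0200 --traces output.mlpd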
The object addresses can be gathered either from the profiler report in
some cases (like in the monitor lock report), from the live application
or they can be selected with the --find=FINDSPEC option. FINDSPEC can
be one of the following:
o S:SIZE: where the object is selected if its size is at least SIZE
o T:NAME: where the object is selected if NAME partially matches its
class name
This option can be specified multiple times with one of the different
kinds of FINDSPEC. For example, the following:
--find=S:10000 --find=T:Byte[]
will find all the byte arrays that are at least 10000 bytes in size.
Note that with a moving garbage collector the object address can
change, so you may need to track the changed address manually. It can
also happen that multiple objects are allocated at the same address, so
the output from this option can become large.
Saving a profiler report
By default mprof-report will print the summary data to the console. To
print it to a file, instead, use the option:
--out=FILENAME
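For example, to save the summary to a file named summary.txt (an
arbitrary name):
mprof-report --out=summary.txt output.mlpd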
Dealing with profiler slowness
If the profiler needs to collect lots of data, the execution of the
program will slow down significantly, usually 10 to 20 times slower.
There are several ways to reduce the impact of the profiler on the
program execution.
Use the statistical sampling mode
Statistical sampling allows executing a program under the profiler with
minimal performance overhead (usually less than 10%). This mode allows
checking where the program is spending most of its execution time
without significantly perturbing its behaviour.
Collect less data
Collecting method enter/leave events can be very expensive, especially
in programs that perform many millions of tiny calls. The profiler
option nocalls can be used to avoid collecting this data or it can be
limited to only a few call levels with the calldepth option.
Object allocation information is expensive as well, though much less
than method enter/leave events. If it's not needed, it can be skipped
with the noalloc profiler option. Note that when method enter/leave
events are discarded, by default stack traces are collected at each
allocation and this can be expensive as well. The impact of stack
trace information can be reduced by setting a low value with the
maxframes option, or eliminated completely by setting it to 0.
The other major source of data is the heapshot profiler option, which
can be expensive especially if the managed heap is big, since every
object in the heap needs to be inspected. The MODE parameter of the
heapshot option can be used to reduce the frequency of the heap shots.
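As a sketch of a lower-overhead configuration combining these options
(the values are only reasonable starting points, not recommendations):
mono --gc=sgen --profile=log:nocalls,maxframes=0,heapshot=30000ms program.exe
Here method enter/leave events are not collected, stack traces are
disabled and heap shots are taken at most once every 30 seconds.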
Reduce the timestamp overhead
On many operating systems or architectures what actually slows down
profiling is the function provided by the system to get timestamp
information. The time=fast profiler option can usually be used to
speed up this operation, but, depending on the system, time accounting
may have some level of approximation (though statistically the data
should still be fairly valuable).
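For example, a sampling run using the fast timer could be started with:
mono --profile=log:sample,time=fast program.exe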
Dealing with the size of the data files
When collecting a lot of information about a profiled program, huge
data files can be generated. There are a few ways to minimize the
amount of data, for example by not collecting some of the more space-
consuming information, by compressing the information on the fly, or
by just generating a summary report.
Reducing the amount of data
Method enter/leave events can be excluded completely with the nocalls
option or they can be limited to just a few levels of calls with the
calldepth option. For example, the option:
calldepth=10
will ignore the method events when there are more than 10 managed stack
frames. This is very useful for programs that have deep recursion or
for programs that perform many millions of tiny calls deep enough in
the call stack. The optimal number for the calldepth option depends on
the program and it needs to be balanced between providing enough
profiling information and allowing fast execution speed.
Note that by default, if method events are not recorded at all, the
profiler will collect stack trace information at events like
allocations. To avoid gathering this data, use the maxframes=0
profiler option.
Allocation events can be eliminated with the noalloc option.
Heap shot data can also be huge: by default it is collected at each
major collection. To reduce the frequency, you can specify a heapshot
mode: for example to collect every 5 collections (including major and
minor):
heapshot=5gc
or when at least 5 seconds passed since the last heap shot:
heapshot=5000ms
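Putting some of these options together, a run that keeps the data file
small could look like this (a sketch, with values to be tuned per
program):
mono --profile=log:calldepth=10,maxframes=0,noalloc program.exe
This keeps method events only for the first 10 stack levels, drops
allocation events and disables stack trace collection at the remaining
events.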
Compressing the data
To reduce the amount of disk space used by the data, the data can be
compressed either after it has been generated with the gzip command:
gzip -9 output.mlpd
or it can be compressed automatically by using the zip profiler option.
Note that in this case there could be a significant slowdown of the
profiled program.
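For example (the .gz suffix in the output file name is only a
convention and not required by the profiler):
mono --profile=log:zip,output=output.mlpd.gz program.exe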
The mprof-report program will transparently deal with either compressed
or uncompressed data files.
Generating only a summary report
Often it's enough to look at the profiler summary report to diagnose an
issue and in this case it's possible to avoid saving the profiler data
file to disk. This can be accomplished with the report profiler
option, which will basically send the data to the mprof-report program
for display.
To have more control of what summary information is reported (or to use
a completely different program to decode the profiler data), the output
profiler option can be used, with | as the first character: the rest of
the output name will be executed as a program with the data fed in on
the standard input.
For example, to print only the Monitor summary with stack trace
information, you could use it like this:
output=|mprof-report --reports=monitor --traces -
WEB SITE
http://www.mono-project.com/Profiler
SEE ALSO
mono(1)
AUTHORS
Paolo Molaro.
mprof-report(1)