LAT_CTX(8)                          LMBENCH                         LAT_CTX(8)

NAME
       lat_ctx - context switching benchmark

SYNOPSIS
       lat_ctx [ -P <parallelism> ] [ -W <warmups> ] [ -N <repetitions> ] [ -s
       <size_in_kbytes> ] #procs [ #procs ...  ]

DESCRIPTION
       lat_ctx measures context switching time for any reasonable number of
       processes of any reasonable size.  The processes are connected in a
       ring of Unix pipes.  Each process reads a token from its pipe, possibly
       does some work, and then writes the token to the next process.
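
       The ring described above can be sketched as follows.  This is a
       minimal illustration in Python, not the lat_ctx source: the name
       ring_pass_time and its parameters are hypothetical, and unlike
       lat_ctx the sketch does not subtract pipe or work overhead, so the
       value it returns is an upper bound on the switch cost.  It assumes a
       Unix system where os.fork is available.

```python
import os
import time


def _keep_only(pipes, rfd, wfd):
    """Close every pipe end except the two this process uses."""
    for r, w in pipes:
        if r != rfd:
            os.close(r)
        if w != wfd:
            os.close(w)


def ring_pass_time(nprocs, rounds):
    """Pass a one-byte token `rounds` times around a ring of `nprocs` processes.

    Returns the mean wall-clock seconds per hop.  Each hop includes the
    pipe write and read as well as the context switch itself.
    """
    if nprocs < 1 or rounds < 1:
        raise ValueError("nprocs and rounds must be positive")

    # pipes[i] carries the token from process i to process (i + 1) % nprocs.
    pipes = [os.pipe() for _ in range(nprocs)]
    children = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            # Child i reads from its predecessor and writes to its successor.
            rfd, wfd = pipes[i - 1][0], pipes[i][1]
            _keep_only(pipes, rfd, wfd)
            while True:
                token = os.read(rfd, 1)
                if not token:  # EOF: the previous process has shut down
                    break
                os.write(wfd, token)
            os._exit(0)
        children.append(pid)

    # The parent is process 0: it injects the token and waits for its return.
    rfd, wfd = pipes[nprocs - 1][0], pipes[0][1]
    _keep_only(pipes, rfd, wfd)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(wfd, b"t")
        os.read(rfd, 1)
    elapsed = time.perf_counter() - start

    # Closing the write end sends EOF around the ring, so each child exits.
    os.close(wfd)
    os.close(rfd)
    for pid in children:
        os.waitpid(pid, 0)
    return elapsed / (rounds * nprocs)
```

       For example, ring_pass_time(4, 1000) passes the token 1000 times
       around a ring of four processes and returns the mean time per hop in
       seconds.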

       Processes may vary in number.  Smaller numbers of processes result in
       faster context switches.  More than 20 processes is not supported.

       Processes may vary in size.  A size of zero is the baseline process
       that does nothing except pass the token on to the next process.  A
       process size of greater than zero means that the process does some work
       before passing on the token.  The work is simulated as the summing up
       of an array of the specified size.  The summing is an unrolled loop of
       about 2.7 thousand instructions.

       The effect is that both the data and the instruction cache get polluted
       by some amount before the token is passed on.  The data cache gets
       polluted by approximately the process ``size''.  The instruction cache
       gets polluted by a constant amount, approximately 2.7 thousand
       instructions.
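
       The per-process work can be pictured with the short sketch below.
       It only illustrates the idea of touching an array of the requested
       size before passing the token; the names make_work_buffer and
       do_work are hypothetical, and an interpreted loop does not reproduce
       the unrolled loop or its instruction cache footprint.

```python
from array import array


def make_work_buffer(size_kbytes):
    """Allocate a zero-filled array of C ints occupying size_kbytes KB."""
    return array("i", bytes(size_kbytes * 1024))


def do_work(buf):
    """Sum the array, which reads every element and so touches all of it."""
    return sum(buf)
```

       A size of zero yields an empty buffer, which corresponds to the
       baseline process that does no work.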

       The pollution of the caches results in larger context switching times
       for the larger processes.  This may be confusing because the benchmark
       takes pains to measure only the context switch time, not including the
       overhead of doing the work.  The subtle point is that the overhead is
       measured using hot caches.  As the number and size of the processes
       increase, the caches become more and more polluted until the set of
       processes no longer fits in them.  The context switch times go up
       because a context switch is defined as the switch time plus the time
       it takes to restore all of the process state, including cache state.
       This means that the switch time includes the cache misses incurred by
       larger processes.
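
       The arithmetic implied here can be written out as a small sketch.
       It assumes that the reported cost is the mean time per token hop
       minus the separately measured overhead; the function name and the
       total time used in the example are hypothetical.

```python
def context_switch_usec(total_usec, hops, overhead_usec):
    """Mean time per hop with the hot-cache overhead subtracted."""
    if hops <= 0:
        raise ValueError("hops must be positive")
    return total_usec / hops - overhead_usec


# Hypothetical run: 100 hops taking 25000 microseconds in total, with an
# overhead of 179 microseconds per hop measured with hot caches.
print(context_switch_usec(25000.0, 100, 179.0))
```

       Because the overhead is measured with hot caches, any extra cache
       misses that occur during the real run remain in the result and are
       counted as part of the context switch.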

OUTPUT
       The output is intended as input to xgraph or a similar program.  It
       spans multiple lines: the first line is a title that gives the process
       size and the non-context-switching overhead of the test, and each
       subsequent line is a pair of numbers giving the number of processes
       and the cost of a context switch.  The overhead and the context switch
       times are in microseconds.  The numbers below are for a
       SPARCstation 2.

       "size=0 ovr=179
       2 71
       4 104
       8 134
       16 333
       20 438
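
       For readers who want to post-process this output without xgraph,
       the following sketch parses it.  The function name parse_lat_ctx is
       hypothetical.  The sketch assumes the layout shown above, and also
       tolerates a unit suffix after the size and a fractional overhead,
       which is an assumption about other versions of the tool.

```python
import re


def parse_lat_ctx(text):
    """Parse lat_ctx output into (size, overhead_usec, points).

    points is a list of (number_of_processes, context_switch_usec) pairs.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty output")
    # Title line, e.g.  "size=0 ovr=179  (the leading quote is for xgraph).
    title = re.match(r'"?size=(\d+)\w*\s+ovr=([\d.]+)', lines[0])
    if title is None:
        raise ValueError("unrecognized title line: %r" % lines[0])
    size = int(title.group(1))
    overhead = float(title.group(2))
    points = []
    for ln in lines[1:]:
        nprocs, usec = ln.split()
        points.append((int(nprocs), float(usec)))
    return size, overhead, points


sample = '''"size=0 ovr=179
2 71
4 104
8 134
16 333
20 438
'''
size, overhead, points = parse_lat_ctx(sample)
print(size, overhead, points[0], points[-1])
```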

BUGS
       The numbers produced by this benchmark are somewhat inaccurate; they
       vary by about 10 to 15% from run to run.  To compensate, do a series
       of runs and report the lowest numbers; the lower the number, the more
       accurate the result.
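
       Taking the lowest number over a series of runs can be sketched as
       below.  The function name is hypothetical, and each run is assumed
       to be a list of (number of processes, microseconds) pairs.  The
       values in the example are invented for illustration.

```python
def lowest_per_count(runs):
    """Return the lowest context switch time seen for each process count."""
    best = {}
    for run in runs:
        for nprocs, usec in run:
            if nprocs not in best or usec < best[nprocs]:
                best[nprocs] = usec
    return sorted(best.items())


# Three invented runs of the same configuration.
runs = [
    [(2, 71), (4, 104)],
    [(2, 78), (4, 99)],
    [(2, 74), (4, 110)],
]
print(lowest_per_count(runs))
```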

       The inaccuracies are possibly caused by interactions between the VM
       system and the processor caches.  It is possible that the benchmark
       processes are sometimes laid out in memory such that there are fewer
       TLB/cache conflicts than at other times.  This is pure speculation on
       our part.

ACKNOWLEDGEMENT
       Funding for the development of this tool was provided by Sun
       Microsystems Computer Corporation.

SEE ALSO
       lmbench(8).

AUTHOR
       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1994-2000 Carl Staelin and Larry McVoy
                                    $Date$                          LAT_CTX(8)