DragonFly kernel List (threaded) for 2006-06
Re: warning about 'large-function-growth limit reached'
:Dmitri Nikulin wrote:
:> Inlining is still not as dangerous as loop unrolling. Imagine
:> unrolling a three-page loop with a thousand iterations. That's three
:> thousand pages of separate instructions, for what? I hope gcc handles
:> *that* correctly and notices the loop unroll is completely worthless,
:> in fact having to load that much more code into cache probably means
:> it's a pessimisation. Or compromises and encapsulates the body...
:Loop unrolling doesn't work this way. It unrolls a loop a small number of
:times, like 4 times if possible. The Intel compiler does that very
:systematically, and with very good results.
:People doing computations try different optimization flags, different
:compilers and choose the best for *their* computations. Differences can be
:enormous, like twice as fast. I have done that myself, a lot.
Generally speaking you don't want to unroll a loop on a modern cpu
because the branch prediction cache makes the 'looping' operation
essentially free. A very tiny loop, one that only ever iterates a
few times (like 4 or 5), might benefit, because the branch prediction
cache will miss at least once each time the loop runs, but anything
larger than that wouldn't. Also, any loop unrolling adds pollution
to the L1 code cache, and that is a much bigger deal.
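To make "unroll a small number of times" concrete, here is a minimal C sketch of a summation loop and a hand-unrolled 4x version. The names `sum_rolled` and `sum_unrolled4` are hypothetical, and in practice the compiler would do this itself (e.g. gcc with `-funroll-loops`); the sketch only shows where the extra code comes from.

```c
#include <stddef.h>

/* Plain loop: one add plus one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/*
 * Same computation unrolled 4 times: the body is copied four times,
 * so the loop branch runs about n/4 times, at the cost of a larger
 * body plus a cleanup loop for the n % 4 leftover elements.
 */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)      /* remainder: 0 to 3 elements */
        s += a[i];
    return s;
}
```

The unrolled version carries four copies of the body plus a separate cleanup loop, which is the extra L1 code cache footprint mentioned above, traded for fewer executions of the loop branch.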
If you are interested in looking into actual numbers, I suggest taking
one of the timing tests in /usr/src/test/sysperf (like loop*.c) and
modifying it to time a loop with and without unrolling.
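If you would rather not start from the sysperf harness, a minimal standalone sketch along the same lines is shown here. It is not the loop*.c code: `tiny_rolled`, `tiny_unrolled` and `time_ns` are hypothetical names, it assumes a POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`, and it times the "very tiny loop" case (4 iterations) against its straight-line equivalent.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Results are stored here so the compiler cannot discard the work. */
static volatile long sink;

/* Tiny loop with a fixed trip count of 4. */
static void tiny_rolled(const int *a)
{
    long s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    sink = s;
}

/* Hand-unrolled equivalent: straight-line code, no loop branch. */
static void tiny_unrolled(const int *a)
{
    sink = (long)a[0] + a[1] + a[2] + a[3];
}

/* Return the nanoseconds spent calling fn(a) reps times. */
static int64_t time_ns(void (*fn)(const int *), const int *a, long reps)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long r = 0; r < reps; r++)
        fn(a);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

A caller would run each variant from `main` with a large repetition count (say 100000000) and compare the two totals. At higher optimization levels the compiler may fully unroll the 4-iteration loop on its own, so check the generated assembly (`cc -S`) before reading anything into the numbers.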