DragonFly kernel List (threaded) for 2007-06

Re: Interrupt load with niced processes

From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jun 2007 20:55:15 -0700 (PDT)

::~:# while true; do /bin/echo "foo" > /dev/null; done
::27.5%Sys  51.3%Intr 21.3%User  0.0%Nice  0.0%Idl
::But it doesn't explain why is interrupt load so high. At least for me ...
::yet ... ;).
:    I get 18% ish.  Ho!  I'll track it down.

    Ok, I think the issue is related to the clock interrupt getting
    delayed by a critical section and ending up running from splz().
    I still have to track down the exact cause but it's just a little
    statistics snafu and has nothing to do with actual performance.

    On the bright side, while investigating the issue I noticed an
    inordinately high number of IPI messages flying between cpus while
    running the /bin/sh do/nice/echo/while test that was posted.  I tracked
    them down to the exec and pmap code.  The exec support and elf loader
    were doing temporary KVM mappings to cache bits of data during the
    exec, and the pmap code was explicitly destroying the KVM mapping for
    the page directory page for the mmu on process exit.

    Temporary KVM mappings incur SMP invalidation overhead.  Basically
    an IPI has to be sent to the other cpus to tell them to invalidate
    their TLB.  All three performance issues have been fixed.  Exec
    now uses the SF_BUF facility for temporary mappings, and the OBJCACHE
    facility for temporary execl argument space.  The pmap code already
    uses the OBJCACHE facility so the unmapping of the page directory
    was simply moved to the OBJCACHE destructor function.  The use of
    OBJCACHE also automatically maintains very good locality of
    reference for allocations.  The result is that no SMP invalidations
    occur any more for that critical path (or for just about anything
    userland does any more)... very very nice.
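
    A rough userland sketch of why moving the unmap into a cache
    destructor helps.  This is a toy model, not the kernel code: the
    toy_* names, NCPUS, CACHE_MAX and the IPI counter are all invented
    for illustration, and it assumes every unmap costs one IPI per
    other cpu.  It does not use the real SF_BUF or OBJCACHE interfaces.

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS     4   /* hypothetical number of cpus */
#define CACHE_MAX 8   /* hypothetical depth of the cache's free list */

/* Simulated count of IPIs sent to make other cpus invalidate their TLB. */
static unsigned long ipi_count;

/* Toy object that owns one simulated temporary kernel mapping. */
struct toy_obj {
    int mapped;
    struct toy_obj *next;
};

/* Constructor: establish the mapping.  A fresh mapping needs no IPI. */
static void toy_ctor(struct toy_obj *o)
{
    o->mapped = 1;
    o->next = NULL;
}

/* Destructor: tear the mapping down, charging one IPI per other cpu. */
static void toy_dtor(struct toy_obj *o)
{
    if (o->mapped) {
        o->mapped = 0;
        ipi_count += NCPUS - 1;
    }
}

struct toy_cache {
    struct toy_obj *free_list;
    int nfree;
};

/* Reuse a cached object (mapping still valid) or construct a new one. */
static struct toy_obj *toy_cache_get(struct toy_cache *c)
{
    struct toy_obj *o = c->free_list;

    if (o != NULL) {
        c->free_list = o->next;
        c->nfree--;
        o->next = NULL;
        return o;
    }
    o = malloc(sizeof(*o));
    assert(o != NULL);
    toy_ctor(o);
    return o;
}

/* Return an object; the destructor only runs when the cache is full. */
static void toy_cache_put(struct toy_cache *c, struct toy_obj *o)
{
    if (c->nfree < CACHE_MAX) {
        o->next = c->free_list;
        c->free_list = o;
        c->nfree++;
        return;
    }
    toy_dtor(o);
    free(o);
}

/* Destroy everything still cached (e.g. at shutdown). */
static void toy_cache_drain(struct toy_cache *c)
{
    while (c->free_list != NULL) {
        struct toy_obj *o = toy_cache_get(c);
        toy_dtor(o);
        free(o);
    }
}

/* Map and unmap on every use: one round of IPIs per iteration. */
unsigned long run_naive(int iterations)
{
    ipi_count = 0;
    for (int i = 0; i < iterations; i++) {
        struct toy_obj *o = malloc(sizeof(*o));
        assert(o != NULL);
        toy_ctor(o);
        toy_dtor(o);
        free(o);
    }
    return ipi_count;
}

/* Go through the cache: the unmap happens only when objects are destroyed. */
unsigned long run_cached(int iterations)
{
    struct toy_cache c = { NULL, 0 };

    ipi_count = 0;
    for (int i = 0; i < iterations; i++)
        toy_cache_put(&c, toy_cache_get(&c));
    toy_cache_drain(&c);
    return ipi_count;
}
```

    With these made-up constants, run_naive(1000) counts 3000
    simulated IPIs while run_cached(1000) counts 3, since the single
    cached object is only unmapped when the cache is drained.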

    I also noticed that VM objects have terrible locality of reference,
    because they still use zalloc() (which is awful), but VM objects
    are a bit more complicated due to low level bootstrapping issues so
    I haven't changed them over yet.

    We still have a big hangup with the Big Giant Lock, of course, but
    the cpus are considerably better separated from each other now than
    they were before.

					Matthew Dillon 
