DragonFly kernel List (threaded) for 2004-02
Re: lkwt in DragonFly
: Anyway, so for the UMA _keg_ extensions, since there's no
: interlocking, the replacement was next to trivial and the amount of
: code within the critical section was minimal (all you do is check
: the keg and if non-empty, allocate from it or exit the
: critical section and do something else). And it is precisely with
: this change that I noticed a slight pessimization. So either I
: grossly underestimated the number of interrupts that occur on
: average while in that critical section or the cost of
: entering/exiting the critical section is at least as high as that of
: grabbing/dropping a mutex. Again, this may not be the case for
: DragonFly. [Ed.: And now that I've read what follows, I predict it
: likely isn't]
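The keg fast path described above can be sketched roughly as follows. This is a userspace simulation, and the structure and function names (pcpu_keg, keg_alloc_fast, the crit_* stubs) are invented for illustration, not FreeBSD's actual UMA code:

```c
#include <stddef.h>

/* Hypothetical per-CPU cache ("keg") of preallocated items. */
struct pcpu_keg {
    void  *items[32];   /* cached free items */
    int    count;       /* number of valid entries */
};

/* Stand-ins for the real crit_enter()/crit_exit(), which would
 * prevent preemption/interrupts for the duration of the section. */
static int crit_nest;
static void crit_enter(void) { crit_nest++; }
static void crit_exit(void)  { crit_nest--; }

/* Enter the critical section; if the keg is non-empty, take an item,
 * otherwise return NULL so the caller can fall back to a slow path. */
void *keg_alloc_fast(struct pcpu_keg *keg)
{
    void *item = NULL;

    crit_enter();
    if (keg->count > 0)
        item = keg->items[--keg->count];
    crit_exit();
    return item;    /* NULL means "exit and do something else" */
}
```

The point of the quoted observation is that the only cost on this path is crit_enter()/crit_exit() itself, so if those are as expensive as a mutex acquire/release, the rewrite buys nothing.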
Ok, I understand what you have done. I've looked at the FreeBSD-5
code and from what I can tell my optimized critical section code
was either never committed or it was backed out. FreeBSD-5 seems
to be doing a lot of sti/cli junk and that is going to be far worse
than obtaining or releasing a mutex. DFly does not have that issue.
What you are doing in UMA we tend to do with discrete APIs which
operate on the globaldata structure. For example, I cache struct
thread's within a discrete API and do not use a generalized
(UMA-like) infrastructure that could cover multiple data types.
I have found that a discrete implementation for simple typed data
structure caches like these is far more understandable and far easier
to code and maintain, and we will probably do something similar for
the mbuf and mbuf+cluster allocations in DFly. Our slab allocator
is also per-cpu, but I have not attempted to (nor do I believe it is
a good idea to) try to integrate type-stable caching within the
slab allocator's API abstraction. The reason is simply that differently
typed structures have different requirements that cannot be optimally
met with a single, general, CTOR/DTOR style API.
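A discrete per-CPU cache of the kind described, hung off a globaldata-style structure, might look like this. The field and function names here are illustrative assumptions; DragonFly's real struct thread caching differs in detail:

```c
#include <stdlib.h>
#include <stddef.h>

/* Type-specific free list: threads cached per CPU, no locks needed
 * because each CPU only ever touches its own globaldata. */
struct thread {
    struct thread *td_next;   /* free-list linkage while cached */
    /* ... real fields ... */
};

struct globaldata {
    struct thread *gd_tdfreelist;   /* per-CPU cache of free threads */
    int            gd_tdfreecount;
};

/* Allocate a thread, preferring this CPU's cache. */
struct thread *thread_alloc(struct globaldata *gd)
{
    struct thread *td = gd->gd_tdfreelist;
    if (td != NULL) {
        gd->gd_tdfreelist = td->td_next;
        gd->gd_tdfreecount--;
        return td;
    }
    return malloc(sizeof(*td));   /* slow path: the real allocator */
}

/* Return a thread to this CPU's cache. */
void thread_free(struct globaldata *gd, struct thread *td)
{
    td->td_next = gd->gd_tdfreelist;
    gd->gd_tdfreelist = td;
    gd->gd_tdfreecount++;
}
```

Because the cache is typed, construction/destruction policy can be tailored to struct thread specifically, rather than forced through a one-size-fits-all CTOR/DTOR interface.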
:> The DragonFly slab allocator does not need to use a mutex or token
:> at all and only uses a critical section for very, very short periods
:> of time within the code. I have suggested that Jeff recode UMA to
:> remove the UMA per-cpu mutex on several occasions.
: I have been in touch with Jeff regarding UMA issues for a long while
: and he has mentioned that he did exactly that, several months ago.
: However, I'm not sure exactly what prevented that work from going in.
: It's very possible that the interlocking issues involving being in
: the critical section and having to grab the zone lock in the case
: where the pcpu cache was empty remained unresolved. Also, since I
: did something similar (and simpler) and noticed a pessimization,
: actual performance would have to be evaluated prior to making a
: change like that - and perhaps it was only to find that performance
: was worse.
Dealing with cache exhaustion is definitely an issue but I do not see
the issue as being much different from how the mbuf cluster code worked
in the first place.
: If you are already in a critical section, the cost is negligible.
: If you are not, which is ALWAYS when it comes to the UMA keg code,
: then you always disable interrupts. I remember you a while back
: committing changes that made the general critical section enter and
: exit faster in the common case, deferring the cli to the scenario
: where an interrupt actually occurs. I don't remember the details
: behind the backout. I guess I'd have to dig up the archives.
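The deferred-cli scheme described in that quote can be sketched as follows, again as a userspace simulation with invented names: enter/exit only adjust a per-CPU nesting count, and the expensive interrupt disable/replay happens only if an interrupt actually arrives inside the section:

```c
/* Simulated "deferred cli" critical section: no sti/cli on the
 * common path, only a counter; a real interrupt arriving inside
 * the section is marked pending and replayed on exit. */
static int crit_count;       /* nesting depth */
static int irq_pending;      /* interrupt deferred inside a section */
static int irq_handled;      /* count of serviced interrupts */

static void hard_interrupt(void)   /* what the low-level handler does */
{
    if (crit_count > 0)
        irq_pending = 1;     /* defer: just note that it happened */
    else
        irq_handled++;       /* service it immediately */
}

static void crit_enter(void) { crit_count++; }

static void crit_exit(void)
{
    if (--crit_count == 0 && irq_pending) {
        irq_pending = 0;     /* replay the deferred interrupt */
        irq_handled++;
    }
}
```

Under this scheme the common case (no interrupt during the section) costs only an increment and a decrement, which is the property the quote contrasts with unconditional cli/sti.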
John seems to want to insist on a complex, machine-specific critical
section API. IMHO it's a mistake. By not guaranteeing fast critical
section code, people coding for FreeBSD-5 are basically left with only
one 'optimized' locking interface... that being the mutex interface,
and wind up going through massive convolutions to use mutexes for things
that mutexes should not really be necessary for.
: Perhaps CPU migration should not be permitted as a side-effect of
: being pre-empted within the kernel, then we can consider similar
: optimisations in FreeBSD 5.x. Prior to that, however, I wonder what
: measurable gains there are from allowing full-blown pre-emption with
: CPU migration within the kernel, if any. I'll assume for the moment
: that there is a reasonable rationale behind that design decision.
:Bosko Milekic * bmilekic@xxxxxxxxxxxxxxxx * bmilekic@xxxxxxxxxxx
:TECHNOkRATIS Consulting Services * http://www.technokratis.com/
You will never see any statistical benchmark gains from allowing
non-interrupt preemption. Never, ever. Preemption is, by definition,
interrupting work that the cpu must do anyway with more work that the
cpu must do anyway. Without preemption the mainline thread work winds
up being serialized which will actually be MORE optimal than allowing
the preemption since there is less L1 cache pollution and the thread
work in question is usually short-lived anyhow. Any embedded systems
programmer will tell you the same thing and the concepts apply to
both DFly and FreeBSD big-time.
The only thing you get from kernel preemption is potentially better
responsiveness to interrupts when they block on something. That's it,
nothing else.... and, frankly, FreeBSD-5's priority borrowing is nothing
more than one huge hack which tries to work around the truly horrible
coding practices being used with mutexes from interrupts. This is one
reason why I prefer discrete APIs for specialized operations such as
mbuf allocation which might have to be done from an interrupt. Trying
to squeeze everything into one generalized API makes it impossible to
tailor a general API for interrupt or non-interrupt use without adding
some severe hacks (like priority borrowing).
Keep in mind that there are two kinds of preemption that we are talking
about here. DragonFly allows interrupt threads to preempt mainline
kernel threads, just like FreeBSD. What DragonFly does not allow
(and FreeBSD does allow) is for an interrupt thread to switch to
a non-interrupt thread before returning to the originally interrupted
non-interrupt thread. So in DFly preemption is a well-understood
interface that allows us to make certain assumptions about what kinds
of things can preempt mainline kernel code, while in FreeBSD preemption
is 'wild'... anything can happen at any time, making it impossible to
have assumptions that could otherwise be used to optimize performance.