DragonFly BSD
DragonFly submit List (threaded) for 2004-12

Re: atomic 64 bit add for pentium+

From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 8 Dec 2004 19:18:28 -0800 (PST)

:Hi all,
:as promised on commits@, here is a generic 64 bit add operator for
:Pentium+ and the necessary change for gencount_inc. Also attached
:a small hack for cpuperf used for the numbers below.
:The good news is that gencount_inc can be made critical section free;
:the bad news is the performance of cmpxchg8b on the P4. Like so many other
:ops, it totally sucks.
:My P4 notebook: 115.857nS/loop for cmpxchg8b, compared to 7.517nS/loop
:for cmpxchg.
:Leaf (AMD64): 6.788nS/loop for cmpxchg8b, compared to 1.293nS/loop for
:cmpxchg.
:
:Intel sucks.
:Conclusion: The overhead on AMD64 is much less and seems completely
:acceptable, for P4 it depends. Matt, what's the speed of critical
:sections on P4?
:I'd like to get some numbers for other processors as well.
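For context, a generic 64-bit atomic add on a 32-bit Pentium-class CPU is usually built as a compare-and-swap retry loop, because two separate 32-bit adds cannot update both halves of the counter atomically when a carry crosses from the low word into the high word. The following is a minimal sketch, not the attached patch: it uses C11 `<stdatomic.h>` rather than inline assembly, and the name `atomic_add_64()` is hypothetical. When targeting Pentium or later, compilers typically lower this 64-bit compare-exchange to `lock cmpxchg8b`, the instruction whose cost is measured above.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically add 'delta' to *p and return the new
 * value.  On 32-bit x86 (Pentium and later) the 64-bit compare-exchange
 * below is typically compiled to "lock cmpxchg8b".
 */
static uint64_t
atomic_add_64(_Atomic uint64_t *p, uint64_t delta)
{
	uint64_t old = atomic_load_explicit(p, memory_order_relaxed);

	/*
	 * Retry until no other cpu changed *p between our read and our
	 * swap.  On failure, 'old' is refreshed with the current value,
	 * so the new sum is recomputed on the next iteration.
	 */
	while (!atomic_compare_exchange_weak(p, &old, old + delta))
		;
	return old + delta;
}
```

Every increment pays for a locked 64-bit compare-exchange, which is why the per-cpu cost of the instruction matters so much in the numbers above.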

    Joerg, I really dislike that whole 64 bit generation number header file
    concept.  It doesn't fit with the DragonFly cpu localization model.  It's
    more like FreeBSD's heavyweight model rather than our model.   I'd really
    rather that header file be removed and no assembly be implemented.

    FreeBSD has tripped over itself on multiple occasions trying to create
    generic operations that work across all cpus and processors, and the only
    thing that has come out of it has been a huge mess.

    If we implement the algorithms correctly, no critical section or atomic
    operations are required at all.   The operations just become normal 64
    bit arithmetic.

    I'll give you an example... let's say you wanted a unique identifier in an
    SMP system.  Say you have 4 cpus.  If the identifier only needs to be
    unique and does not have to be monotonically increasing, then all you have
    to do is give each cpu its own unique numerical space, e.g. by
    initializing a per-cpu variable to its cpuid and incrementing it by
    NUMCPUS (on a per-cpu basis) whenever you need a new id.

    The vast majority of uses of this sort of feature will be in situations
    where only non-interrupt code will manipulate a generation id, in which
    case you don't even need a critical section.  You just use standard C
    arithmetic... so this case devolves down into:

	newid = mycpu->gd_non_interrupt_gen;
	mycpu->gd_non_interrupt_gen += NUMCPUS;
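The two-line fragment can be turned into a small userspace simulation to check that the per-cpu numbering really never collides. This is a sketch under stated assumptions: `struct globaldata` here is a stand-in modeling only the fields used, `NUMCPUS` is fixed at 4 as in the example, and `sim_switch_cpu()`, `gen_init()` and `gen_next()` are hypothetical helpers, not DragonFly kernel functions. It also assumes the caller cannot migrate to another cpu in the middle of `gen_next()`, which the simulation trivially satisfies.

```c
#include <stdint.h>

#define NUMCPUS 4		/* assumption: 4 cpus, as in the example */

/* Userspace stand-in for the per-cpu globaldata structure. */
struct globaldata {
	int		gd_cpuid;
	uint64_t	gd_non_interrupt_gen;
};

static struct globaldata cpu_data[NUMCPUS];
static struct globaldata *mycpu = &cpu_data[0];	/* "current" cpu */

/* Hypothetical helper: pretend we are now running on 'cpuid'. */
static void
sim_switch_cpu(int cpuid)
{
	mycpu = &cpu_data[cpuid];
}

/* Give each cpu its own numerical space by seeding its counter with cpuid. */
static void
gen_init(void)
{
	for (int i = 0; i < NUMCPUS; i++) {
		cpu_data[i].gd_cpuid = i;
		cpu_data[i].gd_non_interrupt_gen = (uint64_t)i;
	}
}

/*
 * Allocate an id that is unique but not globally monotonic.  Cpu n only
 * hands out ids congruent to n modulo NUMCPUS, so ids from different cpus
 * cannot collide.  Only non-interrupt code on this cpu touches this
 * counter, so plain 64-bit arithmetic is enough: no lock, atomic op, or
 * critical section.
 */
static uint64_t
gen_next(void)
{
	uint64_t newid;

	newid = mycpu->gd_non_interrupt_gen;
	mycpu->gd_non_interrupt_gen += NUMCPUS;
	return newid;
}
```

Running it, cpu 0 hands out 0, 4, 8, ... while cpu 2 hands out 2, 6, ..., so the ids interleave across cpus: unique, but not monotonically increasing system-wide, which is exactly the trade-off that lets the atomic instruction disappear.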

