DragonFly submit List (threaded) for 2004-12
Re: atomic 64 bit add for pentium+
:as promised on commits@, here is a generic 64 bit add operator for
:Pentium+ and the necessary change for gencount_inc. Also attached
:a small hack for cpuperf used for the numbers below.
:The good message is that gencount_inc can be made critical section free,
:the bad is the performance of cmpxchg8b on p4. Like so many other ops,
:it totally sucks.
:My P4 notebook: 115.857nS/loop for cmpxchg8b, compared to 7.517nS/loop
:Leaf (AMD64): 6.788nS/loop for cmpxchg8b, compared to 1.293nS/loop for
:Conclusion: The overhead on AMD64 is much less and seems completely
:acceptable, for P4 it depends. Matt, what's the speed of critical
:sections on P4?
:I'd like to get some numbers for other processors as well.
Joerg, I really dislike that whole 64 bit generation number header file
concept. It doesn't fit with the DragonFly cpu localization model. It's
more like FreeBSD's heavyweight model rather than our model. I'd really
rather that header file be removed and no assembly be implemented.
FreeBSD has tripped over itself on multiple occasions trying to create
generic operations that work across all cpus and processors and the only
thing that has come out of it has been a huge mess.
If we implement the algorithms correctly, no critical section or atomic
operations are required at all. The operations just become normal 64
I'll give you an example... let's say you wanted a unique identifier in a
SMP system. Say you have 4 cpus. If the identifier only needs to be
unique and does not have to be monotonically increasing then all you have
to do is give each cpu its own unique numerical space, e.g. by
initializing a per-cpu variable to cpuid and incrementing it by NUMCPUS
(on a per cpu basis) whenever you need a new id.
The vast majority of uses of this sort of feature will be in situations
where only non-interrupt code will manipulate a generation id, in which
case you don't even need a critical section. You just use standard C
arithmetic... so this case devolves down into:
newid = mycpu->gd_non_interrupt_gen;
mycpu->gd_non_interrupt_gen += NUMCPUS;
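
To see why the ids handed out this way never collide, here is a minimal
user-space sketch of the scheme. It is not kernel code: the struct
globaldata_sketch type, the cpus[] array, the fixed NUMCPUS of 4, and the
gen_init()/gen_alloc() helpers are all assumptions made up for the
illustration. Only the gd_non_interrupt_gen field name and the
"start at cpuid, step by NUMCPUS" idea come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define NUMCPUS 4   /* assumed fixed cpu count for this sketch */

/* Hypothetical stand-in for a per-cpu data structure. */
struct globaldata_sketch {
    uint64_t gd_non_interrupt_gen;   /* next id this cpu will hand out */
};

/* One slot per cpu; each cpu only ever touches its own slot. */
struct globaldata_sketch cpus[NUMCPUS];

/* Seed each cpu's counter with its own cpuid so the sequences are disjoint. */
void
gen_init(void)
{
    int i;

    for (i = 0; i < NUMCPUS; ++i)
        cpus[i].gd_non_interrupt_gen = (uint64_t)i;
}

/*
 * Allocate an id from one cpu's private sequence using plain C arithmetic.
 * No atomic op or critical section is used; this is only safe under the
 * assumption that non-interrupt code on the owning cpu is the sole caller.
 */
uint64_t
gen_alloc(struct globaldata_sketch *gd)
{
    uint64_t newid = gd->gd_non_interrupt_gen;

    gd->gd_non_interrupt_gen += NUMCPUS;
    return newid;
}
```

With 4 cpus, cpu 0 hands out 0, 4, 8, ..., cpu 1 hands out 1, 5, 9, ...,
and so on, so every id is congruent to its cpu's number modulo NUMCPUS
and two cpus can never produce the same value. The ids are unique but not
monotonically increasing across cpus, which is the trade-off described
above.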