DragonFly BSD
DragonFly users List (threaded) for 2005-11

Re: DP performance

From: Danial Thom <danial_thom@xxxxxxxxx>
Date: Wed, 30 Nov 2005 07:08:47 -0800 (PST)

--- Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>

> :Should we be really that pessimistic about potential MP performance,
> :even with two NICs only?  Typically packet flows are bi-directional,
> :and if we could have one CPU/core taking care of one direction, then
> :there should be at least some room for parallelism, especially once the
> :parallelized routing tables see the light.  Of course provided that
> :each NIC is handled by a separate core, and that IPC doesn't become the
> :actual bottleneck.
>     The problem is that if you only have two interfaces, every incoming
>     packet being routed has to go through both interfaces, which means
>     that there will be significant memory contention between the two cpus
>     no matter what you do.  This won't degrade the 2xCPUs by 50%... it's
>     probably more like 20%, but if you only have two ethernet interfaces
>     and the purpose of the box is to route packets, there isn't much of a
>     reason to make it an SMP box.  cpus these days are far, far faster than
>     two measly GigE ethernet interfaces that can only do 200 MBytes/sec each.
>     Even more to the point, if you have two interfaces you still only have
>     200 MBytes/sec worth of packets to contend with, even though each incoming
>     packet is being shoved out the other interface (for 400 MBytes/sec of
>     total network traffic).  It is still only *one* packet that the cpu is
>     routing.  Even cheap modern cpus can shove around several GBytes/sec
>     without DMA so 200 MBytes/sec is really nothing to them.

MB/sec is not the relevant measure; pps is. It's
the per-packet iterations that are the limiting
factor, particularly if you are acting on the
packet. The simplistic analysis of packet in /
packet out is one thing, but the expectation is
that *some* operation is being carried out for
each packet, whether it's a firewall check or
something even more intensive. It's pretty rare
these days to have a box that just moves bytes
from one interface to another without some
value-added task. Otherwise you'd just get a
switch and not use a unix-like box.
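
To put rough numbers on the pps point, here is a minimal
sketch of the per-packet time budget on one GigE link. It
assumes a 1 Gb/s line rate and the standard 20 bytes of
per-frame wire overhead (8 bytes preamble/SFD plus 12 bytes
inter-frame gap); the function names are illustrative only
and do not come from any real stack.

```python
# Assumption: 1 Gb/s line rate, standard Ethernet framing overhead.
GIGE_BITS_PER_SEC = 1_000_000_000
WIRE_OVERHEAD_BYTES = 20  # 8 bytes preamble/SFD + 12 bytes inter-frame gap


def packets_per_second(frame_bytes, link_bps=GIGE_BITS_PER_SEC):
    """Maximum frames per second one direction of the link can carry."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def ns_per_packet(frame_bytes, link_bps=GIGE_BITS_PER_SEC):
    """Time available to handle each packet at line rate, in nanoseconds."""
    return 1e9 / packets_per_second(frame_bytes, link_bps)


if __name__ == "__main__":
    # Byte rate is the same in every row; only the packet rate changes.
    for size in (64, 512, 1518):
        print(f"{size:5d}-byte frames: "
              f"{packets_per_second(size):12,.0f} pps, "
              f"{ns_per_packet(size):8.0f} ns per packet")
```

The link carries the same bytes per second in every case, but
with minimum-size frames the box gets roughly 18 times more
packets, and so roughly 18 times less time for whatever
per-packet work (firewall check, lookup, etc.) it has to do.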

> :>     Main memory bandwidth used to be an issue but isn't so much any
> :> more.
> :
> :The memory bandwidth isn't but latency _is_ now the major performance
> :bottleneck, IMO.  DRAM access latencies are now in the 50 ns range and will
> :not noticeably decrease in the foreseeable future.  Consider the amount
> :of independent memory accesses that need to be performed on per-packet
> :...
> :Cheers
> :
> :Marko
>     No, this is irrelevant.  All modern ethernet devices (for the last decade
>     or more) have DMA engines and fairly significant FIFOs, which means that
>     nearly all memory accesses are going to be burst accesses capable of
>     getting fairly close to the maximum burst bandwidth of the memory.  I
>     can't say for sure that this is actually happening without putting
>     a logic analyzer on the memory bus, but I'm fairly sure it is.  I seem
>     to recall that the PCI (PCIx, PCIe, etc) bus DMA protocols are all burst
>     capable protocols.

Typically only 64 bytes are "burst" at a time max
(if there are no other requests), so you're not
always bursting the entire frame. As the bus
becomes more saturated, you get shorter and
shorter bursts. With 2 devices your "realizable"
bus bandwidth is about 2/3 of peak for
monodirectional traffic and 1/2 for bidirectional.
This puts PCI-X (~8 Gb/s) just on the edge of
being fully capable of full-duplex gigE.
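
The "just on the edge" arithmetic can be sketched as follows.
The 2/3 and 1/2 efficiency figures are the ones quoted above;
the 64-bit/133 MHz PCI-X configuration is an assumption, as is
the simplification that each routed direction of traffic
crosses the bus twice (RX DMA in, TX DMA out) with descriptor
and other overhead ignored. All names are illustrative.

```python
# Assumption: 64-bit wide PCI-X bus clocked at 133 MHz.
PCIX_WIDTH_BITS = 64
PCIX_CLOCK_HZ = 133_000_000
GIGE_BPS = 1_000_000_000


def peak_bus_bps(width_bits=PCIX_WIDTH_BITS, clock_hz=PCIX_CLOCK_HZ):
    """Theoretical peak bus bandwidth in bits per second."""
    return width_bits * clock_hz


def realizable_bus_bps(efficiency):
    """Usable bandwidth given the fraction of peak actually achieved."""
    return peak_bus_bps() * efficiency


def routed_bus_load_bps(flow_directions, link_bps=GIGE_BPS):
    """Bus load when routing between two NICs at line rate.

    Each direction of flow is DMA'd in from one NIC and DMA'd out
    to the other, so it crosses the bus twice.
    """
    return 2 * flow_directions * link_bps


if __name__ == "__main__":
    cases = (("monodirectional", 1, 2 / 3), ("bidirectional", 2, 1 / 2))
    for name, directions, efficiency in cases:
        have = realizable_bus_bps(efficiency)
        need = routed_bus_load_bps(directions)
        print(f"{name:16s} need {need / 1e9:.2f} Gb/s, "
              f"have {have / 1e9:.2f} Gb/s, "
              f"headroom {have / need:.2f}x")
```

Under these assumptions the bidirectional case needs 4 Gb/s
of bus bandwidth against roughly 4.26 Gb/s realizable, which
leaves only a few percent of headroom.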

