DragonFly users List (threaded) for 2008-01
Re: rsync vs. cvsup benchmarks
At 6:38 AM +0000 1/30/08, Vincent Stemen wrote:
> The results are dramatic, with rsync performing hundreds of percent faster on
> average while only loading the processor on the client side a little over
> a third as much as cvsup. Either the performance claims about cvsup being
> faster than rsync are based on theory without real world testing or cvsup has
> gotten a lot slower or rsync has gotten a lot faster than in the past.
> For those who are concerned about the validity of these results without
> including server side load tests and tests under bandwidth congestion
> conditions, here are my thoughts on the matter.
> No matter where a bottleneck exists in the transfer, whether it
> is server side load, client side load, or bandwidth limits, you
> are going to experience similar loss of throughput.
The additional testing is nice to see, but you're not thinking the
issues through far enough when it comes to scaling up a service like
this. It's good to benchmark something which hasn't been tested in
some time, but you have to do pretty extensive benchmarks if you're
going to come to any sweeping conclusions for *all* uses of a program.
Let's say rsync takes 10% of the cpu on the client, and 10% of the
cpu on a server. Let's say cvsup for the same update takes 15% CPU
on the client, and 7% on the server. If your benchmarks ignore the
load on the server, then they cannot possibly see problems which
could occur when scaling up to more clients.
With a single client connecting to the server, *neither* side is the
bottleneck. It might be that disk speed is the main bottleneck at that
point. The update might take longer with cvsup due to the 15% CPU on
the client, but the CPU isn't much of a *bottleneck* at that point.
But with 10 connections in my fake scenario, rsync could be using 100%
of the CPU on the server. It's at this point that rsync will see some
bottleneck, while cvsup would only be using 70% of a CPU. Yes, cvsup
will be using much more on each client, but then each client shows up
with its own CPU(s) to take up whatever load is thrown at that client.
The server does not receive additional CPUs or network cards for
each connection that it accepts.
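To make the arithmetic in that fake scenario concrete, here is a minimal sketch. It assumes server load simply adds up linearly per client, which a real server will not do exactly, and it uses only the made-up percentages from above, not measurements. The function names are hypothetical.

```python
# Per-client server CPU cost, in percent of one CPU. These are the
# made-up figures from the scenario above, not benchmark results.
SERVER_CPU_PER_CLIENT = {"rsync": 10, "cvsup": 7}


def server_load(tool, clients):
    """Total server CPU demand in percent, assuming load adds up linearly."""
    return SERVER_CPU_PER_CLIENT[tool] * clients


def max_clients(tool, capacity=100):
    """Largest client count that fits within `capacity` percent of server CPU."""
    return capacity // SERVER_CPU_PER_CLIENT[tool]


if __name__ == "__main__":
    for clients in (1, 10, 14):
        for tool in ("rsync", "cvsup"):
            load = server_load(tool, clients)
            print(f"{clients:>2} clients, {tool}: {load}% of a server CPU")
    for tool in ("rsync", "cvsup"):
        print(f"{tool}: server CPU saturates past {max_clients(tool)} clients")
```

Under these assumptions the server runs out of CPU at 10 rsync clients but not until 14 cvsup clients, even though every cvsup client is working harder than every rsync client.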
Again, my feeling is that rsync is almost certainly fine for using
with dragonfly's repository, given how much faster machines and
networks have gotten, and how many simultaneous connections are seen
for dragonfly repo-servers.
It makes plenty of sense to stick with rsync if your servers are not
overloaded. But if you want to prove rsync is better than cvsup for
the loads that cvsup was *MEANT* to handle, then your tests are
not extensive enough. Benchmarking a client/server setup like cvsup
is a lot of work to get a complete picture.
Also note that you don't need to "prove" that rsync is "better". If it
is good enough for what Dragonfly needs at this point, then use it. It
would be the people running the servers who will care about the load
there, and if you have enough servers then dragonfly may never notice
any problems from using rsync. Maybe a dragonfly repo-server will
never see more than 10 simultaneous connections, so you'll never even
hit the situations that freebsd had to deal with when cvsup was written.
(I suspect there's a lot fewer people on dial-up connections now, for
instance, so each individual connection will take a lot less wall-clock
time than it used to, so you're much less likely to see 100 simultaneous
connections.)
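The link between wall-clock time per update and simultaneous connections can be sketched with Little's law (average number in the system = arrival rate x time spent in the system). The update rate and durations below are invented purely for illustration, and the function name is hypothetical.

```python
def avg_simultaneous(updates_per_hour, minutes_per_update):
    """Little's law: average concurrent connections = arrival rate * duration."""
    return updates_per_hour * minutes_per_update / 60


if __name__ == "__main__":
    # Hypothetical figures: the same 600 updates per hour, but each one
    # taking 10 minutes (slow link) versus 1 minute (fast link).
    for minutes in (10, 1):
        n = avg_simultaneous(600, minutes)
        print(f"{minutes:>2} min per update: about {n:.0f} simultaneous connections")
```

With the same number of people updating, cutting the time per update by a factor of ten cuts the average number of simultaneous connections by the same factor.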
Garance Alistair Drosehn = firstname.lastname@example.org
Senior Systems Programmer or email@example.com
Rensselaer Polytechnic Institute or firstname.lastname@example.org