On Wed, Jul 28, 2010 at 11:39 AM, Alex Hornung
<ahornung@gmail.com> wrote:
I absolutely don't see how forcing the I/O from N different threads
onto 2 (events are not I/O effectively) is better than having each I/O
maintain (mostly) its own context. Your particular case may not
suffer from any performance impact, but I was mostly talking about a
future-proof solution.
Here are a couple of references detailing how a GEOM class can improve performance for multithreaded I/O:
http://lists.freebsd.org/pipermail/freebsd-geom/2006-June/001290.html
http://retis.sssup.it/~fabio/soc09/downloads/D3.pdf
Also, you can look at FreeBSD's sys/geom/eli/g_eli.c for an example implementation that uses a dedicated thread so as not to pollute the GEOM up/down threads. Disks can only read/write one thing at a time anyway, and if you have a device with multiple providers, requests can be optimized and split across them.
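To make that concrete, here is a rough sketch of the pattern g_eli uses (this is not the actual g_eli.c code, and all the "example_*" names are made up): the class's start method only queues the bio and wakes a dedicated worker thread, so the shared GEOM up/down threads never block on per-class work such as encryption. The worker would be started with kproc_create() when the geom is created.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <geom/geom.h>

struct example_softc {
	struct mtx		sc_mtx;
	struct bio_queue_head	sc_queue;	/* bios waiting for the worker */
};

/* .start method: runs in the GEOM down thread, so do no real work here. */
static void
example_start(struct bio *bp)
{
	struct example_softc *sc = bp->bio_to->geom->softc;

	mtx_lock(&sc->sc_mtx);
	bioq_insert_tail(&sc->sc_queue, bp);	/* hand the bio off */
	wakeup(sc);				/* and wake the worker */
	mtx_unlock(&sc->sc_mtx);
}

/* Dedicated per-geom worker, started with kproc_create() at creation time. */
static void
example_worker(void *arg)
{
	struct example_softc *sc = arg;
	struct g_consumer *cp;
	struct bio *bp, *cbp;

	for (;;) {
		mtx_lock(&sc->sc_mtx);
		while ((bp = bioq_takefirst(&sc->sc_queue)) == NULL)
			msleep(sc, &sc->sc_mtx, PRIBIO, "exwait", 0);
		mtx_unlock(&sc->sc_mtx);

		/* Expensive per-bio work (e.g. crypto) would happen here. */

		cbp = g_clone_bio(bp);
		if (cbp == NULL) {
			g_io_deliver(bp, ENOMEM);
			continue;
		}
		cbp->bio_done = g_std_done;	/* completes the parent bio */
		cp = LIST_FIRST(&bp->bio_to->geom->consumer);
		g_io_request(cbp, cp);		/* pass it down the stack */
	}
}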
I don't think it's been shown that GEOM is or isn't "future-proof". Same with DM. If problems arise in either implementation, I'm sure they would be worked around.
I mean every module that needs metadata. From the top of my head I
could mention the lvm parser and geli, but I'd suspect most others,
too. So GEOM has a lvm parser that is utterly incomplete and obviously
offers no management whatsoever. How is that superior to having most
of the lvm functionality (in userland), easy to keep up to date and
offering the same tools as on Linux?
You don't address the GPL part, which is a problem for me and maybe some others. Other BSDs have struggled for a long time to remove GPL tools. I also don't understand why having the same tools as Linux is desirable. If that's the goal (e.g. to attract Linux users), IMO it's misguided.
Now let's see... you write an I/O scheduler on DragonFly... you simply
use the dsched framework which fits nicely on top of the disk
subsystem. As a matter of fact I could even change dm slightly to use
the disk subsystem, too, and hence allow I/O schedulers, mbr, gpt and
disklabels on top of dm devices, but I don't think there's much point
to it at this time.
I wasn't aware of dsched; pretty cool. The point remains, though, that I could list a lot of different modules you couldn't match easily without GEOM. Also, not all Linux block stuff is done in userland, e.g. DRBD. As you point out later this is somewhat subjective, and could easily turn into a pissing match, which is not what I want. What I'm mainly looking for is input on importing GEOM, and given your experience any insight would be helpful there.
My point is that there's no need to learn anything new. Also, this is
completely subjective. Maybe you prefer GEOM; I'd argue some of the
Linux counterparts are way more intuitive.
LVM is only one consumer of device mapper as I said before, so there's
really no point in doing this comparison. LVM was imported strictly
because of the compatibility with Linux. cryptsetup is another
consumer of device mapper, which offers a different interface. My
point here is that it's extremely simple to write userland tools to
fit anyone's needs. I'm currently working on a mirror target for the
device mapper; it'll also have its own userland tool, and not
dependent on LVM which you seem to find cumbersome.
GEOM modules can certainly work in userland, e.g. ggate; its easily stackable nature and the BSD license are other features I find attractive. Obviously you and I disagree on the merits of the DM/LVM stuff, but if GEOM had been implemented you wouldn't have to write the mirror code: you'd already have well-tested, performant code.
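To show what I mean by stackable, here is a rough, hypothetical sketch of a taste routine (the "gstack" class and all its names are made up, error handling trimmed): a class just attaches a consumer to whatever provider sits below it, which can itself be the output of gmirror, geli, a partition class, etc., and then exposes a new provider on top.

#include <sys/param.h>
#include <sys/systm.h>
#include <geom/geom.h>

/*
 * Hypothetical taste routine for a made-up "gstack" class, showing how a
 * GEOM class stacks on an arbitrary provider below it.  Real classes also
 * check on-disk metadata here before deciding to attach.
 */
static struct g_geom *
gstack_taste(struct g_class *mp, struct g_provider *pp, int flags __unused)
{
	struct g_geom *gp;
	struct g_consumer *cp;
	struct g_provider *newpp;

	g_topology_assert();

	/* Don't taste our own providers, or we'd stack on ourselves forever. */
	if (pp->geom->class == mp)
		return (NULL);

	gp = g_new_geomf(mp, "%s.stack", pp->name);
	/* gp->start, gp->orphan and gp->access methods would be set here. */

	cp = g_new_consumer(gp);
	if (g_attach(cp, pp) != 0 || g_access(cp, 1, 0, 0) != 0) {
		/* The provider below refused us; tear everything down. */
		if (cp->provider != NULL)
			g_detach(cp);
		g_destroy_consumer(cp);
		g_destroy_geom(gp);
		return (NULL);
	}

	/* Expose a new provider on top, with the same geometry as below. */
	newpp = g_new_providerf(gp, "%s.stack", pp->name);
	newpp->mediasize = pp->mediasize;
	newpp->sectorsize = pp->sectorsize;
	g_error_provider(newpp, 0);	/* mark it usable */

	return (gp);
}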
I still don't get your point. GPT support in the loader is not
assisted in any way by geom or any other similar mess.
The gpart class is a helper for the loader: it creates the normal GPT boot partition, unlike the hack that exists in gpt(8).
Also I don't see any advantage to any softraid implementation which
requires a full disk sync after a minor glitch, such as someone
pulling a plug temporarily or a crash/reboot or a misprobe.
The gmirror sync after an unclean disconnect is greatly reduced on gjournal volumes; I haven't timed it lately, but it's something like 1 to 2 minutes for TB+ sized volumes.