DragonFly BSD
DragonFly kernel List (threaded) for 2010-02

Re: kernel work week of 3-Feb-2010 HEADS UP

From: Freddie Cash <fjwcash@xxxxxxxxx>
Date: Thu, 4 Feb 2010 21:23:10 -0800

On Thu, Feb 4, 2010 at 7:18 PM, Matthew Dillon <dillon@apollo.backplane.com> wrote:

:Is the concern that people would be more inclined to remove an SSD than a
:regular drive by mistake, or that splitting off the log could lead to an
:"oops, I forgot that the log was separate" situation when changing out
:drives?  Or something else?
:It seems like an odd thing to worry about, to be honest.  If you can't
:trust users not to start removing important components from their

   Well, true enough.  I guess the real issue I have is that one
   is dedicating a piece of equipment to a really tiny piece of the
   filesystem.  Though I can't deny the utility of having a fast fsync().
   If the storage system is big enough then, sure.  If you're talking
   about going from one physical drive to two though it probably isn't
   worth the added complexity just to get a fast fsync().

This would be a setup similar to the ZFS L2ARC (cache) and SLOG (separate log device).

The cache device is one or more read-optimised (i.e. MLC) SSDs.  Any data that would be evicted from the in-memory ARC is written to the cache device, and any future reads of that data are served from the cache device instead of from disk.  The cache device should be as big and as fast (for reads) as possible, since it's basically treated as extra "RAM".
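To make the read path concrete, here is a rough toy model in Python.  It is only a sketch of the idea, not how ZFS implements the L2ARC: the class name TwoTierCache, the slot counts, and the plain LRU policy are all made up for illustration, and the real ARC is considerably smarter than an LRU list.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model: a small in-memory LRU tier backed by a larger 'SSD' tier.

    Hypothetical illustration only; not the ZFS ARC/L2ARC algorithm.
    """

    def __init__(self, ram_slots, ssd_slots, disk):
        self.ram = OrderedDict()   # stands in for the in-memory ARC
        self.ssd = OrderedDict()   # stands in for the cache device
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.disk = disk           # dict standing in for the pool
        self.disk_reads = 0

    def _insert_ram(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_slots:
            # Oldest entry leaves RAM; keep a copy on the cache device.
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            self.ssd.move_to_end(old_key)
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)

    def read(self, key):
        """Return (value, tier), where tier names where the data was found."""
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key], "ram"
        if key in self.ssd:
            value = self.ssd[key]
            self._insert_ram(key, value)
            return value, "ssd"
        self.disk_reads += 1
        value = self.disk[key]
        self._insert_ram(key, value)
        return value, "disk"
```

With two RAM slots, reading blocks 0, 1, 2 and then 0 again finds block 0 on the "ssd" tier rather than going back to disk, which is the whole point of the cache device.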

The separate log device is a mirrored pair (redundancy is critical for this part) of write-optimised (i.e. SLC) SSDs.  Any block writes smaller than 64K go directly into the ZIL and are marked as "written to disk" while also being queued for writing to the pool.  If the server crashes, the ZIL is read and any transaction groups that are missing from the pool are copied over from the ZIL.  If the server never crashes, the data in the ZIL is never actually used.  In most cases, the ZIL only needs to be a few GB in size.
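The write-then-replay behaviour can be sketched the same way.  Again this is a toy under stated assumptions, not the ZIL's on-disk format or code: LoggedStore and its method names are invented for the example, the 64K size threshold is ignored, and a "crash" is simulated by throwing away the in-memory queue.

```python
class LoggedStore:
    """Toy intent log: acknowledge writes once logged, replay after a crash.

    Hypothetical illustration only; not the real ZIL.
    """

    def __init__(self):
        self.log = []        # stands in for the separate log device
        self.pending = []    # writes queued in memory for the pool
        self.pool = {}       # stands in for the main pool
        self.next_seq = 0

    def sync_write(self, key, value):
        record = (self.next_seq, key, value)
        self.next_seq += 1
        self.log.append(record)      # durable once it is on the log device
        self.pending.append(record)  # still has to reach the pool
        return record[0]             # caller can now treat it as written

    def flush(self):
        """Normal operation: queued writes reach the pool; log is discarded."""
        for _seq, key, value in self.pending:
            self.pool[key] = value
        self.pending.clear()
        self.log.clear()             # never read back if nothing crashed

    def crash(self):
        """Simulate a crash: everything held only in memory is lost."""
        self.pending.clear()

    def recover(self):
        """After a crash: copy whatever is missing from the log to the pool."""
        for _seq, key, value in self.log:
            self.pool[key] = value
        self.log.clear()
```

Note that the log is only ever read in recover(); in the normal flush() path it is written and then thrown away, which is why the device can be small but has to be fast and reliable for writes.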

Until very, very recent versions of ZFS, removing log devices from a pool was impossible, so if the log device died, the pool was unusable and all data was lost, which is why using mirrored sets was important.  One can now remove log devices, which moves the ZIL back into the pool.

This would be similar to the swap cache on an MLC SSD, and the UNDO log/FIFO on an SLC SSD.

Freddie Cash
