DragonFly BSD
DragonFly kernel List (threaded) for 2011-01

Re: git: HAMMER - Add live dedup sysctl and support


From: Thomas Nikolajsen <thomas.nikolajsen@xxxxxxx>
Date: Tue, 4 Jan 2011 10:00:18 +0100

Ilya Dryomov wrote at Tue Jan 4 03:07:02 2011 +0200:
>    HAMMER - Add live dedup sysctl and support
>    
>    * Adds *experimental* live dedup (aka efficient cp) support
>    
>    * sysctl vfs.hammer.live_dedup
>        0 - disabled (default)
>        1 - enabled, populate dedup cache on reads
>        2 - enabled, populate dedup cache on reads and writes

Thank you!
This looks really interesting; I will have to play with it right away.
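
To start playing, something like the snippet below ought to flip the knob from a
program as well (besides a plain "sysctl vfs.hammer.live_dedup=1" from the shell).
It is only a sketch: it assumes the knob is an ordinary integer sysctl, as the
0/1/2 values in the commit suggest, and uses nothing beyond the stock
sysctlbyname(3) call.

/*
 * Sketch only: assumes vfs.hammer.live_dedup is a plain integer sysctl,
 * which the 0/1/2 values in the commit suggest.  Run as root, just like
 * sysctl(8).
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int cur, want = 1;		/* 0=off, 1=reads, 2=reads+writes */
	size_t len = sizeof(cur);

	/* Read the current setting. */
	if (sysctlbyname("vfs.hammer.live_dedup", &cur, &len, NULL, 0) < 0) {
		perror("sysctlbyname(get)");
		exit(1);
	}
	printf("vfs.hammer.live_dedup: %d -> %d\n", cur, want);

	/* Switch to mode 1: populate the dedup cache on reads. */
	if (sysctlbyname("vfs.hammer.live_dedup", NULL, NULL,
	    &want, sizeof(want)) < 0) {
		perror("sysctlbyname(set)");
		exit(1);
	}
	return (0);
}

Setting it back to 0 disables live dedup again.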

The commit message is a bit short IMHO, but the code has extensive comments;
I was enlightened quite a bit by the one below.

What is the relation between online (aka live) and offline HAMMER dedup?
Is the typical use to run both when online dedup is enabled?

Is offline HAMMER dedup always more efficient (disk-space-wise)?
Will online HAMMER dedup deduplicate data between PFSs (offline is per PFS, right)?

What are your plans/thoughts for further enhancing HAMMER dedup?

 -thomas
-
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/507df98a152612f739140d9f1ac5b30cd022eea2
. .
+/************************************************************************
+ *                              LIVE DEDUP                              *
+ ************************************************************************
+ *
+ * HAMMER Live Dedup (aka efficient cp(1) implementation)
+ *
+ * The dedup cache is operated in a LRU fashion and destroyed on
+ * unmount, so essentially this is a live dedup on a cached dataset and
+ * not a full-fledged fs-wide one - we have a batched dedup for that.
+ * We allow duplicate entries in the buffer cache, data blocks are
+ * deduplicated only on their way to media. By default the cache is
+ * populated on reads only, but it can be populated on writes too.
+ *
+ * The main implementation gotcha is on-media requirement - in order for
+ * a data block to be added to a dedup cache it has to be present on
+ * disk. This simplifies cache logic a lot - once data is laid out on
+ * media it remains valid on media all the way up to the point where the
+ * related big block the data was stored in is freed - so there is only
+ * one place we need to bother with invalidation code.
+ */
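
To get the comment straight in my head, here is a toy userland model of such an
LRU dedup cache. The names (dedup_ent, dedup_lookup, ...) are invented for
illustration and a linear list stands in for whatever indexed lookup the kernel
really uses; the actual code lives under sys/vfs/hammer/.

/*
 * Toy userland model of a live dedup cache operated in LRU fashion.
 * All identifiers here (dedup_ent, dedup_lookup, ...) are invented for
 * illustration and are NOT the kernel's names.  Entries only describe
 * data that is already on media (data_off), matching the on-media
 * requirement described in the comment above.
 */
#include <sys/queue.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DEDUP_CACHE_MAX	4	/* tiny limit so eviction is easy to see */

struct dedup_ent {
	uint32_t	crc;		/* checksum of the data block */
	uint64_t	data_off;	/* on-media offset of that block */
	int		bytes;		/* block size */
	TAILQ_ENTRY(dedup_ent) lru;	/* list head = most recently used */
};

static TAILQ_HEAD(dedup_lru_head, dedup_ent) dedup_lru =
    TAILQ_HEAD_INITIALIZER(dedup_lru);
static int dedup_count;

/* Look the CRC up; on a hit, move the entry to the front of the LRU. */
static struct dedup_ent *
dedup_lookup(uint32_t crc)
{
	struct dedup_ent *ent;

	TAILQ_FOREACH(ent, &dedup_lru, lru) {
		if (ent->crc == crc) {
			TAILQ_REMOVE(&dedup_lru, ent, lru);
			TAILQ_INSERT_HEAD(&dedup_lru, ent, lru);
			return (ent);
		}
	}
	return (NULL);
}

/* Remember a block just laid out on media; evict the LRU tail if full. */
static void
dedup_insert(uint32_t crc, uint64_t data_off, int bytes)
{
	struct dedup_ent *ent;

	if (dedup_count == DEDUP_CACHE_MAX) {
		ent = TAILQ_LAST(&dedup_lru, dedup_lru_head);
		TAILQ_REMOVE(&dedup_lru, ent, lru);
		free(ent);
		dedup_count--;
	}
	ent = calloc(1, sizeof(*ent));
	if (ent == NULL)
		abort();
	ent->crc = crc;
	ent->data_off = data_off;
	ent->bytes = bytes;
	TAILQ_INSERT_HEAD(&dedup_lru, ent, lru);
	dedup_count++;
}

int
main(void)
{
	uint32_t crc;

	/* Pretend six blocks went to media; only the last four stay cached. */
	for (crc = 1; crc <= 6; crc++)
		dedup_insert(crc, (uint64_t)crc * 65536, 16384);

	printf("crc 1: %s\n", dedup_lookup(1) ? "hit" : "miss");	/* miss */
	printf("crc 5: %s\n", dedup_lookup(5) ? "hit" : "miss");	/* hit */
	return (0);
}

The on-media requirement from the comment is what keeps this simple: an entry only
ever describes a block that already has a media offset, so the only invalidation
needed is dropping entries when the big block holding the data is freed (modelled
here by nothing more than LRU eviction).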



