DragonFly BSD
DragonFly kernel List (threaded) for 2013-07

[GSOC] HAMMER2 compression feature week6 report


From: Daniel Flores <daniel5555@xxxxxxxxx>
Date: Sun, 28 Jul 2013 04:42:01 +0200

Hello everyone,
here is my report for the 6th week.

During this week a couple of significant changes happened.

The first is that I had to return to using an intermediary buffer in the
read path because of a tricky bug that corrupted the decompressed result.
So the read path no longer decompresses directly from the physical buffer
into the logical buffer; instead it first decompresses the data into an
intermediary buffer and then copies it to the logical buffer.
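
To illustrate the idea, here is a minimal user-space sketch of the
intermediary-buffer approach (this is not the actual HAMMER2 code; the
buffer names and the helper are assumptions for illustration). It uses
LZ4_decompress_safe() from the LZ4 library:

    #include <string.h>
    #include "lz4.h"                /* LZ4_decompress_safe() */

    /*
     * Decompress one LZ4-compressed block.  Instead of decompressing
     * straight into the logical buffer, the data is first decompressed
     * into a temporary buffer and then copied out, which is what the
     * read path currently does.  Returns 0 on success, -1 if the
     * compressed data is corrupt.
     */
    static int
    read_block_lz4(const char *phys_buf, int comp_size,
                   char *logical_buf, int logical_size, char *tmp_buf)
    {
            int result;

            result = LZ4_decompress_safe(phys_buf, tmp_buf, comp_size,
                                         logical_size);
            if (result < 0)
                    return (-1);    /* corrupted compressed data */
            memcpy(logical_buf, tmp_buf, result);
            if (result < logical_size)      /* zero-fill the remainder */
                    memset(logical_buf + result, 0,
                           logical_size - result);
            return (0);
    }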

The second is that I no longer allocate those intermediary buffers right
before using them. Matthew Dillon suggested that I use a special facility
called objcache, which handles the allocations automatically, and I now use
it to obtain buffers whenever I need them, without performing an allocation
every time a block of data has to be compressed or decompressed. The
objcache itself is created/destroyed when the HAMMER2 module is
loaded/unloaded, so the overhead is extremely small, even though the cache
is always created, even when the user doesn't use the compression feature.
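
For reference, here is a minimal sketch of how such a cache can be set up
and used, assuming the objcache_create_simple()/objcache_get()/
objcache_put() interface from DragonFly's <sys/objcache.h>; the cache name,
malloc type and buffer size are made up for the example and are not
necessarily what the branch uses:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/objcache.h>

    #define COMP_BUF_SIZE   65536   /* assumed intermediary buffer size */

    MALLOC_DEFINE(M_COMPBUF, "compbuf", "HAMMER2 compression buffers");

    static struct objcache *comp_buf_cache;  /* hypothetical cache name */

    /* Called once when the module is loaded. */
    static void
    comp_buf_cache_init(void)
    {
            comp_buf_cache = objcache_create_simple(M_COMPBUF,
                                                    COMP_BUF_SIZE);
    }

    /* Called once when the module is unloaded. */
    static void
    comp_buf_cache_uninit(void)
    {
            objcache_destroy(comp_buf_cache);
    }

    /*
     * Per-block usage: take a buffer from the cache, use it for
     * compression or decompression, and return it, instead of doing a
     * kmalloc()/kfree() pair every time.
     */
    static void
    compress_one_block_example(void)
    {
            char *tmp = objcache_get(comp_buf_cache, M_WAITOK);

            /* ... compress or decompress into tmp ... */

            objcache_put(comp_buf_cache, tmp);
    }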

The result is that even though I still use intermediary buffers, the
performance improved a lot. Since I promised in my previous report that I
would provide some numbers, here they are. I provide them even though there
is room for improvement and I'll be working on tuning the objcache
parameters as well as some other things. Still, I think it's interesting to
take a quick look at them to get an approximation of where things stand
right now.

It should be noted, though, that these results don't measure read/write
performance directly and that they were obtained on a virtual machine,
which means that on real hardware the performance would be better. Also
note that the same disk is shared between the VM and the host OS, which
introduces further uncertainty into the numbers.

Now, on to the test description. I used 5 test cases for both the write and
the read path.

Case 1 is a small file that can't be compressed. It's a 2.2 MB JPEG image.
Case 2 is a big file that can't be compressed. It's a 77.2 MB video file.
Case 3 is a small file that compresses perfectly. It's a 3.5 MB TIFF image.
Case 4 is a big file that compresses perfectly. It's a 47.5 MB log file.
Case 5, finally, is a group of files of which some do not compress, some
compress partially, and some compress perfectly. There are 35 files with a
total size of 184.9 MB, and they include the files used in the previous
cases.

For each case I copied the file(s) with the cp command from a HAMMER
partition to a HAMMER2 partition, into 2 different directories: one without
compression and one with LZ4 compression. I did this 10 times for each
directory, measuring the total elapsed time with the "time" utility. Then I
compared the files on the HAMMER2 partition with the originals on the
HAMMER partition using the diff command, again 10 times and again using
"time" to measure how long it takes. I also remounted the HAMMER2 partition
between each diff to ensure that caching wouldn't affect the results.

So, in the cp case we measure, roughly, the time it takes to read from the
HAMMER partition + the time it takes to write to the HAMMER2 partition, and
in the diff case the time it takes to read from the HAMMER partition + the
time it takes to read from the HAMMER2 partition + whatever time it takes
to compare the files.

The difference that matters to us is the one between the time spent on the
HAMMER2 side without compression and the time spent on the HAMMER2 side
with compression. Let's look at it.

You can see the summarized results in this table [1].

In the case of files that can't be compressed, we can see that the write
time is slower when compression is turned on. This happens because even
though the file can't be compressed, the file system still tries to
compress each block, fails, and thus wastes some time on that. To address
this issue, Matthew Dillon suggested detecting a file that can't be
compressed and not compressing it (detection is done by counting the number
of contiguous blocks for which compression failed; once a certain number is
reached, compression is no longer attempted). With this improvement, it's
possible that the difference between the time spent on the write path with
and without compression won't be perceptible. I'll try to implement it
later.
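
A rough sketch of how that heuristic could look (the threshold and the
function names are my assumptions about the suggested approach, not
finished code):

    #define COMP_FAIL_LIMIT 8       /* assumed threshold, to be tuned */

    /*
     * Per-file count of contiguous blocks whose compression failed.
     * Once it reaches the limit, compression is no longer attempted
     * for that file.
     */
    static int
    should_try_compression(int fail_count)
    {
            return (fail_count < COMP_FAIL_LIMIT);
    }

    static void
    note_compression_result(int *fail_count, int compression_failed)
    {
            if (compression_failed)
                    (*fail_count)++;        /* one more contiguous failure */
            else
                    *fail_count = 0;        /* success resets the streak */
    }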

On the other hand, there is no significant difference when it comes to read
time. This is probably because the read path with compression just checks
whether the specific block is compressed and only tries to decompress it if
it is, so there shouldn't be much difference.

The case of files that compress perfectly is interesting. When it comes to
write time it is no different from the previous case; the only difference
is that compression no longer fails, but it's still slower than the write
path without compression.

The read path with compression actually seems to be a bit faster than the
read path without compression in this particular case. This is probably
because the disk needs to read less data than in the uncompressed case.
Since in our case compression is considered successful only when a block
compresses to 50% of its size or less, if all blocks are compressed the
resulting file is significantly smaller. Also, the LZ4 decompression
algorithm is so fast that it doesn't add much to the overall time.
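
Since the 50% rule came up here, a small sketch of the write-side
acceptance check, using LZ4_compress_limitedOutput() from the LZ4 library;
the buffer names are hypothetical and this only illustrates the rule, it is
not the actual write path:

    #include "lz4.h"        /* LZ4_compress_limitedOutput() */

    /*
     * Try to compress one logical block.  The result is kept only if it
     * fits into half of the original block size or less; otherwise the
     * block is written uncompressed.  Returns the compressed size, or 0
     * if the block should be stored uncompressed.
     */
    static int
    try_compress_block(const char *logical_buf, char *comp_buf,
                       int block_size)
    {
            int comp_size;

            comp_size = LZ4_compress_limitedOutput(logical_buf, comp_buf,
                                                   block_size,
                                                   block_size / 2);
            if (comp_size <= 0)
                    return (0);     /* didn't fit into 50%, store as-is */
            return (comp_size);
    }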

Finally, the case of the group of files seems to follow the same trend. In
the real world, performance will depend on the types of files involved.

In conclusion, for now we can be sure that regardless of whether files can
be compressed or not, the write path with compression will be slower than
the write path without it. On the other hand, the read path with
compression will probably be a bit faster when the files were actually
compressed, and will have the same speed as the read path without
compression when they weren't. It looks like it's possible to optimize the
write path with compression so that it won't be significantly slower than
without it when a file can't be compressed.

During the remaining part of the weekend and the next week I'll be working
on several things. Even though I said previously that I had implemented
zero-checking, it is in fact not implemented correctly, and I'll work on it
again. The code also needs an overall cleanup, because it's extremely messy
right now, and there are surely many things to optimize, for example the
objcache parameters and the write path with compression. I also need to
continue bug-hunting...
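
For context, zero-checking means detecting blocks that consist entirely of
zero bytes so that they don't need to be written out at all. A minimal
sketch of such a check (illustrative only, not the actual implementation):

    #include <stddef.h>

    /*
     * Return 1 if the block contains only zero bytes, 0 otherwise.  A
     * block that passes this check can be recorded as a zero block
     * instead of being compressed and written.
     */
    static int
    block_is_zero_filled(const char *buf, size_t len)
    {
            size_t i;

            for (i = 0; i < len; ++i) {
                    if (buf[i] != 0)
                            return (0);
            }
            return (1);
    }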

I'll appreciate any comments, suggestions and criticism. All the code is
available in my repository, branch "hammer2_LZ4" [2].


Daniel

[1] http://leaf.dragonflybsd.org/~iostream/performance_table.html
[2] git://leaf.dragonflybsd.org/~iostream/dragonfly.git
