DragonFly bugs List (threaded) for 2008-07
DragonFly BSD

Re: hammer_alloc_data panic


From: Aggelos Economopoulos <aoiko@xxxxxxxxxxxxxx>
Date: Wed, 16 Jul 2008 03:29:18 +0300

On Wednesday 16 July 2008, Matthew Dillon wrote:
>     Ok, Gergo and I have been working through the two issues he reported
>     and with kind access to his machine I have figured out what is going
>     on.  I am going to post this to the thread so we have a record of it,
>     because it is quite interesting.
> 
>     Gergo found two problems:
> 
>     (1) 'hammer reblock' can lose track of space reservations and cause
> 	hammer_alloc_data() to run out of space on the media and panic.
> 
> 	I hope to fix this today.  At worst we want the reblocker to
> 	return an error if there is insufficient free space on the disk
> 	to reblock it, not panic the machine :-).
> 
> 	I have found that 'dd' can do the same thing.  It is the same bug.
> 
>     (2) On his small 14G test partition, using nohistory, it turns out
> 	that a huge amount of fragmentation can build up if the partition
> 	is not reblocked.  I ain't talking 10% here, I'm talking 65%
> 	fragmentation or worse.  The 14G partition only had 5G worth
> 	of files on it but it was 99% full, with only 300MB free in df.
> 
> 	It was so fragmented that trying to reblock it using the default
> 	fill level (aka 'hammer reblock /home') failed because there was
> 	not enough media space free to reblock into.
> 
> 	This is really a documentation issue.  HAMMER partitions must be
> 	reblocked occasionally, preferably via cron and preferably before
> 	you actually run out of disk space.

Er. Documentation issue? Selecting a 'perfect' line in crontab so that you
stay ahead of fragmentation is hard to do. I'd argue it shouldn't be left
up to the admin. At some point the system should take matters into its own
hands. I mean, if the admin can schedule reblocking so that things never
get out of hand, great, but let's be reasonable: that is not going to happen.

>     It is possible to reblock when the media is highly fragmented.  You do
>     it by telling hammer to only reblock nearly-empty blocks first, in
>     order to get them freed up and available for reuse as quickly as possible.
>     This is done by specifying a <fill_percentage> argument.
> 
>     For example, this command will only reblock blocks that are 5% full
>     (and hence 95% empty):
> 
> 	hammer reblock /home 5
> 
>     It works because it doesn't cost much to move the small amounts of data
>     out of those highly fragmented blocks and thus be able to free the blocks.
>     You then increase the fill percentage until you have freed enough space
>     to do the remainder with no limitations:
> 
> 	hammer reblock /home 25
> 	hammer reblock /home 50
> 	hammer reblock /home 75
> 	hammer reblock /home 90
> 	hammer reblock /home
> 	whew...
> 
>     It's a bit complex so what I am going to do is add some foot-shooting
>     protection to the 'hammer reblock' utility and maybe also have it
>     print out a warning, a reminder, if you attempt to reblock a hammer
>     partition that is too full.
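
For reference, the staged sequence quoted above could be wrapped in a small
script along these lines. This is only a sketch: the function name
staged_reblock and the HAMMER_CMD variable are made up for illustration, and
the fill levels are simply the ones from the example above, not tuned values.

```shell
#!/bin/sh
# Sketch of the staged reblock described above: free nearly-empty blocks
# first, then raise the fill percentage step by step.
# HAMMER_CMD and staged_reblock are hypothetical names used for illustration.

HAMMER_CMD=${HAMMER_CMD:-hammer}

staged_reblock() {
    fs=$1
    # Fill percentages taken from the quoted example, lowest first.
    for fill in 5 25 50 75 90; do
        echo "reblocking $fs at fill level $fill%"
        # Stop at the first failing pass instead of pressing on.
        "$HAMMER_CMD" reblock "$fs" "$fill" || return 1
    done
    # Final pass with the default fill level.
    echo "reblocking $fs at default fill level"
    "$HAMMER_CMD" reblock "$fs" || return 1
}

# Example: staged_reblock /home
```

Stopping on the first failed pass is a choice made here, on the assumption
that a failing low-fill pass means later passes would not have room either.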

IMHO, that's a job that the machine should do. Why not create a thread/daemon
that babysits a hammer fs, and just let the user choose among a set of
predefined policies? This should take care of both issues.

As things are, I fear people are going to hit some of these issues even if
they do read the docs; you just ask too much of them. At the very least, we'd
need a Getting Started With HAMMER doc that explains what the admin should do
if they run into trouble and guides them through setting up an appropriate
crontab line. Another idea would be to warn about all known hammer issues
*in the release announcement*.
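
As a starting point for such a crontab line, something like the following
per-user entry (installed with crontab -e) might do. The schedule is an
arbitrary assumption (03:00 every Sunday), not a recommendation, and /home is
just the filesystem from the example above. cron's PATH may not include the
directory hammer is installed in, in which case the full path is needed.

```
# minute hour day-of-month month day-of-week command
# Assumed schedule: weekly, Sunday at 03:00; adjust to the write load.
0 3 * * 0 hammer reblock /home
```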

Also, the rate of hammer bug reports does not appear to have slowed down this
week. From where I'm standing there are two unresolved bugs and we're 5 days
before the release. I know you've been testing hammer to death but you can't
test everything. People who try the release /are/ going to hit bugs (that's
the point, right?). Perhaps there should at least be a warning about
backups at mount time (or require -o I_have_backups :)?

I do not intend to sound discouraging; I'm just worried that the cries of
those who have hit the reblocking issues and/or some stray bugs are going
to cover the positive feedback.

Aggelos


