DragonFly users List (threaded) for 2009-02

Re: BFBI OTT bug or limitation? UPDATE2


From: Bill Hacker <wbh@xxxxxxxxxxxxx>
Date: Mon, 23 Feb 2009 18:37:14 +0800

Bill Hacker wrote:
Top-posting to my own post ...

Again.


Reproduced the original verbose crash. Only the last line is the same as below.

Failed to set up the HP-200LX as a serial terminal, so will run it again...

Bill


:-(


du -h > dulist

Two more runs: the first, with hammer mirror-stream over ssh NOT running, was OK; the second, with it mirroring a nearly empty dirtree (five static, one-line text files only), ran for several minutes, then dropped into the debugger with a mere three lines, rather than the original scrolled-off-the-screen output:

CRC DATA @ 9000000a3b15b280/128 FAILED
Debugger ("CRCFAILED: DATA")
Stopped at    Debugger+0x34:    movb    $0, in_Debugger.3970

But this does not seem to be related. It could be a NATACONTROL + old HDD I/O error artifact.
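
If so, the array status should show it; a quick check along these lines (ar0 here is assumed - whatever device the mirror registered as) would confirm whether a member disk has dropped out:

natacontrol list
natacontrol status ar0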

The first, looong error message had to do with vnodes...

More testing to do, then will swap in a newer 500 GB SATA drive.

Bill






Bill Hacker wrote:
Brute Force and Bloody Ignorance is Over The Top when:

'du -h'

saves keystrokes over:

'shutdown now'

Likewise a few other ways to either reboot or drop into the debugger.

Environment:

Target box:

VIA C7, 2 GB DDR2-533

2 x IBM 60 GB PATA as NATACONTROL RAID1

2.3.0 'default install' to all-HAMMER

root-mounted /slave1 PFS, created as a slave by hammer mirror-copy over ssh
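
For reference, that slave was set up along roughly these lines (a sketch, not the exact commands; <master-uuid> is a placeholder for whatever 'hammer pfs-status /hmr/master' reports on the source box):

# on the target: create the slave PFS with the master's shared-uuid
hammer pfs-slave /slave1 shared-uuid=<master-uuid>
# from the source box: the initial bulk copy over ssh
hammer mirror-copy /hmr/master Thor@<target_IP>:/slave1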


Source box:


Lenovo G400 3000 laptop. 33GB slice ad0s1 for DFLY.

2.2.0 installed to UFS; a spare 8 GB partition *ONLY* was later formatted and mounted as hammerfs '/hmr', with a master PFS made for testing, '/hmr/master'.
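
That step is essentially the following (a sketch; /dev/ad0s1h is an assumed partition name, not necessarily the one actually used):

# format the spare partition as HAMMER and mount it
newfs_hammer -L HMR /dev/ad0s1h
mount_hammer /dev/ad0s1h /hmr
# create the master PFS used for the test
hammer pfs-master /hmr/master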

ACTION:

hammer mirror-stream /hmr/master Thor@<target_IP>:/slave1
over ssh on a 100 Mbps internal link.

Fire off bonnie++ to fill the /hmr partition with fairly deep recursion.
It fills and stops gracefully with a '...cannot write' message, then begins to clean up its work area.
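
Roughly this shape of invocation will do the filling (the size and file count here are illustrative, not the exact values used):

# -s sets the large-file size, -n the small-file count (x1024) for the create phase
bonnie++ -d /hmr/master -s 4g -n 128 -u root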


Meanwhile, the ssh link has been doing its best - and its best is very good.

Watching the target, one sees /slave1 gradually clear as bonnie++ mops up the master, until it is back to where du shows zero usage, slaves having no snapshots of their own.

But the /hmr/master mount has gone from zero to 94% used, and the target has gone from 76% used to 87% used.

'du' on the master cannot seem to locate where TF the '94%' that df reports for /hmr is hiding, but never mind... we can nuke and newfs that partition at will.

But where is the used space on the *target* hiding?
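
If that space is merely tied up in HAMMER's retained history and dead records rather than truly lost, pruning and reblocking should hand it back - something along these lines on the target (a sketch; this is a guess at the cause, not a confirmed diagnosis):

# discard all historical records, then repack the B-Tree and data
hammer prune-everything /
hammer reblock /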

'du -h /' on the all-hammerfs target *reboots* it somewhere along the way.

... Comes back up quick - I'll give it that...

But hang on. Could an 'ordinary user' do that at will?

'du -h > dulist' (for later grep'ing) throws a panic and drops DFLY into the debugger...

Also worrisome...

By comparison, a UFS fs, when overloaded, ordinarily soldiers on with 109% utilized and a 'no space on device' message. For days...

Hammer needs to get there also...

If this is an out-of-memory situation with 2GB, it shouldn't be.

If the fs is full, the exit should be graceful, not catastrophic.

If no one else can reproduce this, I'll try it on other hardware - and with a serial terminal.
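
For next time, the target's console can be pointed at the serial port so the full panic scrolls out where it can be logged - roughly like this, assuming the first serial port and the default 9600 bps (the capture side shown is another BSD box; the device name varies by OS, and on the HP-200LX a DOS terminal program with logging would do the same job):

# target, in /boot/loader.conf:
console="comconsole"

# capture box, over a null-modem cable:
script panic.log cu -l /dev/cuaa0 -s 9600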

NB: Rather small drives and partitions were used: /hmr/master 8 GB, and the entire hammer fs on the target only 60 GB.

That part is intentional.

No need to wait all day to see if it happens on a half-terabyte also.

Panic not captured. Do we need it, or is this a known issue?

Bill


