DragonFly BSD
DragonFly kernel List (threaded) for 2004-06

Re: postfix causes hangs

From: Rob Schmuloff <rrschoolie@xxxxxxxxx>
Date: Sat, 26 Jun 2004 20:02:33 -0700 (PDT)

Hello Matt,

  First apologies for the poor formatting of this

   I have some interesting information regarding the
postfix problem.  It seems to affect several of the
postfix daemons: smtpd, local, and cleanup.  The
debugging output seems to indicate a race condition
between two processes calling flock() and requesting an
exclusive lock on the same file.  When the machine
wedges, the network stack still seems to be working
(i.e. TCP connects and pings succeed), but the console is
locked up and nothing seems to be running in userland.
I can reproduce this problem by receiving email from
the freebsd-current mailing list (I'm not sure why, other
than the high volume of mail and the fact that their
server is running postfix too).

The debugging output from the lockf code (hand

 pid 5831 (cleanup) lf_destroy_range:
pid 5831 (cleanup) lf_create_range:
pid 5832 (cleanup) lf_destroy_range: 
pid 5832 (cleanup) lockf 0xd7c22bd4                   
        0..9223372036854775807 type exclusive owned by
blocked locks 9223372036854775807 type exclusive
waiting on 0xc0861ec0

pid 5832 (cleanup) lf_destroy_range:
pid 5832 (cleanup) lf_create_range:
pid 5831 (cleanup) lf_destroy_range: 
pid 5831 (cleanup) lockf 0xd7c22bd4
        0..9223372036854775807 type exclusive owned by
blocked locks 9223372036854775807 type exclusive
waiting on 0xc0861ec0

I wrote a quick program that seems to replicate the
problem if run with stdout/stderr redirected to
/dev/null (or with the printf calls commented out):

#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>

int main()

   int fd,status;
   int ops[4]={ LOCK_SH, LOCK_EX, LOCK_SH|LOCK_NB,
   char *opstr[4] ={"LOCK_SH", "LOCK_EX",
   pid_t pid, i;

   /* i = pid % 2; */
   i = 1;
   printf("pid %d i = %d\n",pid,i);
   for (;;){        
        fd = open("testfile",O_RDWR|O_CREAT);
        while ( status=flock(fd, ops[i]) )
             usleep( rand()/20000);
        printf("PID %d got %s lock --
        status=flock(fd, LOCK_UN);
        printf("PID %d released lock --
        close (fd);


--- Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
>     Rob, I have committed a change to kern/kern_lockf.c that puts a very
>     short wait in the lockf retry loop in an attempt to prevent the system
>     from locking up when it gets into this livelock.
>     I don't think this will actually fix the problem (or at least I hope
>     it doesn't).  I am instead hoping that it will be possible to ktrace
>     the processes involved and/or otherwise track the problem down when
>     it occurs without the whole machine going down.
> 						-Matt
> :Hello,
> :
> :  I've experienced periodic hangs on my system
> :since mid-May.  Lately, my terminal becomes unresponsive
> :as soon as I start postfix and receive incoming mail.
> :When I drop into DDB, the stack trace shows:
> :
> :scgetc()
> :sckbdevent()
> :atkbd_intr()
> :atkbd_isa_intr()
> :intr_mux()
> :ithread_handler()
> :(Perhaps this is just the trace from ctl-alt-esc.)
> :
> :This is with a kernel/world built June 23rd.  I
> :*think* the problem is somewhere in the lockf code
> :because I glanced at 'ps' from DDB.  Also, I'm using
> :procmail for local delivery, so that's also a
> :possibility.  Sorry for the sparse information.
> :I'm trying to get more data for you.
> :
> :Thanks,
> :
> :Rob


Attachment: tt.c
Description: tt.c
