DragonFly bugs List (threaded) for 2008-07
DragonFly BSD

Re: panic: already on hash list


From: "Sepherosa Ziehau" <sepherosa@xxxxxxxxx>
Date: Wed, 23 Jul 2008 21:04:02 +0800

On Tue, Jul 22, 2008 at 8:28 PM, YONETANI Tomokazu <qhwt+dfly@les.ath.cx> wrote:
> Hi.
> Caught this panic while playing with my build box.  I don't know the exact
> moment when it panicked, as the monitor was connected to another box, but
> here's what I was doing before the panic anyway:
>
> - login to the build box from the console, and slogin to the router box
>  (running DragonFly 1.8), and rebooted it.
> - switch the monitor/keyboard to the router box, wait for it to boot,
>  and login to the console.  slogin to the build machine, start GNU screen,
>  start w3m (a text-based web browser, similar to lynx), tried to visit
>  Google but it didn't work (this was expected, because for some reason mpd
>  starts up earlier than ipnat and I always have to restart before I can
>  connect to the Internet from behind the router).
> - split the screen inside GNU screen, and typed ctrl+C on w3m.  A few seconds

I think this ctrl+C is important to the problem :)

One thing I need your help to confirm: does w3m put the socket into
nonblocking mode?  I was not able to download the w3m source code.
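
For reference, "nonblock mode" here means O_NONBLOCK set in the descriptor's file status flags. A minimal sketch of how a program could check that, assuming POSIX fcntl(2); the helper name is_nonblocking is hypothetical and is not taken from w3m:

```c
#include <fcntl.h>

/*
 * Hypothetical helper: report whether O_NONBLOCK is set on fd.
 * Returns 1 if set, 0 if not set, -1 if fcntl(2) fails (errno is set).
 */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

A socket fresh from socket(2) would normally report 0 here, which is the blocking case the rest of this message assumes.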

I think it may be caused by the following pattern in user code:

s = socket();
/* s is not put into nonblocking mode */
while (1) {
  if (connect(s) < 0) {  /* <==== here you hit ctrl+C */
    if (errno == EINTR)
      continue;  /* <==== another connect(2) attempt on 's' */
    ...
  }
  ...
}

We could probably write a much simpler test program that uses the
above code pattern to reproduce the panic ...

The facts from the dump that support my assumption below are:
1) so_state is 0
2) the inpcb is on the hash list (both the flag and the link fields prove that)

I think the following happened, if w3m used the code pattern I listed above:
- connect(2) is blocking, so the first call to connect(2) makes
kern_connect() block in lwkt_domsg().
- ctrl+C makes the lwkt_domsg() in kern_connect() return.
SS_ISCONNECTING is cleared from so_state, so so_state becomes 0, but
the inpcb is left on the hash list because the earlier soconnect() succeeded.
- the second connect(2) syscall hits the wall (soconnect() calls
so_pru_connect(), since so_state is 0), presumably trying to insert an
inpcb that is already on the hash list.
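
The suspected sequence can be sketched as a toy userland model. This is not the kernel code: every toy_* name is hypothetical, TOY_ISCONNECTING stands in for SS_ISCONNECTING, and on_hash stands in for the inpcb's hash-list membership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ISCONNECTING 0x1    /* stand-in for SS_ISCONNECTING */

struct toy_socket {
    int  so_state;  /* 0: neither connecting nor connected */
    bool on_hash;   /* stand-in for "inpcb is on the hash list" */
};

enum toy_result { TOY_OK, TOY_REFUSED, TOY_PANIC };

/* Models the protocol-level connect: puts the pcb on the hash list. */
static enum toy_result toy_pru_connect(struct toy_socket *so)
{
    if (so->on_hash)
        return TOY_PANIC;       /* models "already on hash list" */
    so->on_hash = true;
    return TOY_OK;
}

/* Models the socket-level connect: only so_state guards the call. */
static enum toy_result toy_soconnect(struct toy_socket *so)
{
    enum toy_result r;

    assert(so != NULL);
    if (so->so_state & TOY_ISCONNECTING)
        return TOY_REFUSED;     /* a connect is already in progress */
    r = toy_pru_connect(so);
    if (r == TOY_OK)
        so->so_state |= TOY_ISCONNECTING;
    return r;
}

/* Models the interrupted wait: the state flag is cleared, on_hash is not. */
static void toy_interrupt(struct toy_socket *so)
{
    assert(so != NULL);
    so->so_state &= ~TOY_ISCONNECTING;
}
```

In this model, toy_soconnect() followed by toy_interrupt() leaves so_state at 0 with on_hash still true, matching the two observations from the dump, and a second toy_soconnect() then returns TOY_PANIC.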

I would appreciate it if you could reproduce the panic using the user
code I mentioned above.  I probably cannot do it today; I will not be
back home before 10pm :(

Best Regards,
sephe

-- 
Live Free or Die


