DragonFly kernel List (threaded) for 2005-02
Re: rc and smf
:Matthew Dillon wrote:
:> Hmm. Well, I have to say that in my opinion a service failure is a
:> critical bug in the application. I usually go in and fix the application
:Nobody argues this. Again, this is one of the reasons why people
:supervise in the first place. There's nothing stopping you from adding
:an alert feature to a supervisor.
:> software rather than write monitoring programs for it (other than to
:> tell me if it has failed). Most service oriented applications fork()
:> on connect (a DNS cache being an exception), and those that have the
:Nothing stops the parent process of your forked children from being killed
:or crashing, obviously for some of the reasons already discussed.
Be killed ... by what? Crashing ... due to what? The problem here
is that you are just throwing out examples without paying any attention
to the likelihood that the issue might actually occur under normal
(or even exceptional) system operation. It's like you don't trust that
a for(i = 0; i < 10; ++i) loop will actually count properly and you want
to protect against it possibly not counting properly.
You are saying "what if" instead of "how often". Just because something
might POTENTIALLY happen doesn't mean that it WILL happen or that it will
happen often enough to warrant protection or that it will EVER happen
in the particular environment you are trying to protect. People get hit
by lightning all the time but that doesn't mean we wear a Faraday cage
jacket every time we go outside! Hard drives fail all the time, but
most consumer systems still ship with just one. And, frankly, it's far
more likely that your RAID storage system will fail than many of the
things you are pulling out as examples.
I don't bother putting a crash monitor on sendmail and apache because,
well, sendmail hasn't actually crashed on me for at least 20 years,
and apache hasn't crashed on me since I started using it. Slow down, yes.
Get behind on the queues, yes. Have a CGI/backend database failure,
absolutely. But the primary connection accepting server actually
crash? Hasn't happened.
If I want my apache server to be robust I write a monitoring program
that runs on an entirely DIFFERENT machine, and doesn't just test
whether the connection works, but actually goes in and issues a real
query that exercises the most complex CGI/database path I can find,
and screams bloody hell if that fails.
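As a rough illustration of that kind of end-to-end check, here is a minimal
sketch in Python. It is not any particular tool; the URL, the marker string,
and the function names are all hypothetical placeholders. The idea is only
that the probe runs on a separate machine, issues a real query, and treats
anything other than the expected page content as a failure.

```python
import sys
import urllib.request

# Hypothetical values: point PROBE_URL at the deepest CGI/database path you
# have, and pick a marker that only appears when that path really worked.
PROBE_URL = "http://www.example.com/cgi-bin/report?id=1"
EXPECTED_MARKER = "rows returned"


def response_is_healthy(status, body, marker):
    """True only if the server answered 200 AND the page contains the marker."""
    return status == 200 and marker in body


def probe(url, marker, timeout=10.0):
    """Issue one real query and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False, f"request failed: {exc}"
    if not response_is_healthy(status, body, marker):
        return False, f"unexpected response (HTTP {status})"
    return True, "ok"


def main():
    """Run one probe; exit non-zero and complain loudly on failure."""
    ok, detail = probe(PROBE_URL, EXPECTED_MARKER)
    if not ok:
        # Stand-in for the alerting step; a real setup would page or mail someone.
        print(f"ALERT: {PROBE_URL}: {detail}", file=sys.stderr)
        sys.exit(1)
```

Calling main() from cron on the monitoring host every few minutes would be
one way to run it. Checking the page body rather than just the TCP connect
is the point: a server that accepts connections but returns a database
error page still counts as down.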
Dan, we could argue what-ifs all day long, because there are an
infinite number of what-if scenarios. It's like pulling a rabbit out
of your hat. The problem is that just throwing out these scenarios
doesn't actually help anyone running a REAL production server. You
are trying to solve problems that you don't have rather than trying
to solve the problems that you do have. That's the real issue here.
Now, a lot of people on these lists, including me, have tried to explain
this to you, but you don't seem to be getting it. You are still focusing
on what-if scenarios that might occur once a decade or not at all instead
of solving the REAL problem facing you, which in the case of that
mail proxy service is simply configuring the program to limit the
number of simultaneous connections it can handle. And if it doesn't
have such a configuration option, then it's broken and you should either
fix it or replace it with something better. It's that simple. You
don't need overcommit, you don't necessarily need service monitoring.
If the program is otherwise reliable you just need a simple