DragonFly BSD
DragonFly kernel List (threaded) for 2005-02

Re: rc and smf

From: Chris Pressey <cpressey@xxxxxxxxxxxxxxx>
Date: Thu, 24 Feb 2005 12:16:07 -0800

On Thu, 24 Feb 2005 14:12:46 -0500
Dan Melomedman <dan@xxxxxxxxxxxxxxxx> wrote:

> You don't see the point. It takes a long time to fix the fault. BSD
> has nothing to do with this. The real world does. You don't want a
> nuclear reactor to explode because it took an admin five minutes to
> notice the fault, and restart the service.
> Another example: a telecom can't afford to lose service in some of the
> systems even for mere seconds. They lose thousands of dollars. This is
> exactly why Erlang, the language originally designed with telecom
> requirements in mind has supervision in its feature set! When you make
> a call in the UK, it runs through an Ericsson switch running Erlang
> that supervises its processes, and restarts them if they fail. Again,
> supervision may be new to some people on this list, but it isn't
> anything new or detrimental.

There are some distinctions to be made here, though:

- A strictly fault-tolerant system needs either to be provably reliable
(in a mathematical sense), or to have a supervisor (which must itself
be provably reliable).

- Not everyone needs a fault-tolerant system.  Or rather, different
people need different degrees of fault-tolerance.  Most people don't
need telecom-level reliability.

- Many daemons implement some form of supervision themselves.  Much of
the 'djb regime' is not actually new; it just tries to commodify
concepts such as supervision and daemonization at the operating system
level, rather than having every program do it itself.

- Erlang's concurrency is typically much more fine-grained; most Erlang
processes are not daemons in the usual sense (they only ever service
each other rather than the outside world).  The programming paradigm in
this case is also different; because supervision guarantees have already
been made, failure is "acceptable", and many processes are written in a
"let it crash" style.  This simplifies error handling immensely in many
cases, BUT it's most practical when working with lightweight processes
(basically threads).  It's not nearly as effective a programming style
when working with operating system processes.
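
To make that last point a bit more concrete, here is a rough sketch of
the "let it crash" style using lightweight workers.  It is in Python
rather than Erlang, it uses plain threads, and the names (supervise,
make_flaky_worker, max_restarts) are made up for illustration; it is
not how Erlang/OTP or any real supervision tool implements this.

```python
import threading


def supervise(worker, args=(), max_restarts=3):
    """Run `worker` in a thread and restart it whenever it raises.

    Returns (result, restarts). Hypothetical helper for illustration only.
    """
    restarts = 0
    while True:
        outcome = {}

        def run():
            try:
                outcome["result"] = worker(*args)
            except Exception as exc:
                # The supervisor is the only place where failures are caught.
                outcome["error"] = exc

        thread = threading.Thread(target=run)
        thread.start()
        thread.join()

        if "error" not in outcome:
            return outcome["result"], restarts
        if restarts >= max_restarts:
            # Give up instead of restarting forever.
            raise RuntimeError("restart limit reached") from outcome["error"]
        restarts += 1


def make_flaky_worker(failures):
    """Build a worker that crashes `failures` times and then succeeds."""
    state = {"calls": 0}

    def worker(x):
        state["calls"] += 1
        if state["calls"] <= failures:
            # No local recovery code: the worker simply crashes.
            raise ValueError("simulated fault")
        return x * 2

    return worker


if __name__ == "__main__":
    result, restarts = supervise(make_flaky_worker(2), args=(21,))
    print(result, restarts)
```

The worker contains no error handling at all; restarting is entirely
the supervisor's job.  That is cheap here because starting a thread is
cheap.  Doing the same with full operating system processes costs a
fork/exec per restart, which is part of why the style carries over less
well.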

