DragonFly kernel List (threaded) for 2005-02
Re: phk malloc, was (Re: ptmalloc2)
Chris Pressey wrote:
> On Sat, 26 Feb 2005 19:32:58 -0500
> Tobias DiPasquale <toby@xxxxxxxx> wrote:
>>On Feb 26, 2005, at 6:28 PM, Chris Pressey wrote:
>>>>And the point we keep coming back to is that it is impossible for an
>>>>application to accurately self regulate its resource usage (unless you
>>>>mean allowing command line flags to specify how much memory to use
>>>>[why not just set rlimits instead]) since it does not receive accurate
>>>>feedback from the kernel when over commit is
>>> [EAGAIN] Locking the indicated range would exceed either
>>> system or per-process limit for locked memory.
>>>Is that not accurate feedback?
>>Read more closely: "limit for __locked__ memory". The limits don't
>>have to be (and frequently aren't) the same.
> Quoth POSIX:
> "Memory residency of unlocked pages is unspecified."
> Unspecified means they might be in core, they might be on disk, or they
> might not even exist - and is this not the precise nature of overcommit?
Let's say process 1 calls malloc() and gets a block of virtual address
space (no physical allocation is done and no accounting of physical
pages is done). Let us then say process 2, which is really big, does a
fork(): the child now has a copy of process 2's virtual address space
and shares the physical pages with its parent, marked copy-on-write.
At this point, there may not be enough pages in swap and memory to
satisfy all three processes, if they actually start causing faults in
the virtual address spaces they now have (process 1 and the child of
process 2). If the system runs out of free pages (swap and physical
memory are exhausted) due to these page faults, one (the only widely
implemented) solution is to kill one or more processes to free memory.
Usually the process causing the page fault is killed. Bear in mind that
previously running processes would most likely have lots of their pages
paged out (because memory would be running low) and may in fact generate
page faults at this time. The process selection is not deterministic
from what I have read and seen (your shell, a getty or even init or
inetd can get killed).
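
The first step above (address space handed out without physical pages behind it) can be observed from user space. The following is a minimal sketch, not anything taken from the kernel sources: it assumes a system that provides mmap() with MAP_ANON and mincore(), and the names resident_pages() and overcommit_demo() are made up for illustration. It reserves a region, touches only part of it, and counts resident pages before and after.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that mincore() reports as resident.
 * Returns -1 on failure. (Hypothetical helper, for illustration only.) */
static long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (vec == NULL)
        return -1;
    /* The vector type differs between systems, hence the void * cast. */
    if (mincore(addr, len, (void *)vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;    /* low bit set means "in core" */
    free(vec);
    return count;
}

/* Reserve `total` pages, write to the first `touched` of them, and report
 * the resident page count before and after the writes.
 * Returns 0 on success, -1 on failure. */
int overcommit_demo(size_t total, size_t touched, long *before, long *after)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = total * (size_t)page;
    char *p;

    if (touched > total)
        return -1;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = resident_pages(p, len);       /* nothing written yet */
    for (size_t i = 0; i < touched; i++)
        p[i * (size_t)page] = 1;            /* each write faults a page in */
    *after = resident_pages(p, len);

    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Calling overcommit_demo(64, 8, &before, &after) on a typical system should show few or no resident pages before the writes and at least the touched pages afterwards, though the exact counts depend on the kernel.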
Over commit allows you to use the maximum amount of memory, but it also
allows over utilization. If you disallow over commit, one consequence is
that big processes may not be able to fork() small tasks because the
worst case memory needs may exceed what the system can provide. This is
why some people insist that providing an option (not the default, mind
you) to disable over commit is "nutty" because the worst case seldom
occurs. It would involve denying some fork() and malloc() requests
even if in fact the virtual memory will never be accessed at all (the
system cannot tell beforehand which pages you never plan to touch and
which you do plan to overwrite). In my opinion, over commit is a result
of ambiguity: memory that is requested is not always needed, so requests
carry a form of redundancy. If this ambiguity were removed, a system which did
not allow over committal would still allow you to run your machine at
the maximum possible utilization (not necessarily 100%).
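
To put numbers on the fork() case, here is a toy model of strict (no over commit) accounting. It is only a sketch under my own assumptions: struct commit_state and strict_admit() are invented names, the unit is pages, and a real kernel tracks far more than this. The point it illustrates is that a request is refused whenever the worst case does not fit, whether or not the pages would ever be touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a machine's memory accounting, in pages. */
struct commit_state {
    size_t ram_pages;        /* physical memory */
    size_t swap_pages;       /* swap space */
    size_t committed_pages;  /* worst-case pages already promised */
};

/* Strict admission: grant a request only if every promised page could be
 * backed by RAM or swap at the same time. */
bool strict_admit(struct commit_state *s, size_t request_pages)
{
    size_t capacity = s->ram_pages + s->swap_pages;

    /* Invariant of strict accounting: never promise more than exists. */
    assert(s->committed_pages <= capacity);

    if (request_pages > capacity - s->committed_pages)
        return false;        /* worst case does not fit: deny */
    s->committed_pages += request_pages;
    return true;
}
```

With 1000 pages of RAM and 1000 of swap, a process that has committed 1500 pages cannot fork() under this model, because the child's worst case is another 1500 pages, while a 400 page malloc() from a small process is still granted. With over commit, all three requests would simply be accepted.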
[One solution for the case of fork() is to use vfork(): a deprecated
function intended to be used with exec() in which most of the address
space in the child is read only and this would in theory help out large
processes that need to start tasks without over commit (however,
nowadays vfork() is just a synonym for fork() except on one or two
systems). Since unix only has the fork()/exec() path to start new
processes, this is what you are stuck with.]
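
For reference, the vfork()/exec() pattern looks roughly like the sketch below. This assumes a system where vfork() still suspends the parent and lets the child borrow its address space until it calls exec or _exit(); where vfork() is just fork(), the code behaves the same but saves nothing. run_task() is a name I made up for the example.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start the program argv[0] with arguments argv and wait for it.
 * Returns its exit status, 127 if the exec failed, or -1 on error. */
int run_task(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: after vfork() only exec and _exit() are safe to call. */
        execv(argv[0], argv);
        _exit(127);          /* reached only if execv() failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A big process would call something like run_task() with an argument vector such as {"/bin/sh", "-c", "exit 0", NULL} instead of going through a plain fork().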
Basically, there is no free solution. If you want reliability in a
system that uses over commit, you would have to manually set resource
limits on essentially every process and make a static allocation of the
resources you have and/or buy much more hardware than you need for the
tasks most of the time (i.e. plan for the worst case). If you disable
over commit, your machine will dynamically assign resources and should
never start killing processes, but it may not allow 100% utilization
(but I would bet that it is better than the first alternative). Both
result in under utilization of your machine and hence wasted money. The
question is: if you need the reliability, which costs less? If you don't
care or don't need the reliability, obviously you want the one which
works faster most of the time (over commit).
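
As an illustration of the "set resource limits by hand" alternative, the sketch below caps the address space of a child process and shows malloc() failing cleanly instead of the process being killed later. It assumes the system supports RLIMIT_AS (some BSDs call it RLIMIT_VMEM), and malloc_under_limit() is a hypothetical helper, not an existing API.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child process, cap the address space at limit_bytes and try to
 * malloc() request_bytes. Returns 1 if malloc() succeeded, 0 if it
 * returned NULL, and -1 on any other error. */
int malloc_under_limit(rlim_t limit_bytes, size_t request_bytes)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl = { limit_bytes, limit_bytes };
        volatile char *p;

        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        p = malloc(request_bytes);
        if (p != NULL)
            p[0] = 1;        /* use the block so the call is not elided */
        _exit(p != NULL ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

With a 256 MB limit, a 1 GB request should come back as 0 (malloc() returned NULL), which is the kind of feedback the application can act on. The price is the one described above: somebody has to pick that limit for every process in advance.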