DragonFly BSD
DragonFly kernel List (threaded) for 2005-02

Re: phk malloc, was (Re: ptmalloc2)

From: Gary Thorpe <gathorpe79@xxxxxxxxx>
Date: Sun, 27 Feb 2005 14:39:05 -0500

In-Reply-To: <20050226171614.2a712851.cpressey@xxxxxxxxxxxxxxx>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 94
Message-ID: <42222111$0$717$415eb37d@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
X-Trace: 1109532946 crater_reader.dragonflybsd.org 717
Xref: crater_reader.dragonflybsd.org dragonfly.kernel:7854

Chris Pressey wrote:
> On Sat, 26 Feb 2005 19:32:58 -0500
> Tobias DiPasquale <toby@xxxxxxxx> wrote:
>>On Feb 26, 2005, at 6:28 PM, Chris Pressey wrote:
>>>>And the point we keep coming back to is that it is impossible for an
>>>>application to accurately self regulate its resource usage (unless
>>>>you mean allowing command line flags to specify how much memory to
>>>>use [why not just set rlimits instead]) since it does not receive
>>>>accurate feedback from the kernel when over commit is
>>>man mlock(1):
>>>     [EAGAIN]      Locking the indicated range would exceed either the
>>>                   system or per-process limit for locked memory.
>>>Is that not accurate feedback?
>>Read more closely: "limit for __locked__ memory". The limits don't
>>have  to be (and frequently aren't) the same.
> Quoth POSIX:
> "Memory residency of unlocked pages is unspecified."
> Unspecified means they might be in core, they might be on disk, or they
> might not even exist - and is this not the precise nature of overcommit?
> -Chris

Let's say process 1 calls malloc() and gets a block of virtual address
space (no physical pages are allocated and none are accounted for). Let
us then say process 2, which is really big, does a fork(). The child now
has a copy of process 2's virtual address space and shares the physical
pages with its parent, copy-on-write. At this point, there may not be
enough pages in swap and memory to satisfy all three processes if
process 1 and the child of process 2 actually start faulting in the
virtual address space they now have. If the system runs out of free
pages (swap and physical memory are exhausted) due to these page faults,
one solution (the only widely implemented one) is to kill one or more
processes to free memory. Usually the process causing the page fault is
killed. Bear in mind that previously running processes would most likely
have had lots of their pages paged out (because memory was running low)
and may themselves generate page faults at this time. From what I have
read and seen, the choice of victim is not deterministic (your shell, a
getty or even init or inetd can get killed).
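
A minimal sketch of the malloc() half of that scenario, assuming a
system that over commits. The helper name reserve_then_touch() is made
up for illustration. It asks for a large block but writes to only part
of it, so only the pages actually written need physical backing.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reserve `total` bytes with malloc(), then write one
 * byte into each page of the first `touched` bytes. Returns the number of
 * pages written, or -1 on a bad argument or if malloc() itself failed. */
long reserve_then_touch(size_t total, size_t touched)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || touched > total)
        return -1;

    /* On an over committing system this typically hands back address
     * space only; physical pages are assigned later, on first write. */
    volatile unsigned char *p = malloc(total);
    if (p == NULL)
        return -1;

    long count = 0;
    for (size_t off = 0; off < touched; off += (size_t)page) {
        p[off] = 1;     /* first write to a page is what faults it in */
        count++;
    }

    free((void *)p);
    return count;
}
```

Calling reserve_then_touch(64 MB, 1 MB) would normally succeed even on a
machine where 64 MB is not free at that moment; the trouble described
above only starts when the remaining pages are written.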

Over commit allows you to use the maximum amount of memory, but it also
allows over utilization. If you disallow over commit, one consequence is
that big processes may not be able to fork() small tasks because the
worst case memory needs may exceed what the system can provide. This is
why some people insist that providing an option (not the default, mind
you) to disable over commit is "nutty": the worst case seldom occurs.
Disabling it would mean denying some fork() and malloc() requests even
if the virtual memory would in fact never be accessed at all (the system
cannot tell beforehand which pages you will never touch and which you do
plan to overwrite). In my opinion, over commit is a result of ambiguity:
memory that is requested is not always needed, so requests carry a form
of redundancy. If this ambiguity were removed, a system which did not
allow over commit would still allow you to run your machine at the
maximum possible utilization (not necessarily 100%).
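
To make the worst case accounting concrete, here is a toy model of a
system without over commit. It is not any real kernel's algorithm; the
struct and function names are invented for this sketch and all
quantities are in pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy ledger for strict (no over commit) accounting. Hypothetical; real
 * kernels track far more detail than this. */
struct commit_ledger {
    size_t capacity;    /* physical memory plus swap */
    size_t committed;   /* pages already promised to processes */
};

/* Grant a request only if the worst case (every page touched) still fits. */
bool ledger_reserve(struct commit_ledger *l, size_t pages)
{
    if (pages > l->capacity - l->committed)
        return false;
    l->committed += pages;
    return true;
}

/* Under strict accounting, fork() has to reserve a private copy of all the
 * parent's writable pages, even if the child would exec() right away. */
bool ledger_fork(struct commit_ledger *l, size_t parent_writable_pages)
{
    return ledger_reserve(l, parent_writable_pages);
}
```

With a capacity of 1000 pages and a 600 page process already running,
ledger_fork() for that process is refused although the child might only
ever touch a handful of pages. That is the denied fork() described
above.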

[One solution for the case of fork() is to use vfork(): a deprecated
function intended to be used with exec() in which most of the address
space in the child is read only. This would in theory help out large
processes that need to start tasks without over commit (however,
nowadays vfork() is just a synonym for fork() except on one or two
systems). Since Unix only has the fork()/exec() path to start new
processes, this is what you are stuck with.]
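
A rough sketch of the vfork()/exec() pattern, for reference. The
function name run_small_task() is hypothetical. On systems where vfork()
is distinct from fork(), the child borrows the parent's address space
and the parent is suspended until the child calls exec or _exit(), so
the child must do nothing else in between.

```c
#define _DEFAULT_SOURCE     /* glibc: expose vfork() in strict -std modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with `argv` and wait for
 * it. Returns its exit status, 127 if exec failed, or -1 on other errors. */
int run_small_task(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;          /* could not create the child */

    if (pid == 0) {
        /* Child: only exec or _exit() is safe after vfork(). */
        execv(path, argv);
        _exit(127);         /* reached only if execv() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Where vfork() really is just fork(), this still works; it simply gives
no memory accounting benefit.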

Basically, there is no free solution. If you want reliability in a
system that uses over commit, you would have to manually set resource
limits on basically every process and make a static allocation of the
resources you have, and/or buy much more hardware than you need for the
tasks most of the time (i.e. plan for the worst case). If you disable
over commit, your machine will dynamically assign resources and should
never start killing processes, but it may not allow 100% utilization
(though I would bet that it is better than the first alternative). Both
result in under utilization of your machine and hence wasted money. The
question is: if you need the reliability, which costs less? If you don't
care or don't need the reliability, obviously you want the one which
works faster most of the time (over commit).
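
For the "manually set resource limits" option, a process (or the
launcher that starts it) could cap its own address space along these
lines. This is only a sketch: cap_address_space() is a made-up name, it
assumes the platform provides RLIMIT_AS, and how strictly the limit is
enforced varies between systems.

```c
#include <sys/resource.h>

/* Hypothetical helper: lower this process's soft address-space limit to at
 * most `bytes`, leaving the hard limit alone. Returns 0 on success and -1
 * on error. */
int cap_address_space(rlim_t bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    /* The soft limit may never exceed the hard limit, so clamp to it. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;

    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl) == 0 ? 0 : -1;
}
```

Once the cap is in place, a malloc() that would push the process past it
should fail with NULL instead of succeeding on paper, which is feedback
the application can act on.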
