DragonFly BSD
DragonFly submit List (threaded) for 2004-11

Re: [PATCH] Suggested FreeBSD merge


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 15 Nov 2004 10:24:18 -0800 (PST)

:The address was just a value off-hand. I think we can differentiate between
:(a) the application ABI and (b) the kernel version to be mapped.
:
:From the application point-of-view, having a fixed address is very useful,
:because it allows the compiler to skip the overhead of Position Independent
:Code, esp. the GOT/PLT setup. Since this should be used for sensitive
:low-level routines, it makes sense to skip this.
    
    I'm not sure I understand what you mean here.  I see only three ways to do
    this, using strlen() as a contrived example.  The first way I don't
    think we can do because it makes strlen() a function pointer rather than
    a function.  It would be something like:

	#define __section(name) __attribute__((__section__(name))) 

	__section(".klib-dragonfly01") size_t (* const strlen)(const char *);

    This would generate code as follows.  This code would be AS FAST as a
    direct jump due to the branch prediction cache.  That is, the
    movl strlen,%ebx + call *%ebx combination will take no longer than
    call strlen would take.

	movl strlen,%ebx
	call *%ebx

    However, I don't think we can use a C declared function pointer and still
    adhere to the standards unless the procedures are typically #define'd
    entities in standard header files.
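
    As a rough illustration of the function-pointer model above, here is a
    minimal C sketch.  All names in it (klib_strlen_slot, klib_override_strlen,
    alt_strlen, the klib_strlen macro) are hypothetical, and the "kernel
    override" is simulated by an ordinary function call; it only shows how a
    header-level #define could hide the pointer from callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default implementation, standing in for clib_strlen. */
size_t clib_strlen(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		++p;
	return (size_t)(p - s);
}

/* Hypothetical replacement, standing in for a kernel-supplied version. */
size_t alt_strlen(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		++n;
	return n;
}

/*
 * The dispatch slot.  In the scheme discussed above it would live in a
 * special section and be rewritten by the kernel; here it is a plain
 * global that starts out pointing at the default.
 */
size_t (*klib_strlen_slot)(const char *) = clib_strlen;

/* Simulates the kernel pointing the slot at its own implementation. */
void klib_override_strlen(size_t (*fn)(const char *))
{
	assert(fn != NULL);
	klib_strlen_slot = fn;
}

/* What a header could expose so callers still write a normal call. */
#define klib_strlen(s) (klib_strlen_slot(s))
```

    A caller writes klib_strlen("abc") and gets an indirect call through the
    slot, before or after klib_override_strlen(alt_strlen) has been applied.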


    A second way of doing this is a call/jump:

	(strlen would be at a fixed offset within the special section)

	.section	.klib-dragonfly01,"ax",@progbits
	.globl		strlen
	.type		strlen,@function
strlen:
	jmp		clib_strlen	# default, overridden by kernel

    The kernel would modify the jump address, i.e. it would change it from
    whatever address 'clib_strlen' was to point into its shared map.
    However, this is MUCH slower than an indirect call because it forces the
    CPU to resynchronize the instruction stream twice.
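
    To make "modify the jump address" concrete, here is a small C sketch of
    how the five bytes of an x86 near jump (opcode 0xE9 followed by a 32-bit
    displacement relative to the end of the instruction) could be computed.
    The function name encode_jmp_rel32 and the addresses used with it are
    made up for illustration; this is not how any existing kernel code does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { JMP_REL32_LEN = 5 };	/* one opcode byte + four displacement bytes */

/*
 * Encode "jmp rel32" as it would appear at slot_addr so that it lands on
 * target_addr.  The displacement is measured from the address of the
 * instruction following the jump.  Returns 0 on success, -1 if the
 * target is out of rel32 range.
 */
int encode_jmp_rel32(uint8_t out[JMP_REL32_LEN], uint64_t slot_addr,
		     uint64_t target_addr)
{
	int64_t disp;
	uint32_t d;

	assert(out != NULL);
	disp = (int64_t)(target_addr - (slot_addr + JMP_REL32_LEN));
	if (disp < INT32_MIN || disp > INT32_MAX)
		return -1;

	d = (uint32_t)(int32_t)disp;
	out[0] = 0xE9;			/* jmp rel32 */
	out[1] = (uint8_t)(d & 0xff);	/* displacement, little-endian */
	out[2] = (uint8_t)((d >> 8) & 0xff);
	out[3] = (uint8_t)((d >> 16) & 0xff);
	out[4] = (uint8_t)((d >> 24) & 0xff);
	return 0;
}
```

    For a slot at 0x1000 and a target at 0x2000 the displacement is
    0x2000 - 0x1005 = 0xFFB, so the encoded bytes are E9 FB 0F 00 00.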

:A good place to request the loading of this page[s] is libc. That way the
:linker can be told that the symbol is part of the libc namespace and using
:some magic, the compiler can be made aware of the fixed nature [for shared
:libraries]. The location of the page can be arbitrary, even 0x0 would make
:sense. Since this is part of the namespace of libc, it is bound by the
:ABI version of libc, so no additional compatibility problems should arise.
:If a library doesn't want to depend on this, it can use the normal indirect
:calls via GOT/PLT.

    Sure, we could compile up our 'shared' library and then make the linker
    aware of the symbol map, but that means that *EVERY* *TIME* we want 
    to modify the shared library, every single program that uses it would
    have to be recompiled.  Or, if not recompiled, we would have to keep
    a copy of every version of the shared library that we ever wrote.

    Not only that, but different compiler options would produce different
    code, causing the offsets to change even without any code changes.

    I just don't see this being viable generally without some significant
    work.  The only way I see a direct-call model working is if the 
    direct-call code reserved a fixed amount of space for each function
    so the offsets are well known, and if the function is too big to fit
    in the reserved space the space would be loaded with a JMP to the
    actual function instead.

    So the THIRD way would be to do this:

	.section	.klib-dragonfly01,"ax",@progbits
	.globl		strlen
	.type		strlen,@function
strlen:
	[ the entire contents is replaced with actual function if the actual
	  function does not exceed 64 bytes, else only the jump vector is
	  modified ]
	[ the default function can be placed here directly if it does not
	  exceed 64 bytes ]
	jmp		clib_strlen	# default, overridden by kernel
	.p2align	6,0x90		; 64 byte blocks

    Advantages: 

	* Direct call, no jump table for simple functions.

	* The kernel can just mmap() the replacement library right over the
	  top.

    Disadvantages:

	* requires a sophisticated utility to check whether the compiled
	  function fits and decide whether to generate a jmp or whether
	  to directly embed the function.

	* space/compactness tradeoff means that the chosen block size may
	  be cache friendly or space friendly, but not both.
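
    The embed-or-jump decision that such a utility would have to make can be
    sketched in C as follows.  This is only an illustration under assumed
    names (klib_slot_offset, klib_fill_slot) and an assumed, precomputed jump
    displacement; the 64-byte block size and the 0x90 padding come from the
    .p2align line above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { KLIB_SLOT_SIZE = 64 };	/* fixed space reserved per function */

/* With fixed-size blocks, the offset of function N is simply N * 64. */
size_t klib_slot_offset(size_t index)
{
	return index * KLIB_SLOT_SIZE;
}

/*
 * Fill one 64-byte slot.  If the compiled function fits, embed it
 * directly; otherwise emit a "jmp rel32" (0xE9 + displacement) to the
 * out-of-line copy.  jmp_disp is the already-computed displacement and
 * is ignored when the function is embedded.  The rest of the slot is
 * padded with NOPs (0x90).  Returns 1 if embedded, 0 if a jump vector
 * was written.
 */
int klib_fill_slot(uint8_t slot[KLIB_SLOT_SIZE], const uint8_t *code,
		   size_t len, int32_t jmp_disp)
{
	uint32_t d = (uint32_t)jmp_disp;

	assert(slot != NULL && code != NULL);
	memset(slot, 0x90, KLIB_SLOT_SIZE);

	if (len <= KLIB_SLOT_SIZE) {
		memcpy(slot, code, len);	/* function fits: embed it */
		return 1;
	}

	slot[0] = 0xE9;				/* too big: jump vector only */
	slot[1] = (uint8_t)(d & 0xff);
	slot[2] = (uint8_t)((d >> 8) & 0xff);
	slot[3] = (uint8_t)((d >> 16) & 0xff);
	slot[4] = (uint8_t)((d >> 24) & 0xff);
	return 0;
}
```

    Because every slot has the same size, callers can be linked against
    fixed offsets, and only the slot contents change between library
    versions.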


:The jump table is not the problem, the problem having to resolve the
:references to it. For the code page itself, .TEXT relocations are not
:critical, that can be handled easily and with low overhead. Just to
:clarify, I mean calls / jumps from code in the code page to itself.
:It has to be self-contained, of course.
:
:If we have to use a jump table from a variable address, it adds at least
:two instructions to every reference [as variable] or call [as function].
:This can easily outweigh the performance improvements.
:
:Joerg

    I'm not sure I understand what you are describing here relative to
    what I am describing.  I was not describing PIC, per se.  The only way
    to have a direct-call model is if an absolute, static amount of function
    space is reserved for each procedure.

					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>


