DragonFly BSD
DragonFly commits List (threaded) for 2005-03

Re: cvs commit: src/sys/sys tls.h src/lib/libc/gen tls.c src/lib/libthread_xu/arch/amd64/amd64 pthread_md.c src/lib/libthread_xu/arch/i386/i386 pthread_md.c src/libexec/rtld-elf rtld.c rtld.h rtld_tls.h src/libexec/rtld-elf/i386 reloc.c


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 28 Mar 2005 09:15:28 -0800 (PST)

:I'd like to get rid of the size argument too. This should be split into
:machine/tls.h (with e.g. the struct tcb define) and sys/tls.h with the
:general system call.

    This is viable, but I think it might be best to retool it so the thread
    library has full control over the size of the TCB rather than hardwiring
    it into the OS headers.
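
    To make that concrete, a rough sketch (the structure layout and the
    tls_info field names are approximate, not a proposal for the final API):

        #include <sys/tls.h>            /* struct tls_info, sys_set_tls_area() */

        /*
         * The thread library allocates and lays out the tcb itself, so it
         * can append whatever private per-thread fields it wants without
         * the kernel or the OS headers having to know the final size.
         */
        struct tcb {
                struct tcb      *tcb_self;      /* %gs:0 points back here  */
                void            *tcb_dtv;       /* dynamic thread vector   */
                void            *tcb_pthread;   /* library-private data... */
        };

        static void
        tcb_activate(struct tcb *tcb)
        {
                struct tls_info info;           /* field names approximate */

                info.base = tcb;                /* tcb sits above the data */
                info.size = sizeof(struct tcb);
                sys_set_tls_area(0, &info, sizeof(info));
        }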

:
:>   * Gets rid of the Variant I code (we can add it in later, it just gets
:>     in the way).
:
:That's why I wanted to use Variant I always. For the archive, there is one
:nasty thing -- statically linked binaries. For those, ld itself does the
:relocation and therefore it would have to be changed to go directly to
:positive offsets.

    I would say that this is not a viable option.  Using positive offsets
    locks the program into a particular TCB size.  This may not matter so
    much for static programs, but it is information that would have to be
    communicated to the linker through some out-of-band method (like via
    the linker map) and that seems a bit too hackish for my tastes.

    For dynamically linked programs it is out of the question... the
    rtld and libc can statically code the size of the TCB, but the
    program binary cannot because it would tie our hands from an ABI
    point of view.
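
    Roughly, the arithmetic looks like this (illustrative C only; the actual
    offsets are of course computed by ld and the rtld):

        #include <stddef.h>

        struct tcb { void *tcb_self; void *tcb_dtv; };  /* size may change */

        /*
         * Variant II: static TLS data sits below the tcb, so a variable is
         * addressed at a negative offset from the thread pointer.  The
         * offset depends only on the TLS segment size, which the linker
         * already knows, so the tcb can grow later without breaking
         * existing binaries.
         */
        static void *
        tls_addr_variant2(char *tp, size_t tls_offset)
        {
                return (tp - tls_offset);               /* e.g. %gs:-8 */
        }

        /*
         * Variant I with positive offsets: the data sits above the tcb, so
         * sizeof(struct tcb) becomes part of every offset ld computed at
         * static link time -- which is exactly what locks the tcb size in.
         */
        static void *
        tls_addr_variant1(char *tp, size_t tls_offset)
        {
                return (tp + sizeof(struct tcb) + tls_offset);
        }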

:>   
:>   * Retools the Variant II code to support %gs:OFFSET (negative offset)
:>     AND %gs:0 relative accesses, supporting both -mtls-direct-seg-refs and
:>     -mno-tls-direct-seg-refs.
:
:As Doug mentioned, this means an m:n implementation has to do a syscall for
:thread switches.
    
    Yup, or a Linux-like kernel-supported thread switch (which I would
    prefer NOT to do).  I did a quick timing test on sys_set_tls_area()
    and it costs around 339ns on my AMD64 test cube.  But this is still
    going to be far higher performing than having to call __tls_get_addr
    all the time.  The procedure setup cost for figuring out the GOT offset
    alone is 17ns on the same box.
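
    Concretely, the switch path in an m:n scheduler ends up looking
    something like this (hand-waving sketch only: the uthread structure and
    its fields are made up for illustration, and the tls_info field names
    are approximate):

        #include <sys/tls.h>
        #include <stddef.h>
        #include <ucontext.h>

        struct uthread {                        /* hypothetical m:n thread */
                void            *ut_tcb;        /* that thread's tcb       */
                int             ut_tls_size;    /* whatever size we settle on */
                ucontext_t      ut_ctx;
        };

        /*
         * With -mtls-direct-seg-refs the compiler emits %gs-relative
         * accesses directly, so the only way to point them at the new
         * thread's tcb is to have the kernel reload the %gs base --
         * one syscall (~339ns here) per userland thread switch.
         */
        static void
        uthread_switch(struct uthread *from, struct uthread *to)
        {
                struct tls_info info;

                info.base = to->ut_tcb;
                info.size = to->ut_tls_size;
                sys_set_tls_area(0, &info, sizeof(info));
                swapcontext(&from->ut_ctx, &to->ut_ctx);
        }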

    I think this problem goes away in 64 bit mode.  It's a little confusing
    and I don't know why they made the segment load instructions only 32 bits,
    but it appears you can load a 32 bit base address into %fs or %gs from
    user mode.

:...
:>   * Retains the DTV methodology.
:>   
:>   * Retains the TCB methodology, but note that the area 'after' the tcb
:>     is now available for future use (at least with Variant I removed).
:>     Frankly I'm not sure we would ever want to support having the
:>     'data' area after the TCB instead of before the TCB, at least not
:>     for i386.
:
:Placing the TLS area after the TCB solves a lot of nasty problems :)
:
:Joerg

    It would appear to be cleaner, but it's not worth doing if we have
    to make serious hacks to the compiler to support it.
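
    For reference, the Variant II setup being retained boils down to roughly
    the following (just a sketch; the real rtld code deals with per-module
    offsets, alignment and dtv setup, which this glosses over):

        #include <stdlib.h>
        #include <string.h>

        struct tcb {                            /* illustrative names only */
                struct tcb      *tcb_self;      /* %gs:0 -> self           */
                void            **tcb_dtv;      /* dynamic thread vector   */
        };

        /*
         * Variant II layout, data before the tcb:
         *
         *      [ static TLS data ][ tcb ]
         *                         ^
         *                         thread pointer (%gs base)
         *
         * Anything we later want 'after' the tcb can simply be appended
         * past it without disturbing the negative offsets.
         */
        static struct tcb *
        tls_alloc(const void *tls_init, size_t init_size, size_t static_size)
        {
                char *block;
                struct tcb *tcb;

                block = calloc(1, static_size + sizeof(struct tcb));
                if (block == NULL)
                        return (NULL);
                tcb = (struct tcb *)(block + static_size);

                /* real code places each module's block at its own offset */
                memcpy(block + static_size - init_size, tls_init, init_size);
                tcb->tcb_self = tcb;            /* needed for %gs:0 access */
                return (tcb);
        }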

					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>


