DragonFly BSD
DragonFly commits List (threaded) for 2005-03

Re: cvs commit: src/sys/sys tls.h src/lib/libc/gen tls.c src/lib/libthread_xu/arch/amd64/amd64 pthread_md.c src/lib/libthread_xu/arch/i386/i386 pthread_md.c src/libexec/rtld-elf rtld.c rtld.h rtld_tls.h src/libexec/rtld-elf/i386 reloc.c


From: Joerg Sonnenberger <joerg@xxxxxxxxxxxxxxxxx>
Date: Mon, 28 Mar 2005 19:23:21 +0200
Mail-followup-to: commits@crater.dragonflybsd.org

On Mon, Mar 28, 2005 at 09:15:28AM -0800, Matthew Dillon wrote:
> :I'd like to get rid of the size argument too. This should be split into
> :machine/tls.h (with e.g. the struct tcb define) and sys/tls.h with the
> :general system call.
> 
>     This is viable, but I think it might be best to retool it so the thread
>     library has full control over the size of the TCB rather than hardwire
>     it into the OS headers.

There is _no_ reason for the thread library to extend the TCB.
Anything which a thread library might want to store there can also be
stored in the pthread structure, which is completely managed by the
library. There are still differences between architectures in what
to place in the TCB: the "self" pointer is only needed for
segment-style implementations. If the TCB register contains a normal
pointer (as on IA64 or any RISC architecture), it is not
needed.
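
To illustrate, here is a minimal sketch of a TCB for a segment-style
architecture with everything library-specific kept in the pthread
structure. The field names, struct pthread_sketch, tcb_init() and
tcb_get() are hypothetical and not taken from the commit; tcb_get()
only models what "mov %gs:0, %eax" does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TCB layout for a segment-style architecture (e.g. i386).
 * Field names are illustrative assumptions, not the committed ones. */
struct tcb {
	struct tcb *tcb_self;    /* points at this TCB; what %gs:0 yields */
	void       *tcb_dtv;     /* dynamic thread vector, owned by rtld */
	void       *tcb_pthread; /* back pointer to the library's thread data */
};

/* Anything the thread library wants per thread lives here, not in the TCB. */
struct pthread_sketch {
	int        tid;
	struct tcb tcb;
};

/* Wire up the TCB the way a thread library might at thread creation. */
static void
tcb_init(struct pthread_sketch *pt, int tid)
{
	pt->tid = tid;
	pt->tcb.tcb_self = &pt->tcb;
	pt->tcb.tcb_dtv = NULL;
	pt->tcb.tcb_pthread = pt;
}

/* Stand-in for "mov %gs:0, %eax": load the word at offset 0 of the
 * segment. 'seg_base' models the segment base, which user code cannot
 * read directly; that is why the self pointer has to be stored. */
static struct tcb *
tcb_get(const void *seg_base)
{
	return *(struct tcb * const *)seg_base;
}
```

On an architecture where the thread register already holds a plain
pointer, tcb_self could simply be dropped from the structure.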

> :>   * Retools the Variant II code to support %gs:OFFSET (negative offset)
> :>     AND %gs:0 relative accesses, supporting both -mtls-direct-seg-refs and
> :>     -mno-tls-direct-seg-refs.
> :
> :As Doug mentioned, this means an m:n implementation has to do a syscall
> :for every thread switch.
>     
>     Yup, or a linux-like kernel-supported thread switch (which I would
>     prefer NOT to do).  I did a quick timing test on sys_set_tls_area()
>     and it costs around 339ns on my AMD64 test cube.  But this is still
>     going to be far higher performing than having to call __tls_get_addr
>     all the time.  The procedure setup cost for figuring out the GOT offset
>     alone is 17ns on the same box.

It's not about calling __tls_get_addr, but
	mov %gs:0, %eax
	mov a@NTPOFF(%eax), %eax
vs.
	mov %gs:a@NTPOFF, %eax

The difference is one load instruction, possibly with a pipeline stall
involved. The difference should be zero once the base register is
loaded.
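
A small test case to compare the two sequences, as a sketch only. It
assumes GCC on i386, a non-PIC build, and the __thread extension; the
variable and function names are made up. Built with
-mno-tls-direct-seg-refs the read should go through the thread pointer
loaded from %gs:0 first, while -mtls-direct-seg-refs should allow the
single %gs-relative access. The exact output depends on the compiler
version and the TLS model it picks, so check it with "gcc -S".

```c
#include <assert.h>

/* Thread-local variable; in a non-PIC executable the compiler can use
 * the local-exec model, which is where the @NTPOFF relocation comes from. */
static __thread int a = 42;

/* Reads the calling thread's copy of 'a'. This is the access whose
 * code sequence changes with the -m[no-]tls-direct-seg-refs flags. */
int
read_a(void)
{
	return a;
}

/* Writes the calling thread's copy of 'a'. */
void
write_a(int v)
{
	a = v;
}
```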

>     I think this problem goes away in 64 bit mode.  It's a little confusing,
>     I don't know why they made the segment load instructions only 32 bits,
>     but it appears you can load a 32 bit base address into %fs or %gs from
>     user mode.

I haven't looked at AMD64 yet.

> :>   * Retains the TCB methodology, but note that the area 'after' the tcb
> :>     is now available for future use (at least with Variant I removed).
> :>     Frankly I'm not sure we would ever want to support having the
> :>     'data' area after the TCB instead of before the TCB, at least not
> :>     for i386.
> :
> :Placing the TLS area after the TCB solves a lot of nasty problems :)
> :
> :Joerg
> 
>     It would appear to be cleaner, but it's not worth doing if we have
>     to make serious hacks to the compiler to support it.

s/compiler/linker/
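
For reference, the address arithmetic of the two layouts, as a rough
sketch following the usual ELF TLS description. The function and
parameter names are hypothetical; block_off is the distance between the
thread pointer and the start of a module's static TLS block, and
var_off is the variable's offset inside that block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant II (data before the TCB): the block starts block_off bytes
 * below the thread pointer, so variables sit at negative offsets. */
static uintptr_t
variant2_addr(uintptr_t tp, size_t block_off, size_t var_off)
{
	assert(var_off < block_off);	/* variable must lie below the TCB */
	return tp - block_off + var_off;
}

/* Variant I (data after the TCB): the block starts block_off bytes
 * above the thread pointer, with block_off at least the TCB size. */
static uintptr_t
variant1_addr(uintptr_t tp, size_t tcb_size, size_t block_off, size_t var_off)
{
	assert(block_off >= tcb_size);	/* block must not overlap the TCB */
	return tp + block_off + var_off;
}
```

In the Variant I case the TCB has to have a size the linker knows,
since every offset it resolves includes it.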

Joerg


