Yes, by "concurrent modifications" and "shared variable write" I mean
concurrent writes. Not reads, of course.
On Tue, Jan 3, 2012 at 5:48 PM, Brandon Black wrote:
>
>
> On Tue, Jan 3, 2012 at 5:02 AM, Yaroslav wrote:
>>
>> Interesting observation: removing __thread storage class makes thread
>
On Tue, Jan 3, 2012 at 5:02 AM, Yaroslav wrote:
>
> Interesting observation: removing __thread storage class makes thread data
> shared by all threads. Even without any locks concurrent modifications of
> the same memory area result in 5-10 fold test time increase. I.e., shared
> variable write is
>
> > (4.1) Does TLS impose extra overhead (performance cost) compared to
> regular
> > memory storage? Is it recommended to use in performance-concerned code?
>
> It requires an extra indirection. The app has to:
> 1. Figure out which thread it's currently running on.
> 2. Look up the location of
On Mon, Jan 2, 2012 at 11:49 PM, Colin McCabe wrote:
> The problem is that there's no way for the programmer to distinguish
> "the data that really needs to be shared" from the data that shouldn't
> be shared between threads. Even in C/C++, all you can do is insert
> padding and hope for the best
On Mon, Jan 2, 2012 at 3:20 AM, Yaroslav wrote:
> As far as I know (correct me if I'm wrong), when you execute a CPU
> instruction that writes to a memory location that's also cached by
> another CPU core, the system will execute its cache coherence
> protocol. This protocol will invalidate the cache lines in other CPU
> cores for this
On Mon, Jan 2, 2012 at 10:29 AM, Yaroslav wrote:
> (2.1) At what moments exactly do these synchronizations occur? Is it on every
> assembler instruction, or on every write to memory (i.e. on most variable
> assignments, all memcpy's, etc.), or does it only happen when two threads
> simultaneously work
Hi,
On Mon, Jan 02, 2012 at 01:29:39PM +0400, Yaroslav wrote:
> About point (2) I have questions:
You would probably learn a lot reading "What Every Programmer Should Know About
Memory" by U. Drepper [http://people.redhat.com/drepper/cpumemory.pdf]. In
particular Section 3.3.4 explains how CPU
Hi everybody,
I've been following this discussion from the very beginning because I'm also
trying to learn. The topic seems very interesting.
What I've learned so far is that there are extra performance costs
associated with threads:
(1) many libc calls (specifically malloc) in threaded libra
On Sat, Dec 31, 2011 at 2:36 PM, Jorge wrote:
>
> tThreads(seconds): 0.535, tProcs(seconds): 0.573, ratio:0.933
>
> Perhaps I'm doing it wrong ?
> Could you run it on other unixes and post the results ?
>
I used "-O3 -pthread" for CFLAGS and got the following results on two
vastly different Linux
On 28/12/2011, at 18:29, Hongli Lai wrote:
>
> The only thing left that I'm interested in is why Marc thinks
> processes are more efficient than threads. I'm interested in his
> answer because I wish to learn more about how hardware and operating
> systems work.
I run this benchmark again and again
On Sat, Dec 31, 2011 at 6:34 AM, Hongli Lai wrote:
> The benchmark tool in the Samba presentation also surprised me. It
> showed that processes are indeed a bit faster at various system calls,
> but only slightly so. Still doesn't make much sense to me though
> because the kernel already has to p
Thanks for the reply Marc. :)
I think I get the gist of what you're saying. You're advocating a few
processes (up to the number of CPU cores) with user-space threads to
avoid kernel context switch overhead. I already agree with that.
The benchmark tool in the Samba presentation also surprised me.
On Thu, Dec 22, 2011 at 02:53:52PM +0100, Hongli Lai wrote:
> I know that, but as you can read from my very first email I was planning on
> running I threads, with I=number of cores, where each thread has 1 event
> loop. My question now has got nothing to do with the threads vs events
> debate. Ma
On Thu, Dec 22, 2011 at 08:05:08AM +0100, Hongli Lai wrote:
> >> Are you talking about hardware simultaneous multithreading
> >> (http://en.wikipedia.org/wiki/Simultaneous_multithreading), e.g.
> >> HyperThreading?
> >
> > No, just distant history, try
> > http://en.wikipedia.org/wiki/Thread_%28co
On Wed, Dec 28, 2011 at 9:29 AM, Hongli Lai wrote:
> The only thing left that I'm interested in is why Marc thinks
> processes are more efficient than threads. I'm interested in his
> answer because I wish to learn more about how hardware and operating
> systems work.
Check out www.samba.org/~tri
On Wed, Dec 28, 2011 at 6:08 PM, Chris Brody wrote:
> Hongli maybe this answer can help you. Yes libeio is using globals,
> libeio is USING LOCKS (mutex) and conditionals so AFAIK it would be
> safe to be using libeio from multiple event-loop threads.
>
> BTW libeio uses an interesting strategy to
On Thu, Dec 22, 2011 at 8:05 AM, Hongli Lai wrote:
> According to that same Wikipedia article, some CPUs have multiple
> register files in order to reduce thread switching time, i.e. what you
> describe as extra support for multiple contexts. According to the
> article, that idea came in the late
On Thu, Dec 22, 2011 at 3:54 PM, Brandon Black wrote:
> Right, so either way an argument based on 2 threads per core is irrelevant,
> which is the argument you made in point (2) earlier. It doesn't make sense
> to argue about the benefits of threads under a layout that's known to be
> suboptimal i
I know that, but as you can read from my very first email I was planning on
running I threads, with I=number of cores, where each thread has 1 event
loop. My question now has got nothing to do with the threads vs events
debate. Marc is claiming that running I *processes* instead of I threads is
faster
On Thu, Dec 22, 2011 at 1:05 AM, Hongli Lai wrote:
> 2. Suppose the system has two cores and N = 4, so two processes or two
> threads will be scheduled on a single core. A context switch to
> another thread on the same core should be cheaper because 1) the MMU
> register does not have to be swapped
On 22/12/2011, at 08:05, Hongli Lai wrote:
>
> It's true that the second program adds an extra layer of indirection
> (the 'data' variable). However:
> 1. If the data is accessed frequently then both the pointer and the
> ThreadData that it points to should be cached by the CPU cache, making
> the
On Thu, Dec 22, 2011 at 3:02 AM, Marc Lehmann wrote:
> With threads, you can avoid swapping some registers, most notably the MMU
> registers, which are often very costly to swap, to the extent that a number
> of cpus even have extra support for multiple "contexts".
>
>> Are you talking about hardw
On Wed, Dec 21, 2011 at 09:38:01PM +0100, Hongli Lai wrote:
> > Well, threads were originally invented because single cpus only had a single
> > set of registers, and swapping these can be costly (especially with vm
> > state).
>
> I agree with your assertion that single CPUs had a single set of
On Wed, Dec 21, 2011 at 09:21:15PM +0100, Hongli Lai wrote:
> I know that it's *possible* to make it work with the way libeio is
> right now. What I'm concerned about is that it takes me a huge amount
> of boilerplate code to do so. Consider all these steps:
I am concerned that adding another lay
On Wed, Dec 21, 2011 at 11:43:54PM +0100, Ben Noordhuis
wrote:
> > A task switch will typically only change the registers it needs to
> > switch. For example, both MMU and FPU registers are only changed on demand
> > on Linux, on architectures that allow that.
>
> That's better. Your original st
On Wed, Dec 21, 2011 at 20:54, Marc Lehmann wrote:
> Registers don't need magic to stay "intact" (== have the same value), they
> basically never change on their own, instructions must change them. There
> are exceptions, but the point is true even then: registers don't change
> due to magic, they
On Wed, Dec 21, 2011 at 1:46 AM, Marc Lehmann wrote:
> Well, threads were originally invented because single cpus only had a single
> set of registers, and swapping these can be costly (especially with vm
> state).
I agree with your assertion that single CPUs had a single set of
registers and tha
On Wed, Dec 21, 2011 at 1:12 AM, Marc Lehmann wrote:
> libeio actually makes no assumptions about the existence of an event loop,
> or there being only one.
> ...
> well, I pointed out a way to you how to work with multiple event loops, so
> I am not sure why you write that: it is not true.
> ...
On Wed, Dec 21, 2011 at 02:43:14PM +0100, Ben Noordhuis
wrote:
> > Maybe you have simply the wrong idea about what a context switch is or how
> > a cpu or mmu works.
>
> Well put in a nicely condescending tone.
Maybe a less arrogant mail might have gotten a more favourable
response? Just sayin.
On Wed, Dec 21, 2011 at 04:19, Marc Lehmann wrote:
> On Wed, Dec 21, 2011 at 03:37:48AM +0100, Ben Noordhuis
> wrote:
>> You either have an overly broad definition of or simply the wrong idea
>> about what CPU registers are.
>
> Then you can surely point out where my idea differs from, say, intel's
On Wed, Dec 21, 2011 at 03:37:48AM +0100, Ben Noordhuis
wrote:
> You either have an overly broad definition of or simply the wrong idea
> about what CPU registers are.
Then you can surely point out where my idea differs from, say, intel's or
motorola's idea of what registers are. Please do so, I
On Wed, Dec 21, 2011 at 01:46, Marc Lehmann wrote:
> On Tue, Dec 20, 2011 at 09:26:28PM +0100, Hongli Lai
> wrote:
>> I would like to know more about this claim. It's not that I don't
>> believe you, but I'm genuinely interested in this area and I hear all
>> kinds of (often contradicting) infor
On Tue, Dec 20, 2011 at 09:26:28PM +0100, Hongli Lai wrote:
> I would like to know more about this claim. It's not that I don't
> believe you, but I'm genuinely interested in this area and I hear all
> kinds of (often contradicting) information about threads vs processes.
> What is it about proces
On Tue, Dec 20, 2011 at 09:26:20PM +0100, Hongli Lai wrote:
> I know, but that's not what I mean. I'm talking about reentrancy.
> Right now the libeio API assumes that there is one event loop. The
libeio actually makes no assumptions about the existence of an event loop,
or there being only one.
On Tue, Dec 20, 2011 at 5:06 PM, Marc Lehmann wrote:
> Threads were meant as an optimisation for single cpu systems though, and
> processes are meant for multiple cpus (or cores), and use the available
> hardware more efficiently.
I would like to know more about this claim. It's not that I don't
On Tue, Dec 20, 2011 at 4:17 PM, Marc Lehmann wrote:
> global variables are entirely fine with threads (libeio itself uses
> threads).
I know, but that's not what I mean. I'm talking about reentrancy.
Right now the libeio API assumes that there is one event loop. The
want_poll callback assumes th
On Tue, Dec 20, 2011 at 03:26:22PM +0100, Hongli Lai wrote:
> event loop per thread. I'm thinking about multiple eio contexts with
> each context having its own thread pool.
This is strictly less useful, as you lose control over scheduling,
though that might not be an issue for you.
> Th
On Tue, Dec 20, 2011 at 3:04 PM, Paddy Byers wrote:
> This has been asked for before and rejected.
>
> This is what I'm doing, which works well:
>
> http://lists.schmorp.de/pipermail/libev/2011q4/001584.html
(Replying to libev mailing list so that Marc can see my reasons)
I see. My case is a lit
I'm writing a multithreaded evented server in which I have N threads
(N=number of CPU cores) and one libev event loop per thread. I want to
use libeio but it looks like libeio depends on global variables so
this isn't going to work. I'd like to request the ability to use
libeio with multiple event