> On Tue, Apr 8, 2014 at 11:00 PM, big stone <stonebi...@gmail.com> wrote:
>> Hi,
>>
>> I did an experiment splitting my workload into 4 threads on my CPU
>> (i3-350M) to see what the scaling possibilities are.
>>
>> Timing:
>> 1 cpu = 28 seconds
>> 2 cpu = 16 seconds
>> 3 cpu = 15 seconds
>> 4 cpu = 14 seconds
>>

If the info at http://ark.intel.com/products/43529/Intel-Core-i3-350M-Processor-3M-Cache-2_26-GHz
is right, you have 2 cores, each running 2 threads. They are logically
"cores", but physically they are not. My tests with multi-threading
benchmarks, including a parallel quicksort, showed that a similar mobile
i3 processor rarely benefits beyond 2 threads; cache-coherence overhead
is probably the cause. A desktop Intel Core i5-2310, for example, is a
different beast (4 cores / 4 threads): 3 threads were almost always 3x
faster, and 4 threads showed a small drop.

It all still depends on the application. I had stopped believing a
2-threaded Atom would show a 2x speedup in any of my tests, until one
graphical test finally achieved it. Still, if the number of threads is
larger than the number of cores, what you are seeing is probably a
legacy of the HyperThreading hardware that Intel started its
multi-threading with.
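The original poster does not say how the workload was split, so here is a minimal, hypothetical re-creation of the experiment: divide a CPU-bound job into N chunks, run them on N workers, and time each configuration. It uses processes rather than threads so that CPython's GIL does not serialize the compute; the workload (`busy_sum`) and chunk sizes are illustrative, not the poster's actual code.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def busy_sum(bounds):
    """CPU-bound stand-in workload: sum of squares over a half-open range."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def run(n_workers, total=2_000_000):
    """Split [0, total) into n_workers chunks, run them in parallel, time it."""
    step = total // n_workers
    chunks = [(i * step, (i + 1) * step) for i in range(n_workers)]
    chunks[-1] = (chunks[-1][0], total)  # last chunk absorbs the remainder
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        result = sum(pool.map(busy_sum, chunks))
    return result, time.perf_counter() - start

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        result, elapsed = run(n)
        print(f"{n} worker(s): {elapsed:.2f}s")
```

On a 2-core/4-thread part like the i3-350M you would expect the same shape as the timings above: a large gain from 1 to 2 workers, then only marginal gains from the SMT siblings.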
It greatly depends on the processor, and on whether the so-called
hyper-threads are real threads or half-assed threads. Some Intel
processors support real SMT threads, where it makes no difference
whether your code is dispatched on the "main thread" or the
"hyper-thread". Other processors use very fake threads, where only a
small percentage of the ALU is available to the "hyper-thread" and only
the main thread has access to the entire execution unit.

The former is good. The latter usually makes things run slower when
multiple threads are running, unless you and/or the application are
smart enough to set the thread affinity so that the thread dispatched to
the half-assed thread never needs the parts of the execution unit that
are unavailable to it. If you do not take such care, you will
continually stall the decoding pipeline and the RISC microcode execution
stream as the processor switches threads between the two pipelines.

For traditional (aka useless) hyper-threaded processors, you are usually
better off disabling hyper-threading in the BIOS and dedicating all the
execution-unit resources to a single thread. For processors that support
SMT hyper-threading, you generally get excellent multiprogramming ratios
until all the pipelines and execution units are fully consumed (assuming
sufficient, well-designed L1 and L2 cache, and good code and data
locality). Often, for a decent mix of compute and I/O, this means that
you can load up almost full compute on all threads simultaneously and
almost fully overlap all I/O waits with useful compute -- just like a
real computer.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
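[Editor's note: the affinity advice above can be sketched concretely. This is a minimal, Linux-only example of pinning the current process to one logical CPU with `os.sched_setaffinity` (not available on macOS or Windows); the function name `pin_to_cpu` and the choice of target CPU are illustrative, not from the post.]

```python
import os

def pin_to_cpu(cpu_id):
    """Pin the calling process (pid 0 = self) to a single logical CPU.

    Linux-specific: relies on os.sched_setaffinity / os.sched_getaffinity.
    """
    os.sched_setaffinity(0, {cpu_id})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    original = os.sched_getaffinity(0)   # remember the full allowed mask
    target = min(original)               # pick any CPU we are allowed to use
    print("pinned to:", pin_to_cpu(target))
    os.sched_setaffinity(0, original)    # restore the original mask
```

Pinning sibling hyper-threads to separate work (or leaving one sibling idle) is one way to avoid the execution-unit contention described above; whether it helps has to be measured per workload.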