>On Tue, Apr 8, 2014 at 11:00 PM, big stone <stonebi...@gmail.com> wrote:
>> Hi,
>>
>> I did experiment splitting my workload in 4 threads on my cpu i3-350m
>to
>> see what are the scaling possibilities.
>>
>> Timing :
>> 1 cpu = 28 seconds
>> 2 cpu = 16 seconds
>> 3 cpu = 15 seconds
>> 4 cpu = 14 seconds
>>
>
>If the info at
>http://ark.intel.com/products/43529/Intel-Core-i3-350M-Processor-3M-Cache-2_26-GHz
>is right, you have 2 cores, each running 2 threads. They are logical
>"cores", but not physical ones. In my tests with multi-threaded
>benchmarks, including a parallel quicksort, a similar mobile i3
>rarely benefited beyond 2 threads, probably because of cache-coherence
>penalties. A desktop Intel Core i5-2310 (4 cores/4 threads), for
>example, is a different beast: 3 threads were almost always close to
>3x faster, and 4 threads scaled with only a small drop from linear.
>
>It all still depends on the application. At one point I had stopped
>believing a 2-threaded Atom would ever show a 2x speedup in any of my
>tests, until one graphics-heavy test finally did. Still, when the
>number of threads exceeds the number of cores, the extra threads are
>probably Hyper-Threading, the hardware multi-threading Intel started
>with.
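For reference, the timings quoted above work out to the following speedups and per-thread efficiencies (just arithmetic on the numbers from the post, taking the 1-cpu run as the baseline):

```python
# Speedup and parallel efficiency for the timings quoted above.
timings = {1: 28.0, 2: 16.0, 3: 15.0, 4: 14.0}  # threads -> seconds
baseline = timings[1]

for threads, seconds in sorted(timings.items()):
    speedup = baseline / seconds
    efficiency = speedup / threads
    print(f"{threads} cpu: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The jump from 1.75x at 2 threads to only 2.00x at 4 threads is exactly what you would expect from 2 physical cores with 2 hardware threads each.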

It greatly depends on the processor and whether the so-called hyper-threads are 
real threads or half-assed threads.  Some Intel processors support real SMP 
threads, where it makes no difference whether your code is dispatched on the 
"main thread" or the "hyper-thread".  Other processors use very fake threads, 
in which only a small fraction of the execution resources is available to the 
"hyper-thread" and only the main thread has access to the entire execution 
unit.  The former is good; the latter usually makes things run slower when 
multiple threads are running, unless you and/or the application are smart 
enough to set thread affinity so that the thread dispatched to the half-assed 
thread never needs the parts of the execution unit that are unavailable to it.  
If you do not take such care, you will continually stall the decode pipeline 
and the micro-op execution stream as the processor switches threads between 
the two pipelines.
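On Linux, pinning a process or thread can be done with the scheduler affinity calls; a minimal sketch (Linux-only, and the numbering that maps logical CPUs to physical cores is machine-specific -- the "core id" field in /proc/cpuinfo shows the mapping):

```python
import os

# Linux-only sketch: pin the calling process to a single logical CPU.
# Which logical CPUs share a physical core is machine-specific.
if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)   # 0 = the calling process
    target = min(allowed)               # pick one CPU we are allowed to use
    os.sched_setaffinity(0, {target})
    print(os.sched_getaffinity(0))      # now restricted to that one CPU
```

Per-thread pinning from C uses pthread_setaffinity_np; the Python call above affects the whole process.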

For traditional (aka useless) hyper-threaded processors, you are usually better 
off disabling hyper-threading in the BIOS and dedicating all the execution-unit 
resources to a single thread.  For processors that support SMP hyper-threading, 
you generally get excellent multiprogramming ratios until all the pipelines 
and execution units are fully consumed (assuming sufficient, well-designed L1 
and L2 cache, and good code and data locality).  Often, for a decent mix of 
compute and I/O, this means you can load up almost full compute on all threads 
simultaneously and almost fully overlap the I/O waits with useful compute -- 
just like a real computer.
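The I/O-overlap point can be sketched with a toy example (simulated I/O via sleep; the numbers are illustrative, not a benchmark):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    # Simulated I/O wait: the OS runs other threads while this one sleeps.
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.1] * 4))
elapsed = time.perf_counter() - start

# Four 0.1 s "I/O waits" overlap, so wall time is ~0.1 s, not ~0.4 s.
print(f"total waited: {sum(results):.1f} s, wall time: {elapsed:.2f} s")
```

With real disk or network I/O the same overlap applies, which is why a thread count above the core count can still pay off for I/O-heavy workloads.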




_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
