A few minutes ago I tried the setup with one multithreaded worker per machine for the first time, on a cluster of 25 machines, and my job (closeness centrality estimation on a billion-edge graph) ran twice as fast!


On 02/07/2014 12:21 PM, Claudio Martella wrote:
Yes, I think this is the best setup if you have control over your cluster.
And yes, I have already tried that.


On Fri, Feb 7, 2014 at 11:39 AM, Sundara Raghavan Sankaran <
sun...@crayondata.com> wrote:


On Fri, Feb 7, 2014 at 4:00 PM, Claudio Martella <
claudio.marte...@gmail.com> wrote:




On Fri, Feb 7, 2014 at 9:44 AM, Alexander Frolov <
alexndr.fro...@gmail.com> wrote:

  Thank you, I will try this. As I understand it, I should set the number
of threads manually through the Giraph API.

BTW, what is the conceptual difference between running multiple workers on
the TaskTracker and running a single worker with multiple threads, in terms
of vertex fetching, memory sharing, etc.?


Basically, better usage of resources: one single JVM, no duplication of
core data structures, fewer netty threads and communication points, more
locality (fewer messages over the network), fewer actors accessing ZooKeeper,
etc.


So, is it better to have one worker per machine, with as many threads as
the machine has cores? Suppose I have 8 machines with 6 cores each: instead
of running 47 Workers (1 thread per Worker) + 1 Master, is it better to run
8 Workers (6 threads per Worker) + 1 Master? Have you tried this already?
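
To make the arithmetic in this question concrete, here is a minimal sketch in Python that compares the two layouts for the 8-machine, 6-core example. The helper names (`Layout`, `single_threaded_layout`, `multithreaded_layout`, `giraph_args`) are hypothetical and only for illustration. The `-w` and `-ca giraph.numComputeThreads=...` arguments are the GiraphRunner options as I recall them, so please check them against your Giraph version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    workers: int             # number of Giraph workers
    threads_per_worker: int  # compute threads inside each worker JVM
    jvms: int                # worker JVMs plus one master JVM


def single_threaded_layout(machines: int, cores: int) -> Layout:
    """One single-threaded JVM per core; one of those slots goes to the master."""
    slots = machines * cores
    return Layout(workers=slots - 1, threads_per_worker=1, jvms=slots)


def multithreaded_layout(machines: int, cores: int) -> Layout:
    """One worker per machine with one compute thread per core, plus a master."""
    return Layout(workers=machines, threads_per_worker=cores, jvms=machines + 1)


def giraph_args(layout: Layout) -> list:
    """Build the worker/thread arguments (assumed option names, see note above)."""
    return [
        "-w", str(layout.workers),
        "-ca", f"giraph.numComputeThreads={layout.threads_per_worker}",
    ]


if __name__ == "__main__":
    for name, layout in [
        ("single-threaded", single_threaded_layout(8, 6)),
        ("multithreaded", multithreaded_layout(8, 6)),
    ]:
        print(name, layout, " ".join(giraph_args(layout)))
```

In this sketch the multithreaded layout goes from 48 JVMs down to 9, which is where the savings described above (shared data structures, fewer communication endpoints) would come from.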




  Also, I would like to ask how message transfer between vertices is
implemented in terms of Hadoop primitives. A source code reference will be
enough.


Communication does not happen via Hadoop primitives, but ad-hoc via
netty.



--
    Claudio Martella



--
*Sundara Raghavan Sankaran*
