Re: CPU utilization in each superstep

2015-04-05 Thread Matthew Saltz
I don't remember all the info it gives you but check out the giraph metrics command line flag. Best, Matthew On Sunday, April 5, 2015, Ravikant Dindokar ravikant.i...@gmail.com wrote: Hi, I am a newbie learning Giraph on Hadoop 2.2.0. I want to find out CPU utilization as well as time spent

Re: Undirected Vertex Definition and Reflexivity

2015-03-11 Thread Matthew Saltz
to specify the edge file. Best, Matthew Saltz On Wed, Mar 11, 2015 at 1:54 AM, G.W. gwindel...@gmail.com wrote: Thanks for that! This is the right idea, however I was only using a VertexReader until now – IntNullReverseTextEdgeInputFormat calls for an EdgeReader. I am not sure

Re: Undirected Vertex Definition and Reflexivity

2015-03-10 Thread Matthew Saltz
Have a look at IntNullReverseTextEdgeInputFormat https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html. It automatically creates reverse edges, but it expects the file format source_id, target_id on each line. If you need to convert it to use longs
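
The reverse-edge behaviour described above can be sketched outside Giraph. This is only an illustration of the idea, not the library's code: the function name edges_with_reverse is hypothetical, and accepting either whitespace or a comma between the two ids is an assumption, since the message does not spell out the separator.

```python
import re

def edges_with_reverse(lines):
    """Yield each edge from the input plus its reverse, one pair per input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: ids are separated by whitespace and/or a comma.
        source, target = (int(token) for token in re.split(r"[,\s]+", line))
        yield (source, target)
        yield (target, source)  # the automatically created reverse edge

edges = list(edges_with_reverse(["1 2", "2, 3"]))
```

With an edge list like this, each undirected edge only needs to appear once in the input file.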

Re: Giraph Partitioning

2015-02-25 Thread Matthew Saltz
Hi, 1) The partitions are processed in parallel based on the number of threads you specify. The vertices within a partition are processed sequentially. You may want to use more partitions than threads; that way, if one partition takes a particularly long time to be processed, the other threads can
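
To make the load-balancing argument concrete, here is a small simulation. It is a sketch under assumptions, not Giraph's scheduler: it assumes each thread simply picks up the next unprocessed partition when it becomes free, and the partition costs are made-up numbers chosen so that both splits contain the same total work.

```python
import heapq

def makespan(partition_costs, num_threads):
    """Return the finish time when free threads take partitions in order."""
    # Each heap entry is the time at which one thread becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)
    for cost in partition_costs:
        start = heapq.heappop(free_at)  # earliest-available thread takes the partition
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# Hypothetical workload: 16 units of work in total, with one skewed region.
coarse = [7.0, 3.0, 3.0, 3.0]   # 4 partitions on 4 threads
fine = [3.5, 3.5] + [1.5] * 6   # the same work split into 8 partitions

coarse_time = makespan(coarse, 4)
fine_time = makespan(fine, 4)
```

In this toy case the coarse split is held up by the single slow partition, while the finer split lets idle threads pick up the remaining partitions and finish sooner.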

Re: Best way to know the assignment of vertices to workers

2014-11-28 Thread Matthew Saltz
Kiran, To answer your question directly, in an AbstractComputation class (or whatever descendant you're using), you may call getWorkerContext().getMyWorkerIndex() (here https://giraph.apache.org/apidocs/org/apache/giraph/worker/WorkerContext.html). However, if each vertex has metadata associated

Re: When do Giraph vertices receive their messages?

2014-11-10 Thread Matthew Saltz
in ComputeCallable are pretty enlightening. As far as messaging goes it appears that everything is flushed from the sender before the end of the superstep. Someone else please correct me if I'm wrong about any of these things; I don't want to mislead anyone. Best, Matthew Saltz On Mon, Nov 10

How to ensure that only one worker runs per node

2014-10-30 Thread Matthew Saltz
Hi everyone, Is there a good way (a configuration I'm guessing) to prevent more than one worker from running per node? I saw in this thread https://www.mail-archive.com/user@giraph.apache.org/msg01580.html to use mapred.tasktracker.map.tasks.maximum=1, but that doesn't seem to be working. Thanks

Re: How to ensure that only one worker runs per node

2014-10-30 Thread Matthew Saltz
at the 'map' link on the tasktracker ui to see all the workers plus master. On Thu, Oct 30, 2014 at 7:11 AM, Matthew Saltz sal...@gmail.com wrote: Hi everyone, Is there a good way (a configuration I'm guessing) to prevent more than one worker from running per node? I saw in this thread to use

Re: Resource Allocation Model Of Apache Giraph

2014-10-24 Thread Matthew Saltz
You may set giraph.userPartitionCount=<number of workers> and giraph.maxPartitionsInMemory=1. Like Avery said, though, since parallelism occurs at the partition level (each thread processes a different partition), if you only have one partition per worker you cannot take advantage of multithreading.
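
As a rough sketch of how the two settings above might be assembled for a job launch: the helper names are hypothetical, the worker count of 16 is only an example, and passing each option through a separate -ca (custom argument) flag is an assumption about how the job is started.

```python
def partition_options(num_workers):
    """Return the two settings from the message above as key=value strings."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        f"giraph.userPartitionCount={num_workers}",  # one partition per worker
        "giraph.maxPartitionsInMemory=1",
    ]

def as_custom_arguments(options):
    """Pair every option with its own flag (assumed here to be -ca)."""
    args = []
    for option in options:
        args += ["-ca", option]
    return args

args = as_custom_arguments(partition_options(16))
print(" ".join(args))
```

Keep in mind the trade-off stated in the message: with one partition per worker, extra compute threads have nothing to work on.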

Excessive Memory Usage Compared to Graph Size

2014-10-23 Thread Matthew Saltz
Hi everyone, I'm working on a community detection algorithm for giraph and I'm trying to execute the algorithm on the Friendster graph, which has about 65M nodes and about 1.8 billion edges. Running on 16 machines, before doing ANY processing, it's taking about 50G of RAM. That's 800G total for

Multiple sendMessage calls vs. sendMessageToMultipleEdges

2014-10-22 Thread Matthew Saltz
and then sending that? I'm going through the Edge iterable and building an ArrayPrimitiveWritable of ids but it would be nice if I could somehow access the underlying data structure behind the iterable or just wrap the iterable as a writable somehow. Thanks so much for the help, Matthew Saltz

Re: Multiple sendMessage calls vs. sendMessageToMultipleEdges

2014-10-22 Thread Matthew Saltz
, Lukas Nalezenec lukas.naleze...@firma.seznam.cz wrote: Hi Matthew, See class SendMessageToAllCache. It's in the same directory as SendMessageCache. The first class is not used by Giraph unless you set property giraph.oneToAllMsgSending to true. Lukas On 22.10.2014 20:10, Matthew Saltz wrote

Re: Multiple sendMessage calls vs. sendMessageToMultipleEdges

2014-10-22 Thread Matthew Saltz
Actually, one more question: are there any disadvantages to enabling oneToAllMessaging? Is there any reason not to do it by default? Best, Matthew El 22/10/2014 23:28, Matthew Saltz sal...@gmail.com escribió: Lukas, Thank you so much for the help. By 'the first class', you mean

Re: getAggregatedValue calling aggregate

2014-09-23 Thread Matthew Saltz
Hi Puneet, What are you trying to do in getAggregatedValue()? Is there any reason you don't just want to return the current value of the aggregator (which is what the default implementation does)? Best, Matthew On Sat, Sep 20, 2014 at 6:25 AM, Puneet Agarwal puagar...@yahoo.com wrote: I have

Re: understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward

2014-09-22 Thread Matthew Saltz
Sorry, should be org.apache.giraph.utils.MemoryUtils.getRuntimeMemoryStats(). I left out the giraph. On Mon, Sep 22, 2014 at 8:10 PM, Matthew Saltz sal...@gmail.com wrote: Hi Matthew, I answered a few of your questions in-line (unfortunately they might not help the larger problem

Re: understanding failing my job, Giraph/Hadoop memory usage, under-utilized nodes, and moving forward

2014-09-22 Thread Matthew Saltz
Hi Matthew, I answered a few of your questions in-line (unfortunately they might not help the larger problem, but hopefully it'll help a bit). Best, Matthew On Mon, Sep 22, 2014 at 5:50 PM, Matthew Cornell m...@matthewcornell.org wrote: Hi Folks, I've spent the last two months learning,

Re: Problem processing large graph

2014-09-11 Thread Matthew Saltz
Hi Tripti, How many machines are you running on? The ideal configuration would be one worker per machine and one separate machine for the master. If you're using more mappers than machines then you're using more resources than necessary, and fixing that could help. Best, Matthew El 11/09/2014

Re: How do I validate customArguments?

2014-09-10 Thread Matthew Saltz
of the simpler examples such as degree count to see if /anything/ will run against my graph, which is very small (100K nodes and edges). -- matt On Thu, Aug 28, 2014 at 2:26 PM, Matthew Saltz sal...@gmail.com wrote: Matt, I'm not sure if you've resolved this problem already or not, but if you

Re: How do I validate customArguments?

2014-08-28 Thread Matthew Saltz
Matt, I'm not sure if you've resolved this problem already or not, but if you haven't: The initialize() method isn't limited to registering aggregators, and in fact, in my project I use it to do exactly what you're describing to check and load custom configuration parameters. Inside the

Re: Setting variable value in Compute class and using it in the next superstep

2014-07-21 Thread Matthew Saltz
Tom, If it's necessary to store more than one flag though, for example, won't a custom class be necessary? I'm a beginner too, so I apologize if I'm incorrect about that. Just to clarify, to keep persistent data for a vertex from one superstep to the next, it is necessary to encapsulate it in

Re: Setting variable value in Compute class and using it in the next superstep

2014-07-21 Thread Matthew Saltz
reader can be quite simple. -- *From:* Matthew Saltz [sal...@gmail.com] *Sent:* Monday, July 21, 2014 3:09 PM *To:* user@giraph.apache.org *Subject:* Re: Setting variable value in Compute class and using it in the next superstep Tom, If it's necessary to store