I don't remember all the info it gives you but check out the giraph metrics
command line flag.
Best,
Matthew
On Sunday, April 5, 2015, Ravikant Dindokar ravikant.i...@gmail.com wrote:
Hi
I am a newbie learning Giraph on Hadoop 2.2.0. I want to find out CPU
utilization as well as time spent
to specify the edge file.
Best,
Matthew Saltz
On Wed, Mar 11, 2015 at 1:54 AM, G.W. gwindel...@gmail.com wrote:
Thanks for that!
This is the right idea, however I was only using a VertexReader until now
– IntNullReverseTextEdgeInputFormat calls for an EdgeReader.
I am not sure
Have a look at IntNullReverseTextEdgeInputFormat
https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html.
It automatically creates reverse edges, but it expects the file format
source_id, target_id
on each line. If you need to convert it to use longs
Hi,
1) The partitions are processed in parallel based on the number of threads
you specify. The vertices within a partition are processed sequentially.
You may want to use more partitions than threads, that way if one partition
takes a particularly long time to be processed, the other threads can
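A rough way to picture the scheduling described above is the toy model below. It is not Giraph code: the class and method names are made up for illustration, and it uses only the Java standard library. Each thread repeatedly takes a whole partition from a shared queue and walks its vertices one after another, so with more partitions than threads an idle thread can pick up the next unclaimed partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only; none of these names come from Giraph.
class PartitionParallelismSketch {

    // Splits vertex ids 0..numVertices-1 into numPartitions lists by id modulo.
    static List<List<Integer>> buildPartitions(int numVertices, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int v = 0; v < numVertices; v++) {
            partitions.get(v % numPartitions).add(v);
        }
        return partitions;
    }

    // Processes all partitions with numThreads threads and returns the number
    // of vertices visited. Partitions are handed out in parallel; the vertices
    // inside one partition are visited sequentially by a single thread.
    static long processAll(List<List<Integer>> partitions, int numThreads) {
        ConcurrentLinkedQueue<List<Integer>> queue = new ConcurrentLinkedQueue<>(partitions);
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int t = 0; t < numThreads; t++) {
            pool.execute(() -> {
                List<Integer> partition;
                // A thread that finishes early simply polls the next partition.
                while ((partition = queue.poll()) != null) {
                    for (Integer vertexId : partition) {
                        processed.incrementAndGet(); // stand-in for per-vertex work
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for partitions");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // 8 partitions shared by 2 threads: more partitions than threads.
        List<List<Integer>> partitions = buildPartitions(1000, 8);
        System.out.println("vertices processed: " + processAll(partitions, 2));
    }
}
```

With a single partition, only one of the threads in this model would ever have work to do, which is the reason for preferring more partitions than threads.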
Kiran,
To answer your question directly, in an AbstractComputation class (or
whatever descendant you're using), you may call
getWorkerContext().getMyWorkerIndex() (here
https://giraph.apache.org/apidocs/org/apache/giraph/worker/WorkerContext.html).
However, if each vertex has metadata associated
in ComputeCallable are pretty
enlightening. As far as messaging goes it appears that everything is
flushed from the sender before the end of the superstep.
Someone else please correct me if I'm wrong about any of these things; I
don't want to mislead anyone.
Best,
Matthew Saltz
On Mon, Nov 10
Hi everyone,
Is there a good way (a configuration I'm guessing) to prevent more than one
worker from running per node? I saw in this thread
https://www.mail-archive.com/user@giraph.apache.org/msg01580.html to use
mapred.tasktracker.map.tasks.maximum=1, but that doesn't seem to be
working. Thanks
at the 'map' link on the
tasktracker ui to see all the workers plus master.
On Thu, Oct 30, 2014 at 7:11 AM, Matthew Saltz sal...@gmail.com wrote:
Hi everyone,
Is there a good way (a configuration I'm guessing) to prevent more than
one
worker from running per node? I saw in this thread to use
You may set giraph.userPartitionCount=number of workers and
giraph.maxPartitionsInMemory=1.
Like Avery said though, since parallelism occurs on a partition level (each
thread processes a different partition) if you only have one partition per
worker you cannot take advantage of multithreading.
Hi everyone,
I'm working on a community detection algorithm for giraph and I'm trying to
execute the algorithm on the Friendster graph, which has about 65M nodes
and about 1.8 billion edges. Running on 16 machines, before doing ANY
processing, it's taking about 50G of RAM. That's 800G total for
and
then sending that? I'm going through the Edge iterable and building an
ArrayPrimitiveWritable of ids but it would be nice if I could somehow
access the underlying data structure behind the iterable or just wrap the
iterable as a writable somehow.
Thanks so much for the help,
Matthew Saltz
, Lukas Nalezenec
lukas.naleze...@firma.seznam.cz wrote:
Hi Matthew,
See class SendMessageToAllCache. It's in the same directory as
SendMessageCache. The first class is not used by Giraph unless you set
property giraph.oneToAllMsgSending to true.
Lukas
On 22.10.2014 20:10, Matthew Saltz wrote
Actually, one more question: are there any disadvantages to enabling
oneToAllMessaging? Is there any reason not to do it by default?
Best,
Matthew
On 22/10/2014 23:28, Matthew Saltz sal...@gmail.com wrote:
Lukas,
Thank you so much for the help. By 'the first class', you mean
Hi Puneet,
What are you trying to do in getAggregatedValue()? Is there any reason you
don't just want to return the current value of the aggregator (which is
what the default implementation does)?
Best,
Matthew
On Sat, Sep 20, 2014 at 6:25 AM, Puneet Agarwal puagar...@yahoo.com wrote:
I have
Sorry, should be
org.apache.giraph.utils.MemoryUtils.getRuntimeMemoryStats(),
I left out the giraph.
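For anyone who wants comparable numbers without going through Giraph, here is a minimal sketch that reads the same kind of JVM figures from java.lang.Runtime. It is an assumption-laden stand-in, not the implementation of MemoryUtils.getRuntimeMemoryStats(): the class name, method name and output format are all invented here.

```java
import java.util.Locale;

// Standard-library stand-in; this is NOT Giraph's MemoryUtils.
class RuntimeMemorySketch {

    // Returns free, total and max JVM heap memory in megabytes as one line.
    // The output format is a guess chosen for readability.
    static String memoryStats() {
        Runtime rt = Runtime.getRuntime();
        double mb = 1024.0 * 1024.0;
        return String.format(Locale.ROOT,
                "Memory (free/total/max) = %.2fM / %.2fM / %.2fM",
                rt.freeMemory() / mb, rt.totalMemory() / mb, rt.maxMemory() / mb);
    }

    public static void main(String[] args) {
        System.out.println(memoryStats());
    }
}
```

Logging such a line at the start and end of a superstep gives a rough view of how heap usage grows on a worker.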
On Mon, Sep 22, 2014 at 8:10 PM, Matthew Saltz sal...@gmail.com wrote:
Hi Matthew,
I answered a few of your questions in-line (unfortunately they might not
help the larger problem
Hi Matthew,
I answered a few of your questions in-line (unfortunately they might not
help the larger problem, but hopefully it'll help a bit).
Best,
Matthew
On Mon, Sep 22, 2014 at 5:50 PM, Matthew Cornell m...@matthewcornell.org
wrote:
Hi Folks,
I've spent the last two months learning,
Hi Tripti,
How many machines are you running on? The ideal configuration would be one
worker per machine and one separate machine for the master. If you're using
more mappers than machines then you're using more resources than necessary,
and fixing that could help.
Best,
Matthew
On 11/09/2014
of the simpler examples such as degree count to see
if /anything/ will run against my graph, which is very small (100K nodes
and edges). -- matt
On Thu, Aug 28, 2014 at 2:26 PM, Matthew Saltz sal...@gmail.com wrote:
Matt,
I'm not sure if you've resolved this problem already or not, but if you
Matt,
I'm not sure if you've resolved this problem already or not, but if you
haven't: The initialize() method isn't limited to registering aggregators,
and in fact, in my project I use it to do exactly what you're describing to
check and load custom configuration parameters. Inside the
Tom,
If it's necessary to store more than one flag though, for example, won't a
custom class be necessary? I'm a beginner too, so I apologize if I'm
incorrect about that. Just to clarify, to keep persistent data for a
vertex from one superstep to the next, it is necessary to encapsulate it in
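As a sketch of what such a custom class could look like, the example below bundles two flags and a counter into one value object. The class and field names are hypothetical. To stay runnable with only the standard library it does not implement org.apache.hadoop.io.Writable, but its write and readFields methods are meant to mirror the shape of that interface, so in a real job one would presumably add "implements Writable" and use the class as the vertex value type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value carrying more than one piece of per-vertex state.
class FlagsVertexValue {
    boolean visited;
    boolean isSeed;
    int roundsActive;

    // Serializes every field in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(visited);
        out.writeBoolean(isSeed);
        out.writeInt(roundsActive);
    }

    // Reads the fields back in exactly the order write() emitted them.
    public void readFields(DataInput in) throws IOException {
        visited = in.readBoolean();
        isSeed = in.readBoolean();
        roundsActive = in.readInt();
    }

    // Writes the value to bytes and reads it into a fresh object, roughly what
    // a framework does whenever it stores or ships the value.
    static FlagsVertexValue roundTrip(FlagsVertexValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            FlagsVertexValue copy = new FlagsVertexValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FlagsVertexValue value = new FlagsVertexValue();
        value.visited = true;
        value.roundsActive = 3;
        FlagsVertexValue copy = roundTrip(value);
        System.out.println(copy.visited + " " + copy.isSeed + " " + copy.roundsActive);
    }
}
```

The point of the design is that anything that must survive to the next superstep lives in the value object rather than in fields of the computation class.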
reader can be quite simple.
--
*From:* Matthew Saltz [sal...@gmail.com]
*Sent:* Monday, July 21, 2014 3:09 PM
*To:* user@giraph.apache.org
*Subject:* Re: Setting variable value in Compute class and using it in
the next superstep
Tom,
If it's necessary to store
21 matches