Thanks for pushing this through, Roman. Looks great!
On 11/18/14, 4:30 AM, Roman Shaposhnik wrote:
Hi!
With 3 binding +1s, one non-binding +1, and
no 0s or -1s, the vote to publish
Apache Giraph 1.1.0 RC2 as the 1.1.0 release of
Apache Giraph passes. Thanks to everybody who
spent time on validating it.
Theoretically, Giraph on YARN would be much better (actual resource
requests rather than the mapper hack). That being said, Eli is the best
person to talk to about that. We haven't tried YARN.
Avery
On 10/6/14, 8:51 AM, Matthew Cornell wrote:
Hi Folks. I don't think I paid enough attention to YARN
Take a look at the interfaces for MasterGraphPartitioner and
WorkerGraphPartitioner and their implementations for hash partitioning
(HashRangePartitionerFactory). You can implement any kind of
partitioning you like.
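To illustrate the core idea behind hash partitioning, here is a minimal sketch in plain Java (the class and method names are hypothetical, not Giraph's actual partitioner classes): a vertex id is mapped to a partition by hashing, and a custom partitioner just substitutes a different mapping.

```java
import java.util.ArrayList;
import java.util.List;

public class HashPartitionSketch {
    /** Assign a vertex id to one of numPartitions partitions by hash. */
    static int partitionOf(long vertexId, int numPartitions) {
        // Math.floorMod keeps the result non-negative even for negative hashes
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<List<Long>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Vertices land in a partition purely based on their id's hash
        for (long id = 0; id < 10; id++) {
            partitions.get(partitionOf(id, numPartitions)).add(id);
        }
        System.out.println(partitions);
    }
}
```

A non-hash scheme (e.g. range or locality-aware partitioning) would replace `partitionOf` with its own id-to-partition mapping.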
Avery
On 8/8/14, 7:51 AM, Robert McCune wrote:
For non-hash partitioning,
I've seen this work demoed. It's awesome, especially for applications
that are not very predictable.
Avery
On 6/4/14, 11:00 AM, Semih Salihoglu wrote:
Hi Giraph Users,
I wanted to introduce to you Graft, a project that some of us at
Stanford have built over the last quarter. If you are a
Giraph should just pick up your cluster's HDFS configuration. Can you
check your hadoop *.xml files?
On 6/1/14, 3:34 AM, John Yost wrote:
Hi Everyone,
Not sure why, but Giraph tries to connect to port 9000:
java.net.ConnectException: Call From localhost.localdomain/127.0.0.1
You might also want to check the zookeeper memory options.
Some of our production jobs use parameters such as
-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100
Since the master doesn't use much memory, letting ZooKeeper have more is
a good idea; see *giraph.zkJavaOpts*
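For context, a hedged sketch of how those ZooKeeper options might be passed at job submission (the jar name and computation class here are placeholders, not from this thread):

```shell
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner my.app.MyComputation \
  -Dgiraph.zkJavaOpts="-Xmx5g -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxGCPauseMillis=100"
```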
On 5/27/14, 10:27 AM, Praveen kumar s.k wrote:
Do I need to put this in the ZooKeeper configuration file or the Giraph
job configuration?
On Tue, May 27, 2014 at 12:14 PM, Avery Ching ach...@apache.org wrote:
You might also want to check the zookeeper memory options.
Some of
I think this is the key message.
0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB,
average 11.56MB
Having less than 1 MB free won't work. Your workers are likely OOM,
killing the job. Can you get more memory for your job?
On 5/14/14, 3:13 AM, Arun Kumar wrote:
Hi
You can schedule a Giraph job with any MapReduce job scheduler (it is
just a map-only job).
On 4/26/14, 4:30 AM, yeshwanth kumar wrote:
Hi, I am looking for a Giraph job scheduler, just like Oozie.
Can we schedule a Giraph job using Oozie?
-yeshwanth.
Maja has been working on Giraph for over a year and is one of our
biggest contributors. Adding her to the Giraph PMC in recognition of
her impressive work is long overdue.
Some of her major contributions include composable computation, sharded
aggregators, Hive I/O, support for massive
Hi Giraphers,
Recently, a few internal Giraph users at Facebook published a really
cool blog post on how we partition huge graphs (1.15 billion people and
150 billion friendships - 300B directed edges).
The Project Management Committee (PMC) for Apache Giraph has asked Pavan
Kumar to become a committer and we are pleased to announce that he
has accepted. Here are some of Pavan's contributions:
GIRAPH-858: tests fail for hadoop_facebook because of dependency issues
(pavanka via aching)
Yes, this is one of the great things about Giraph (not many other graph
computation frameworks allow graph mutation). See the Computation class
(i.e.)
/**
* Sends a request to create a vertex that will be available during the
* next superstep.
*
* @param id Vertex id
* @param
They should all be implemented. =)
On 4/16/14, 9:32 PM, Akshay Trivedi wrote:
Does removeVertexRequest(I vertexId) have to be implemented? Is there
any pre-defined class for this?
On Wed, Apr 16, 2014 at 8:33 PM, Avery Ching ach...@apache.org wrote:
Yes, this is one of the great things about
Corona -
https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
On 4/11/14, 8:14 AM, chadi jaber wrote:
Hi avery
What do you mean by "your version of Hadoop"?
Best regards,
Chadi
Date: Fri, 11 Apr 2014
Hi Vikesh,
You just need to write an input format or use an existing one. You can
specify any number and combination of VertexInputFormat and
EdgeInputFormat formats as per your needs.
Please see giraph-core/src/main/java/org/apache/giraph/io/formats for
some examples.
Avery
On 4/7/14,
Pretty much. But when you remove the vertex, you won't be able to dump
its output (not that all applications need to).
Avery
On 4/7/14, 9:38 AM, Liannet Reyes wrote:
Hi,
Because of my algorithm, I am able to detect when a vertex won't be
used anymore. What would be more accurate:
My guess is that you aren't getting your resources. It would be very
helpful to print the master log; you can find it while the job is
running by looking at the Hadoop counters on the job UI page.
Avery
On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
Hi,
I am running the PageRank benchmark under
member of the Apache Giraph project
at the moment, so my question goes to Avery:
Would it be possible for me to become a mentor for Gianluca's project?
Best wishes
Mirko
On Fri, Mar 14, 2014 at 10:19 PM, Avery Ching ach...@apache.org wrote:
This is a great idea
Hi Young,
Our Hadoop instance (Corona) kills processes after they finish executing,
so we don't see this. You might want to do a jstack to see where it's
hung and figure out the issue.
Thanks
Avery
On 3/17/14, 7:56 AM, Young Han wrote:
Hi all,
With Giraph 1.0.0, I've noticed an
This is a great idea. Unfortunately, I'm a little bandwidth limited,
but I hope someone can help mentor you!
On 3/14/14, 1:26 PM, Gianluca Righetto wrote:
Hello everyone,
I've been working with Giraph for some time now and I'd like to make some
contributions back to the project through
This looks more like the Zookeeper/YARN issues mentioned in the past.
Unfortunately, I do not have a YARN instance to test this with. Does
anyone else have any insights here?
On 1/10/14 1:48 PM, Kristen Hardwick wrote:
Hi all, I'm requesting help again! I'm trying to get this
The port logic is a bit complex, but all encapsulated in
NettyServer.java (see below).
If nothing else is running on those ports and you really only have one
giraph worker per port you should be good to go. Can you look at the
logs for the worker that is trying to start a port other than
Hi Wei,
For best performance, please be sure to tune the GC settings, use
Java 7, tune the number of cores used for computation,
communication, etc. and the combiner.
We also have some numbers on our recent Facebook blog post.
We did have this error a few times. This can happen due to GC pauses, so
I would check the worker for long GC issues. Also, you can increase the
ZooKeeper timeouts, see
/** ZooKeeper session millisecond timeout */
IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
new
I think you may have added the same vertex 2x? That being said, I don't
see why the code is this way. It should be fine. We should file a JIRA.
On 9/26/13 11:02 AM, Yingyi Bu wrote:
Thanks, Lukas!
I think the reason for this exception is that I ran the job over part
of the graph where some
).
Yingyi
On Thu, Sep 26, 2013 at 12:05 PM, Avery Ching ach...@apache.org wrote:
I think you may have added the same vertex 2x? That being said, I
don't see why the code is this way. It should be fine. We should
file a JIRA.
On 9/26/13 11:02 AM
If you are running out of counters, you can turn off the superstep counters
/** Use superstep counters? (boolean) */
BooleanConfOption USE_SUPERSTEP_COUNTERS =
    new BooleanConfOption("giraph.useSuperstepCounters", true,
        "Use superstep counters? (boolean)");
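That option can be flipped off at submission time; a hedged example of the flag in use (the jar and computation class names are placeholders):

```shell
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner my.app.MyComputation \
  -Dgiraph.useSuperstepCounters=false
```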
On 9/9/13 6:43 AM,
but failed when they had
900,000 vertices in memory.
Btw: why is the default number of partitions W^2?
(I could be wrong)
Lukas
On 08/31/13 01:54, Avery Ching wrote:
Ah, the new caches. =) These make things a lot faster (bulk
data sending), but do take up some additional
That error is from the master dying (likely due to the results of
another worker dying). Can you do a rough calculation of the size of
data that you expect to be loaded and check if the memory is enough?
On 8/30/13 11:19 AM, Yasser Altowim wrote:
Guys,
Can someone please help me
?
On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching ach...@apache.org wrote:
Try dumping a histogram of memory usage from a running JVM and see
where the memory is going. I can't think of anything in
particular that changed...
On 8/28/13 4:39 PM, Jeff Peters
Try dumping a histogram of memory usage from a running JVM and see where
the memory is going. I can't think of anything in particular that
changed...
On 8/28/13 4:39 PM, Jeff Peters wrote:
I am tasked with updating our ancient (circa 7/10/2012) Giraph to
giraph-release-1.0.0-RC3. Most jobs
That makes sense, since the Context doesn't have a real InputSplit (it's
a Giraph one - see BspInputSplit).
What information are you trying to get out of the input splits? Giraph
workers can process an arbitrary number of input splits (0 or more), so
I don't think this will be useful.
You
Yes, you can control this behavior with the VertexResolver. It handles
all mutations to the graph and resolves them in a user defined way.
Avery
On 8/19/13 9:21 AM, Marco Aurelio Barbosa Fagnani Lotz wrote:
Hello all :)
I am programming an application that has to create and destroy a few
This is doable in Giraph, you can use as many vertex or edge input
formats as you like (via GIRAPH-639). You just need to choose
MultiVertexInputFormat and/or MultiEdgeInputFormat
See VertexInputFormatDescription for vertex input formats
/**
* VertexInputFormats description - JSON array
Hi Giraphers,
We recently released an article on how we use Giraph at the scale of a
trillion edges at Facebook. If you're interested, please take a look!
https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920
Avery
The Giraph/Pregel model is based on bulk synchronous parallel computing,
where the programmer is abstracted from the details of how the
parallelization occurs (infrastructure does this for you). Additionally
the APIs are built for graph-processing. Since the computing model is
well defined
This should be fixed now.
On 7/20/13 12:20 PM, Avery Ching wrote:
My bad. I am out but will fix in a few hours.
On Jul 20, 2013 11:02 AM, Christian Krause m...@ckrause.org wrote:
Hi,
I get these compile errors. Could it be that some classes are missing
I don't think it will be hard to implement. Just start with the
HbaseVertexInputFormat and have it extend EdgeInputFormat. You can look
at TableEdgeInputFormat for an example. It sounds like a good
contribution to Giraph.
On 7/18/13 1:57 PM, Puneet Jain wrote:
I also need this feature.
Not that I know of. Since it is similar to JSON, you might want to take
a look at JsonBase64VertexInputFormat as an example for Avro. Should be
fairly similar in structure. Of course, it would be great if you can
contribute it back to Giraph when you're done. =)
Avery
On 7/18/13 4:36 PM,
Looks like the serialization/deserialization has a problem. If you want
to see an example of a Trove primitive map, see
LongDoubleArrayEdges.
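As a rough illustration of what a primitive-backed edge store serializes (a plain-Java sketch of the pattern, not Giraph's actual LongDoubleArrayEdges code): write a count, then the parallel id/value arrays, and read them back in exactly the same order. An order or type mismatch between the write and read sides is the usual cause of this kind of exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class PrimitiveEdgesSketch {
    /** Serialize parallel arrays of target ids and edge values. */
    static byte[] write(long[] ids, double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(ids.length);
        for (int i = 0; i < ids.length; i++) {
            out.writeLong(ids[i]);
            out.writeDouble(values[i]);
        }
        return bytes.toByteArray();
    }

    /** Deserialize in exactly the same order the data was written. */
    static long[] readIds(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        long[] ids = new long[n];
        for (int i = 0; i < n; i++) {
            ids[i] = in.readLong();
            in.readDouble(); // consume the edge value to stay aligned
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(new long[]{1, 2, 3}, new double[]{0.5, 1.5, 2.5});
        System.out.println(Arrays.toString(readIds(data)));
    }
}
```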
On 7/4/13 7:06 AM, Pasupathy Mahalingam wrote:
Hi,
Thanks Avery Ching.
I get the following exception
java.lang.IllegalStateException: run: Caught
You can easily add bi-directional edges. When you load the edge, simply
also load the reciprocal edge. I.e. if you add a-b, also add b-a.
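The reciprocal-edge idea can be sketched in plain Java (a toy adjacency map, not Giraph's input format API): for every edge a-b that is loaded, also record b-a.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class UndirectedLoaderSketch {
    /** Build a symmetric adjacency map: for every edge a->b, also add b->a. */
    static Map<Long, Set<Long>> loadUndirected(long[][] edges) {
        Map<Long, Set<Long>> adjacency = new HashMap<>();
        for (long[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new TreeSet<>()).add(edge[1]);
            // the reciprocal edge makes the graph effectively undirected
            adjacency.computeIfAbsent(edge[1], k -> new TreeSet<>()).add(edge[0]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> adj = loadUndirected(new long[][]{{1, 2}, {2, 3}});
        // vertex 2 sees both neighbors even though only 1->2 and 2->3 were input
        System.out.println(adj.get(2L));
    }
}
```

In a real VertexInputFormat or EdgeInputFormat you would emit the reciprocal edge while parsing, before the graph is handed to the computation.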
On 7/2/13 1:11 AM, Pascal Jäger wrote:
Hi everyone,
I am currently getting my hands on giraph which is why I am trying to
implement a maximum flow
Eli, any thoughts?
On 7/3/13 9:27 AM, Chui-Hui Chiu wrote:
Hello,
I tried to compile Giraph-1.1.0-SNAPSHOT for both hadoop_2.0.3 and
hadoop_yarn, but both failed.
The error message from the compile command mvn -Phadoop_yarn
compile is
=
[INFO]
Take a look at PageRankBenchmark; it is a standalone Java program that
runs Giraph jobs.
On 7/2/13 4:08 AM, Ahmet Emre Aladağ wrote:
By the way, I have set the corresponding classes in the giraph
configuration.
GiraphConfiguration giraphConf = new GiraphConfiguration(config);
Claudio, any thoughts?
On 7/3/13 3:52 AM, Han JU wrote:
Hi,
I've been testing an algorithm using the out-of-core feature, and I
have a strange ArrayIndexOutOfBoundsException.
In my computation class, the vertex value is a custom writable class
which contains a long[]. And during the
Zookeeper is required. That being said, you can have an external
Zookeeper or Giraph can start one for you. It's your choice.
Eli is the one to contact regarding Giraph on Hadoop 2.0.5. Any
thoughts Eli?
Avery
On 6/24/13 5:22 PM, Chuan Lei wrote:
It is not clear to me whether
Rather than use voteToHalt, you could add an aggregator that keeps track
of the alive vertices, and then use an aggregator to store/set your
configuration value that the master computation can modify. Do the
logic in the master computation and all should be well.
Avery
On 6/3/13 10:04
Improving our documentation is always very nice. Thanks for doing this
you two!
On 5/31/13 7:32 PM, Yazan Boshmaf wrote:
Maria, I can help you with this if you are interested and have the
time. If you are busy, please let me know and I will update the site
docs with a variant of your
Best way is to add it to the vertex value. The vertex value is meant to
store any data associated with a particular vertex.
Hope that helps,
Avery
On 5/7/13 7:47 AM, Ahmet Emre Aladağ wrote:
Hi,
1) What's the best way for storing extra data (such as URL) on a
vertex? I thought this would
--- On Mon, 5/6/13, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote:
From: Kiru Pakkirisamy kirupakkiris...@yahoo.com
Subject: Re: Compiling 1.0.0 distribution
To: user@giraph.apache.org, Avery Ching ach...@apache.org
Date: Monday, May 6, 2013, 12:02 AM
5, 2013 at 10:44 PM, Avery Ching ach...@apache.org wrote:
The easiest way is to compile from the base directory, which
will build everything.
You can build individual directories, but you have to install
the core jars first (i.e
Fellow Giraphers,
We have our first release candidate since graduating from incubation.
This is a source release, primarily due to the different versions of
Hadoop we support with munge (similar to the 0.1 release). Since 0.1,
we've made A TON of progress on overall performance,
Hi Yuanyuan,
We haven't tested this feature in a while. But it should work. What did
the job report about why it failed?
Avery
On 3/18/13 10:22 AM, Yuanyuan Tian wrote:
Can anyone help me answer the question?
Yuanyuan
From: Yuanyuan Tian/Almaden/IBM@IBMUS
To: user@giraph.apache.org
Congrats Eli!
On 3/15/13 9:03 PM, Eli Reisman wrote:
Thanks! I look forward to many more enjoyable toils in the future!
Send the decoder ring. I'm already wearing the robe ;)
On Fri, Mar 15, 2013 at 2:07 PM, Alessandro Presta alessan...@fb.com wrote:
Well
I think those are info level logs rather than actual issues. If your
job completes successfully, I wouldn't worry about it.
On 3/8/13 12:31 PM, Ameet Kini wrote:
Hi folks,
I am trying to run the SimpleShortestPathsVertexTest example
introduced by the unit testing tool as part of
Yeah, this is where things get a bit tricky. You'll have to experiment
with what works for you, but we are using Hive to launch the job with
the jar.sh script. This gets the environment straight from the Hive side.
jar_help () {
  echo "Used for applications that require Hadoop and Hive"
This looks like 0.1 (still using Hadoop RPC). Please try trunk instead.
Avery
On 1/10/13 1:09 AM, pankaj Gulhane wrote:
Hi,
My code is working on smaller (very very small) dataset but if I use
the same code on the large dataset it fails.
Following code is some basic implementation of
We are running several Giraph applications in production using our
version of Hadoop (Corona) at Facebook. The part you have to be careful
about is ensuring you have enough resources for your job to run. But
otherwise, we are able to run at FB-scale (i.e. 1 billion+ nodes, many
more edges).
November 2012 20:57, Avery Ching ach...@apache.org wrote:
Hi Alexandros,
The extra task is for the master process (a coordination
task). In your case, since you are using a single machine, you
can use a single task
Hi Alexandros,
The extra task is for the master process (a coordination task). In your
case, since you are using a single machine, you can use a single task.
-Dgiraph.SplitMasterWorker=false
and you can try multithreading instead of multiple workers.
-Dgiraph.numComputeThreads=12
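Combining the two flags above, a single-machine run might look like this hedged sketch (the jar and computation class names are placeholders):

```shell
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner my.app.MyComputation \
  -Dgiraph.SplitMasterWorker=false \
  -Dgiraph.numComputeThreads=12 \
  -w 1
```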
The
The connect exception is fine, it usually takes more than one connect
attempt to zk. The reason your job failed is due to not having enough
simultaneous map tasks on your Hadoop instance.
See http://svn.apache.org/repos/asf/giraph/trunk/README for details on
running in pseudo-distributed
Answers inline.
On 10/5/12 1:58 AM, Gergely Svigruha wrote:
Hi,
I have a few questions regarding Giraph.
1) Is it possible to use Giraph for local traversals in the graph? For
example if I want to do some computing on the neighbours of the node
with id xy is it possible to get the reference
I don't think the types are compatible.
public class SimpleTriangleClosingVertex extends EdgeListVertex<
    IntWritable, SimpleTriangleClosingVertex.IntArrayListWritable,
    NullWritable, IntWritable>
You'll need to use an input format and output format that fits these
types. Otherwise the issue
(majakabiljo via aching).
In the spirit of your first commit, Maja, please take a look at
https://issues.apache.org/jira/browse/GIRAPH-335 .
Welcome Maja and happy Giraphing!
Avery Ching
:35)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
On Tue, Sep 11, 2012 at 7:53 AM, Avery Ching ach
These days we are focusing more on the netty IPC. Can you try
-Dgiraph.useNetty=true?
Avery
On 9/10/12 2:08 PM, Franco Maria Nardini wrote:
Dear all,
I am working with Giraph 0.2/Hadoop 1.0.3. In particular, I am trying
to execute the following code:
hadoop jar
PageRankBenchmark doesn't use an output format. If you'd like
to see the output, just add a VertexOutputFormat (that matches the
types). You could start with JsonBase64VertexOutputFormat.
i.e. in PageRankBenchmark.java add
job.setVertexOutputFormatClass(
I tried adding the "from" emails to the d...@giraph.apache.org mailing
list. Shouldn't that work?
On 7/16/12 12:17 PM, Jakob Homan wrote:
I don't believe so. The from list seems reasonable on each one:
-- Forwarded message --
From: Avery Ching avery.ch...@gmail.com
To: Avery
You should try using the appropriate memory settings (i.e.
-Dmapred.child.java.opts=-Xms30g -Xmx30g -Xss128k) for a 30 GB heap.
This depends on how much memory you can get.
Avery
On 7/9/12 5:57 AM, Amani Alonazi wrote:
Actually, I had the same problem of running out of memory with Giraph
Status report for the Apache Giraph project - July 2012
Giraph is a Bulk Synchronous Parallel framework for writing programs that
analyze large graphs on a Hadoop cluster. Giraph is similar to Google's
Pregel system.
Project Status
--
Releases:
0.2.0 - expected 7/31
* Reduce
If you're running without a real Hadoop instance, you'll need to blow
away the zk directories after running the first time. Hope that helps,
Avery
On 6/19/12 5:39 PM, Jonathan Bishop wrote:
Hi,
I am exploring Giraph 0.1 and was able to download, build, and run all
the tests - all 58
We did have a related issue
(https://issues.apache.org/jira/browse/GIRAPH-155).
On 5/29/12 6:54 AM, Claudio Martella wrote:
I'm not sure they need to be sent on the first superstep.
They'll be created and used in the second superstep if necessary. If
they need it in the first
Did you compile with the appropriate flags?
From the README:
- Apache Hadoop 0.23.1
You may tell Maven to use this version with "mvn -Phadoop_0.23 <goals>".
On 5/25/12 9:24 AM, Roman Shaposhnik wrote:
Hi!
I'm trying to run Giraph trunk on top of Hadoop 2.0.0 and I'm getting
the following