RE: Needs a simple answer

2010-12-20 Thread Peng, Wei
Maha, If you want to access an HDFS file in the mapper class, pass context.getConfiguration() to FileSystem.get(). In the main method, you need to create the configuration first using new Configuration(). In your case, you cannot create the configuration in the main method and then access it in the mapper class, because the main…
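A minimal sketch of the pattern described above, assuming the driver passes the file path through a hypothetical conf key side.file.path (the key and class names are illustrative, not from the original thread):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SideFileMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();   // same conf the driver built
        FileSystem fs = FileSystem.get(conf);              // HDFS handle on the task side
        Path side = new Path(conf.get("side.file.path"));  // set in the driver via conf.set(...)
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(side)));
        try {
          // read the side file here, e.g. into an in-memory map
        } finally {
          reader.close();
        }
      }
    }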

RE: Friends of friends with MapReduce

2010-12-20 Thread Peng, Wei
Praveen, I just had a quick solution (it might be naive). In the first job, you can easily create an adjacency list plus the reversed friendships from your input file. (You can use some special character to distinguish these two types of output, e.g. "|".) The input is 1 2 1 3 2 4 2…
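A rough sketch of what that first-job mapper could look like, assuming tab-separated "user friend" input pairs; class and field names are made up, and the "|" tag follows the convention suggested above:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // For each edge "u\tv", emit the forward edge keyed on u and the reversed
    // edge keyed on v, tagging the reversed one with "|".
    public class EdgeMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      public void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String[] pair = value.toString().split("\t");               // "1\t2" -> u=1, v=2
        context.write(new Text(pair[0]), new Text(pair[1]));        // forward: 1 -> 2
        context.write(new Text(pair[1]), new Text("|" + pair[0]));  // reversed: 2 -> |1
      }
    }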

Re: InputFormat for a big file

2010-12-20 Thread madhu phatak
If I use FileInputFormat it gives an instantiation error, since FileInputFormat is an abstract class. On Sat, Dec 18, 2010 at 3:21 AM, Aman wrote: > > Use FileInputFormat > > > Your mapper will look something like this > > public class MyMapper extends Mapper<>{ > int sum=0; > > @Override > public v…

Re: InputFormat for a big file

2010-12-20 Thread Harsh J
Use TextInputFormat for text files. On Mon, Dec 20, 2010 at 2:29 PM, madhu phatak wrote: > If I use FileInputFormat it gives an instantiation error, since FileInputFormat > is an abstract class. > > On Sat, Dec 18, 2010 at 3:21 AM, Aman wrote: > >> >> Use FileInputFormat >> >> >> Your mapper will look s…
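A minimal driver sketch showing the concrete class Harsh points to; job and class names are placeholders, using the 0.20-era Job constructor:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class Driver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "big-file-sum");
        job.setJarByClass(Driver.class);
        job.setInputFormatClass(TextInputFormat.class); // concrete subclass; FileInputFormat is abstract
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... set mapper/reducer/output formats as usual, then job.waitForCompletion(true)
      }
    }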

Re: Friends of friends with MapReduce

2010-12-20 Thread Antonio Piccolboni
For an easy solution, use Hive. Let's say your record contains userid and friendid and the table is called friends. Then you would do: select A.userid, B.friendid from friends A join friends B on (A.friendid = B.userid) This is off the top of my head, sorry if some details are off, but I've done it in…

Re: Friends of friends with MapReduce

2010-12-20 Thread Ted Dunning
On Mon, Dec 20, 2010 at 9:39 AM, Antonio Piccolboni wrote: > For an easy solution, use Hive. Let's say your record contains userid and > friendid and the table is called friends. > Then you would do > select A.userid, B.friendid from friends A join friends B on (A.friendid = > B.userid) > > This…

RE: Friends of friends with MapReduce

2010-12-20 Thread Ricky Ho
I wrote a blog post a while back on this, for the case where the social graph is represented as an adjacency list and friendship is a mutual (undirected) relationship. http://horicky.blogspot.com/2010/08/mapreduce-to-recommend-people.html Rgds, Ricky

Re: Friends of friends with MapReduce

2010-12-20 Thread Praveen Bathala
Sounds good. I know that we can use Hive; I was talking to a friend, saying it may be easy in Hive, but I wanted to write it in Java and I was stuck :-(. Probably I should opt for Hive then… Thank you - Praveen On Mon, Dec 20, 2010 at 12:39 PM, Antonio Piccolboni < anto...@piccolboni.info> wrote: > Fo…

Re: Friends of friends with MapReduce

2010-12-20 Thread Praveen Bathala
Nice blog with good stuff... it helps me figure this out. Thank you all, guys. - Praveen On Mon, Dec 20, 2010 at 2:15 PM, Ricky Ho wrote: > I wrote a blog post a while back on this, for the case where the social graph is represented as > an adjacency list and friendship is a mutual (undirected) relationship. > http://horick…

Job/Task Log timestamp questions

2010-12-20 Thread Raj V
I am running terasort on 10^12 bytes on a 512-node Hadoop cluster. There is something funny about the timings that I am unable to explain. Probably something trivial, but not visible to my naked eye! Here are the details - I am using CDH3B3. The job started at 14:47 on 12/10/2010 10/12/10 1…

breadth-first search

2010-12-20 Thread Peng, Wei
I implemented an algorithm that runs Hadoop on 25GB of graph data to calculate its average separation length. The input format is V1(tab)V2 (where V2 is a friend of V1). My purpose is to first randomly select some seed nodes, and then, for each seed node, calculate the shortest paths from that node to al…
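For context, a bare-bones sketch of one BFS expansion round under the common frontier formulation (distances carried as node state, one MapReduce pass per hop). This is a generic pattern, not Wei's actual code; the record layout node \t distance \t comma-separated-neighbors is assumed:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // One BFS hop as a map step: each reached node (distance d) proposes d+1
    // to its neighbors; a reducer would keep the minimum proposal per node
    // along with the adjacency list, and the driver loops until no distance changes.
    public class BfsRoundMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      public void map(LongWritable key, Text value, Context ctx)
          throws IOException, InterruptedException {
        String[] f = value.toString().split("\t", 3);
        String node = f[0];
        int dist = Integer.parseInt(f[1]);                 // Integer.MAX_VALUE = not reached yet
        String adj = (f.length > 2) ? f[2] : "";
        ctx.write(new Text(node), new Text(dist + "\t" + adj));  // carry node state forward
        if (dist != Integer.MAX_VALUE) {
          for (String n : adj.isEmpty() ? new String[0] : adj.split(",")) {
            ctx.write(new Text(n), new Text(String.valueOf(dist + 1)));  // frontier proposal
          }
        }
      }
    }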

Re: breadth-first search

2010-12-20 Thread Edward J. Yoon
Check this slide deck out - http://people.apache.org/~edwardyoon/papers/Apache_HAMA_BSP.pdf On Tue, Dec 21, 2010 at 10:49 AM, Peng, Wei wrote: > >  I implemented an algorithm that runs Hadoop on 25GB of graph data to > calculate its average separation length. > The input format is V1(tab)V2 (where V2 is t…

RE: breadth-first search

2010-12-20 Thread Peng, Wei
Yoon, Can I use HAMA now, or is it still in development? Thanks Wei -Original Message- From: Edward J. Yoon [mailto:edwardy...@apache.org] Sent: Monday, December 20, 2010 6:23 PM To: common-user@hadoop.apache.org Subject: Re: breadth-first search Check this slide deck out - http://people.a…

RE: breadth-first search

2010-12-20 Thread Ricky Ho
I also blogged about how to do Single Source Shortest Path, at http://horicky.blogspot.com/2010/02/nosql-graphdb.html One MR algorithm is based on Dijkstra and the other is based on BFS. I think the first one is more efficient than the second one. Rgds, Ricky -Original Message- Fro…

RE: breadth-first search

2010-12-20 Thread Peng, Wei
Thanks Ricky. I do not think Dijkstra is more efficient than BFS (it is costly to look for the node with the minimum distance, and we do not need to do that when there are no edge weights). BFS is just the special case of Dijkstra where all edge weights are equal: with unit weights, Dijkstra's priority queue pops nodes in exactly the order BFS visits them. In your algorithm, you are…

Re: breadth-first search

2010-12-20 Thread Ted Dunning
On Mon, Dec 20, 2010 at 8:16 PM, Peng, Wei wrote: > ... My question is really about what is the efficient way to do graph > computation, matrix computation, and algorithms that need many iterations to > converge (with intermediate results). > Large graph computations usually assume a sparse graph, for…

RE: breadth-first search

2010-12-20 Thread Peng, Wei
Dunning, Currently, most of the matrix data (graph matrices, document-word matrices) that we are dealing with is sparse. The matrix decomposition often needs many iterations to converge, so the intermediate results have to be saved to serve as the input for the next iteration. This is super inefficien…
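A sketch of the iterate-until-converged driver pattern being described here, where each round is a full MapReduce job whose output directory becomes the next round's input; paths, the iteration cap, and the convergence test are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IterativeDriver {
      public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        for (int i = 0; i < 20 && !converged(); i++) {
          Job job = new Job(new Configuration(), "iteration-" + i);
          job.setJarByClass(IterativeDriver.class);
          // ... set mapper/reducer/formats for one decomposition step ...
          FileInputFormat.addInputPath(job, input);
          Path output = new Path(args[1] + "/iter" + i);   // intermediate results hit HDFS every round
          FileOutputFormat.setOutputPath(job, output);
          job.waitForCompletion(true);
          input = output;                                  // next round reads this round's output
        }
      }
      static boolean converged() { return false; }         // placeholder convergence test
    }

Writing every intermediate result to HDFS and restarting JVMs per round is exactly the per-iteration overhead being complained about.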

Re: breadth-first search

2010-12-20 Thread Ted Dunning
On Mon, Dec 20, 2010 at 9:43 PM, Peng, Wei wrote: > ... > Currently, most of the matrix data (graph matrices, document-word matrices) > that we are dealing with is sparse. > Good. > The matrix decomposition often needs many iterations to converge, so the > intermediate results have to be saved to s…

Re: breadth-first search

2010-12-20 Thread Edward J. Yoon
There's no release yet, but I have tested BFS using HAMA and HBase. Sent from my iPhone On 2010. 12. 21., at 11:30 AM, "Peng, Wei" wrote: > Yoon, > > Can I use HAMA now, or is it still in development? > > Thanks > > Wei > > -Original Message- > From: Edward J. Yoon [mailto:ed…