hadoop on fedora 15

2011-08-05 Thread Manish
Hi, Has anybody been able to run Hadoop in standalone mode on Fedora 15? I have installed it correctly. It runs till map but gets stuck in reduce. It fails with the error mapred.JobClient Status : FAILED Too many fetch-failures. I read several articles on the net for this problem; all of them say about

Re: one question in the book Hadoop: The Definitive Guide, 2nd edition

2011-08-05 Thread John Armstrong
On Fri, 5 Aug 2011 08:50:02 +0800 (CST), Daniel,Wu hadoop...@163.com wrote: The book also mentioned the value is mutable. I think the key might also be mutable, meaning that as we loop over each value in the Iterable&lt;NullWritable&gt;, the content of the key object is reset. The mutability of the value is one of
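A minimal sketch of what this implies in practice, assuming a new-API reducer with hypothetical Text/IntWritable types: the framework reuses the value object on each iteration, so anything kept past the loop has to be copied.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: the IntWritable handed to the loop is re-filled by the
// framework on every iteration, so values must be copied before being stored.
public class CopyValuesReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    List<IntWritable> kept = new ArrayList<IntWritable>();
    for (IntWritable v : values) {
      kept.add(new IntWritable(v.get())); // copy; storing 'v' would keep N references to one object
    }
    context.write(key, new IntWritable(kept.size()));
  }
}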

Re: hadoop on fedora 15

2011-08-05 Thread madhu phatak
Disable iptables and try again. On Fri, Aug 5, 2011 at 2:20 PM, Manish manish.iitg...@gmail.com wrote: Hi, Has anybody been able to run Hadoop in standalone mode on Fedora 15? I have installed it correctly. It runs till map but gets stuck in reduce. It fails with the error mapred.JobClient

Re: hadoop on fedora 15

2011-08-05 Thread Harsh J
Sun JDK is what it's been thoroughly tested on. You can run on OpenJDK perhaps, but YMMV. Hadoop has a strict requirement of having a proper network setup before use. What port range did you open? TaskTracker would use 50060 for intercommunication (over lo, if it's bound to that). Check if your

Re: Upload, then decompress archive on HDFS?

2011-08-05 Thread Harsh J
I suppose we could do with a simple identity-mapping/identity-reducing example/tool that could easily be reused for purposes such as these. Could you file a JIRA on this? The -text option is like -cat but adds codec and some file-format detection. Hopefully it will work for your case. On Fri, Aug 5,
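As a rough illustration of what -text does for codec-compressed files (a sketch only; it skips the SequenceFile detection that -text also performs, and the class and path names are hypothetical):

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Open an HDFS file and, if its extension matches a registered codec,
// stream the decompressed contents to stdout.
public class TextLikeCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path in = new Path(args[0]);                    // e.g. /user/foo/data.gz (hypothetical)
    FileSystem fs = in.getFileSystem(conf);
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(in);  // null if no codec matches the file name
    InputStream raw = fs.open(in);
    InputStream decoded = (codec == null) ? raw : codec.createInputStream(raw);
    IOUtils.copyBytes(decoded, System.out, conf, true);
  }
}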

streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
I can use -cacheFile to load .so files into the distributed cache and it works fine (the streaming executable links against the .so and runs), but I can't get it to work with -cacheArchive. It always says it can't find the .so file. I realize that if you jar a directory, the directory will be

Re: streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
Quick follow-up. I swapped the real mapper out for a little Python script that just lists the cwd's contents and dumps them to the streaming output (stderr). Oddly, it doesn't look like the .jar file was unpacked. I can see it there, but not the unpacked version, so it looks like

Re: streaming cacheArchive shared libraries

2011-08-05 Thread Ramya Sunil
Hi Keith, I have tried the exact use case you have mentioned and it works fine for me. Below is the command line for the same:
[ramya]$ jar vxf samplelib.jar
  created: META-INF/
 inflated: META-INF/MANIFEST.MF
 inflated: libhdfs.so
[ramya]$ hadoop dfs -put samplelib.jar samplelib.jar
[ramya]$

Re: streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
Okay, I think I understand. The symlink name that follows the pound sign in the -cacheArchive directive isn't the name of the transferred jar file -- it is the name of a directory that the .jar file will be put into and then unjarred. So, it doesn't act like jar would on a local
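A sketch of the same behaviour from the Java side, using the DistributedCache API that streaming's -cacheArchive relies on (the HDFS path is hypothetical): the #testlink fragment names a directory and the archive is unpacked inside it, so libhdfs.so lands at ./testlink/libhdfs.so rather than ./libhdfs.so.

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

// Ship samplelib.jar through the distributed cache; "#testlink" becomes a
// directory in the task's working directory into which the jar is unpacked.
public class CacheArchiveExample {
  public static void configure(JobConf conf) throws Exception {
    DistributedCache.addCacheArchive(new URI("/user/keith/samplelib.jar#testlink"), conf);
    DistributedCache.createSymlink(conf);
  }
}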

Re: streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
Right, so it was pushed down a level into the testlink directory. That's why my shared libraries were not linking properly to my mapper executable. I can fix that by using -cmdenv to redirect LD_LIBRARY_PATH. I think that'll work. On Aug 5, 2011, at 10:44 , Ramya Sunil wrote: Hi Keith, I

Order of Operations

2011-08-05 Thread Premal Shah
Hi, According to the attached image found in Yahoo's Hadoop tutorial (http://developer.yahoo.com/hadoop/tutorial/module4.html), the order of operations is map, combine, partition, which should be followed by reduce. Here is an example key emitted by the map operation

cmdenv LD_LIBRARY_PATH

2011-08-05 Thread Keith Wiley
I know you can do something like this: -cmdenv LD_LIBRARY_PATH=./my_libs if you have shared libraries in a subdirectory under the cwd (such as occurs when using -cacheArchive to load and unpack a jar full of .so files into the distributed cache)...but this destroys the existing path. I think

Hadoop order of operations

2011-08-05 Thread Premal
According to the attached image found in Yahoo's Hadoop tutorial, the order of operations is map, combine, partition, which should be followed by reduce. Here is an example key emitted by the map operation: LongValueSum:geo_US|1311722400|E1 Assuming there are 100 keys of the same

Re: maprd vs mapreduce api

2011-08-05 Thread Stevens, Keith D.
The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the identity function. So you should be able to just do conf.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class); conf.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class); without having to implement your own
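For reference, a minimal new-API sketch (class name, key/value types, and paths are placeholders) wiring the base classes in as identity map and reduce steps via a mapreduce Job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Identity job: the base Mapper and Reducer pass records straight through,
// so the output is the input after the usual sort/shuffle.
public class IdentityJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "identity");
    job.setJarByClass(IdentityJob.class);
    job.setMapperClass(Mapper.class);          // base class acts as identity map
    job.setReducerClass(Reducer.class);        // base class acts as identity reduce
    job.setOutputKeyClass(LongWritable.class); // default TextInputFormat keys are byte offsets
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}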

Re: maprd vs mapreduce api

2011-08-05 Thread Mohit Anchlia
On Fri, Aug 5, 2011 at 3:42 PM, Stevens, Keith D. steven...@llnl.gov wrote: The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the identity function. So you should be able to just do conf.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);

java.io.IOException: config()

2011-08-05 Thread jagaran das
Hi, I have been stuck with this exception:
java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.&lt;init&gt;(Configuration.java:211)
    at org.apache.hadoop.conf.Configuration.&lt;init&gt;(Configuration.java:198)
    at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:99)
    at

Re: Hadoop order of operations

2011-08-05 Thread Harsh J
Premal, Didn't go through your entire thread, but the right order is: map (N) -> partition (N) -> combine (0…N). On Sat, Aug 6, 2011 at 4:04 AM, Premal premal.j.s...@gmail.com wrote: According to the attached image found in Yahoo's Hadoop tutorial, the order of operations is map combine
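One way to read the (0…N) for combine: the combiner is an optimization the framework may apply any number of times (or not at all) to partitioned map output, so the operation has to be associative and commutative. A hedged sketch of a sum combiner in the spirit of the LongValueSum key mentioned above (class name hypothetical):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A sum reducer that doubles as a combiner: summing is associative and
// commutative, so running it zero, one, or several times gives the same result.
public class LongSumCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    context.write(key, new LongWritable(sum));
  }
}

It would be registered with job.setCombinerClass(LongSumCombiner.class) and can serve as the reducer as well.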

Re: java.io.IOException: config() IMP

2011-08-05 Thread jagaran das
Hi, I am using CDH3. I need to stream a huge amount of data from our application to Hadoop. I am opening a connection like:
config.set("fs.default.name", hdfsURI);
FileSystem dfs = FileSystem.get(config);
String path = hdfsURI + connectionKey;
Path destPath = new Path(path);
logger.debug("Path -- " +
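A self-contained sketch of the connection pattern in the snippet above (the URI and helper name are placeholders, not the poster's actual code):

import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Point the client at the cluster, get a FileSystem handle, and stream
// data into a per-connection path under the HDFS URI.
public class HdfsWriter {
  public static void write(String hdfsURI, String connectionKey, byte[] data) throws Exception {
    Configuration config = new Configuration();
    config.set("fs.default.name", hdfsURI);   // e.g. hdfs://namenode:8020 (hypothetical)
    FileSystem dfs = FileSystem.get(config);
    Path destPath = new Path(hdfsURI + connectionKey);
    OutputStream out = dfs.create(destPath);  // creates (or overwrites) the file
    try {
      out.write(data);
    } finally {
      out.close();
    }
  }
}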