Re: Seekable interface and CompressInputStream question

2012-12-21 Thread Harsh J
The Seekable interface isn't used to detect splittable compressed files. I've also not seen it implemented properly in any of the codecs in trunk, at least as of today (even Bzip2, the only natively splittable one, doesn't implement a seek function). I don't think we support seeking yet on a c

Re: Is it possible to run from localized directory instead of jar?

2012-12-21 Thread Harsh J
Are you looking for the DistributedCache's archives feature? If you add an 'archive' type to the cache, it is automatically extracted into the current working directory. See http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html "Archives (zip, tar and tgz/tar.

How to troubleshoot OutOfMemoryError

2012-12-21 Thread David Parks
I'm pretty consistently seeing a few reduce tasks fail with OutOfMemoryError (below). It doesn't kill the job, but it slows it down. In my current case the reducer is pretty darn simple, the algorithm basically does: 1. Do you have 2 values for this key? 2. If so, build a json str

Is it possible to run from localized directory instead of jar?

2012-12-21 Thread Ilya Kirnos
When running hadoop locally, RunJar will unjar the job jar and use the localized directory as the classpath to run the job. When running distributed, it seems the localized directory is created, but the jar is used for the classpath instead, and the localized directory is ignored for classpath pur

Re: What should I do with a 48-node cluster

2012-12-21 Thread Edward Capriolo
A three-year-old blade center is OK. A three-year-old blade is probably a 64-bit machine with 2 to 4 GB of RAM and 2 SCSI disks, maybe two sockets with two cores each. Two blade centers take about 8U, or a quarter cabinet, and you can find a hosting provider in your price range. Especially if you can get the hardware at a low

Re: Hadoop example command.

2012-12-21 Thread Rishi Yadav
Hi Sitaraman, Can you tell what exactly this command is doing? bin/hadoop jar -v hadoop-examples-0.20.203.0.jar grep input output 'dfs[a-z]+' The general format is: hadoop jar library main_class input output Thanks and Regards, Rishi Yadav On Thu, Dec 20, 2012 at 10:12 PM, Ramachandran Vil
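For readers wondering what the grep example in that command computes: it counts how often strings matching the regular expression occur in the input. Below is a rough single-process sketch of that idea in Python, not the Hadoop implementation. The function name `count_matches` and the sample lines are made up for illustration.

```python
import re
from collections import Counter


def count_matches(lines, pattern):
    """Count each distinct substring matching `pattern` across all lines.

    Hypothetical helper: it mimics the result of the grep example on a
    single machine, without any map or reduce phases.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # findall returns every non-overlapping match of the whole pattern
        counts.update(regex.findall(line))
    return counts


if __name__ == "__main__":
    # Made-up sample input; in the real job this would be the files under `input`.
    sample = [
        "run dfsadmin to check",
        "dfsadmin and dfsck",
        "dfs.replication",  # no match: '.' is not in [a-z]
    ]
    for match, count in count_matches(sample, r"dfs[a-z]+").most_common():
        print(count, match)
```

Note that 'dfs[a-z]+' needs at least one lowercase letter directly after "dfs", so a property name such as dfs.replication does not match.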

Re: build-truck-branch/ version control / POM / check out /git / maven / ant

2012-12-21 Thread Rishi Yadav
Checking out the sources also helps if you have a habit of looking under the hood, like me. It gives you great insight into the framework you are using. Thanks and Regards, Rishi Yadav (o) 408.988.2000x113 || (f) 408.716.2726 InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)* *INC 5

Re: Test failures on a released version

2012-12-21 Thread Rishi Yadav
Hi Mark, Your issue seems to be related to this JIRA item: https://issues.apache.org/jira/browse/HDFS-3902 This issue was fixed on 9/12 (courtesy Andy Isaacson). I would selectively update this test from svn if this were the only one bugging me. Thanks and Regards, Rishi Yadav (o) 408.988.2

Re: build-truck-branch/ version control / POM / check out /git / maven / ant

2012-12-21 Thread Glen Mazza
Git and checking out are for when you're interested in submitting patches (improvements) to the source code--then you must have the very latest and greatest (to-the-minute) on your machine. If you're not, then just download the regular gzipped file and work with that. Glen On 12/18/2012 10:5

Re: What should I do with a 48-node cluster

2012-12-21 Thread Mark Kerzner
True! I am thinking of either my (small) office, or actually hosting for under $500/month. On Fri, Dec 21, 2012 at 1:37 PM, Lance Norskog wrote: > You will also be raided by the DEA- too much power for a residence. > > > On 12/20/2012 07:56 AM, Ted Dunning wrote: > > > > > On Thu, Dec 20, 2012

Re: What should I do with a 48-node cluster

2012-12-21 Thread Lance Norskog
You will also be raided by the DEA- too much power for a residence. On 12/20/2012 07:56 AM, Ted Dunning wrote: On Thu, Dec 20, 2012 at 7:38 AM, Michael Segel wrote: While Ted ignores that the world is going to end before X-Mas, he does hit the cru

Re: Hadoop

2012-12-21 Thread Nitin Pawar
Hadoop is composed of two things: one is a file system, mainly HDFS, and the other is a data processing framework (MapReduce). Before you go to Big Data, first read about what Hadoop is and what its use cases are, then get into some basic Hadoop training, and worry later about how things work. On Dec 21, 2012 1

Re: Question about HA and Federation

2012-12-21 Thread ESGLinux
Thank you, Harsh J. First point checked (decide what to do); now I have to do it ;-) Kind regards, ESGLinux 2012/12/21 Harsh J > Appears alright to me! > > On Fri, Dec 21, 2012 at 1:15 PM, ESGLinux wrote: > > Hi, > > > > Finally I'm going to try this: > > > > 1 Machine: Active Name Node for NS

Re: Question about HA and Federation

2012-12-21 Thread Harsh J
Appears alright to me! On Fri, Dec 21, 2012 at 1:15 PM, ESGLinux wrote: > Hi, > > Finally I'm going to try this: > > 1 Machine: Active Name Node for NS1 > 1 Machine: Passive Name Node for NS1 > 1 Machine: NameNode for NS2 + NameNode for NS3 > 1 Machine: Secondary NameNode for NS2 + Secondary Name