Re: Question on Hadoop Streaming

2011-12-06 Thread Brock Noland
Does your job end with an error? I am guessing what you want is: -mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'. The first option says to use your script as the mapper and the second says to ship your script as part of the job. Brock On Tue, Dec 6, 2011 at 4:59 PM, Romeo Kienzler ro...@ormium.de
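
Putting the two options together, a minimal invocation might look like the sketch below (not the poster's exact command: the streaming jar path follows the one used elsewhere in this thread, and the input/output paths are placeholders):

    # Hypothetical example: ship the script with the job and use it as the mapper.
    hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
      -input input \
      -output output \
      -mapper bowtiestreaming.sh \
      -file /root/bowtiestreaming.sh \
      -reducer NONE

Because -file ships the script into each task's working directory, -mapper can refer to it by its bare name.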

Re: Automate Hadoop installation

2011-12-06 Thread Praveen Sripati
Also, check out Ambari (http://incubator.apache.org/ambari/), which is still in Incubator status. How do Ambari and Puppet compare? Regards, Praveen On Tue, Dec 6, 2011 at 1:00 PM, alo alt wget.n...@googlemail.com wrote: Hi, to deploy software I suggest pulp:

Re: Question on Hadoop Streaming

2011-12-06 Thread Romeo Kienzler
Hi Brock, I'm not getting any errors. I'm issuing the following command now: hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.1 -output

Re: Multiple Mappers for Multiple Tables

2011-12-06 Thread Praveen Sripati
MultipleInputs takes multiple Paths (files) as input, not a DB. As mentioned earlier, export the tables into HDFS either using Sqoop or a native DB export tool and then do the processing. Sqoop is configured to use the native DB export tool whenever possible. Regards, Praveen On Tue, Dec 6, 2011 at 3:44 AM,
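
A sketch of the Sqoop route (the connect string, credentials, table name, and target directory are placeholders; a MySQL source and a standard Sqoop install are assumed):

    # Hypothetical example: pull one table into HDFS before running the MapReduce job.
    sqoop import \
      --connect jdbc:mysql://dbhost/mydb \
      --username dbuser -P \
      --table orders \
      --target-dir /user/hadoop/orders

Sqoop's --direct flag is what switches it to the database's native dump utility where one is available.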

Re: Running a job continuously

2011-12-06 Thread Praveen Sripati
If the requirement is real-time data processing, using Flume alone will not suffice, as there is a time lag between the collection of files by Flume and the processing done by Hadoop. Consider frameworks like S4, Storm (from Twitter), HStreaming etc., which suit real-time processing. Regards, Praveen On

RE: Multiple Mappers for Multiple Tables

2011-12-06 Thread Devaraj K
Hi Justin, If it is not feasible for you to do as Praveen suggested, here is how you can go about it. 1. You can write a customized InputFormat which creates different connections for different data sources and returns splits from those data source tables. Internally you can use DBInputFormat for each data

Hadoop 0.21

2011-12-06 Thread Saurabh Sehgal
Hi All, According to the Hadoop release notes, version 0.21.0 should not be considered stable or suitable for production: 23 August, 2010: release 0.21.0 available This release contains many improvements, new features, bug fixes and optimizations. It has not undergone testing at scale and should

Re: Hadoop 0.21

2011-12-06 Thread Jean-Daniel Cryans
Yep. J-D On Tue, Dec 6, 2011 at 10:41 AM, Saurabh Sehgal saurabh@gmail.com wrote: Hi All, According to the Hadoop release notes, version 0.21.0 should not be considered stable or suitable for production: 23 August, 2010: release 0.21.0 available This release contains many

Version of Hadoop That Will Work With HBase?

2011-12-06 Thread jcfolsom
Hi, Can someone please tell me which versions of Hadoop contain the 0.20-append code and will work with HBase? According to the HBase docs (http://hbase.apache.org/book/hadoop.html), Hadoop 0.20.205 should work with HBase, but it does not appear to. Thanks!

Re: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread Harsh J
0.20.205 should work, and so should CDH3 or 0.20-append branch builds (no longer maintained now that 0.20.205 has replaced it, though). What problem are you facing? Have you ensured HBase does not have a bad Hadoop version jar in its lib/? On Wed, Dec 7, 2011 at 12:55 AM, jcfol...@pureperfect.com
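
One way to check for, and fix, a mismatched jar in lib/ - a sketch, with the install paths as assumptions:

    # Hypothetical paths: adjust HBASE_HOME and HADOOP_HOME to your installs.
    ls $HBASE_HOME/lib/hadoop-core-*.jar   # which Hadoop jar does HBase bundle?
    rm $HBASE_HOME/lib/hadoop-core-*.jar
    cp $HADOOP_HOME/hadoop-core-0.20.205.0.jar $HBASE_HOME/lib/

Restart HBase afterwards so it picks up the replaced jar.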

Re: MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)

2011-12-06 Thread Chris Curtin
Thanks guys, I'll get with operations to do the upgrade. Chris On Mon, Dec 5, 2011 at 4:11 PM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi Chris From the stack trace, it looks like a JVM corruption issue. It is a known issue and has been fixed in CDH3u2; I believe an upgrade would

RE: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread jcfolsom
Sadly, CDH3 is not an option, although I wish it were. I need to get an official release of HBase from Apache to work. I've tried every version of HBase 0.89 and up with 0.20.205 and all of them throw EOFExceptions. Which version of Hadoop core should I be using? HBase 0.94 ships with a 0.20-append

Re: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread Jean-Daniel Cryans
For the record, this thread was started from another discussion on user@hbase. 0.20.205 does work with HBase 0.90.4; I think the OP was a little too quick in saying it doesn't. J-D On Tue, Dec 6, 2011 at 11:44 AM, jcfol...@pureperfect.com wrote: Sadly, CDH3 is not an option although I wish it

Re: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread Jitendra Pandey
Did you set dfs.support.append to true? It is not enabled by default in 0.20.205 (unlike in 0.20-append). On Tue, Dec 6, 2011 at 11:25 AM, jcfol...@pureperfect.com wrote: Hi, Can someone please tell me which versions of Hadoop contain the 0.20-append code and will work with HBase? According to

Re: Splitting SequenceFile in controlled manner

2011-12-06 Thread Harsh J
Majid, Sync markers are written into sequence files already; they are part of the format. This is nothing to worry about, and it is simple enough to test and be confident about. The mechanism is the same as reading a text file with newlines - the reader will ensure reading off the boundary data in

RE: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread jcfolsom
Yes. From what I have read, it needs to be set in both hdfs-site.xml and hbase-site.xml. It's working now. I don't really know why, but it is. Thanks! Original Message Subject: Re: Version of Hadoop That Will Work With HBase? From: Jitendra Pandey
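
For reference, the property both messages refer to looks like the stanza below; the same snippet goes in hdfs-site.xml and hbase-site.xml, and the daemons need a restart after the change (a sketch of the single property, not a complete config):

    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>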

Re: Splitting SequenceFile in controlled manner

2011-12-06 Thread Majid Azimi
So if we have a map task analysing only the second block of the log file, it should not need to transfer any other parts of the file from other nodes, because that block is a standalone, meaningful split? Am I right? On Tue, Dec 6, 2011 at 11:32 PM, Harsh J ha...@cloudera.com wrote: Majid, Sync markers

Re: Hadoop 0.21

2011-12-06 Thread Rita
I second Vinod's idea. Get the latest stable release from Cloudera. Their binaries are near perfect! On Tue, Dec 6, 2011 at 1:46 PM, T Vinod Gupta tvi...@readypulse.com wrote: Saurabh, It's best if you go through the HBase book, Lars George's HBase: The Definitive Guide. Your best bet is to

Re: Question on Hadoop Streaming

2011-12-06 Thread Romeo Kienzler
Hi, the following command works: hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -input input -output output2 -mapper /root/bowtiestreaming.sh -reducer NONE Best Regards, Romeo On 12/06/2011 10:49 AM, Brock Noland wrote: Does your job end with an error? I am

HDFS Backup nodes

2011-12-06 Thread praveenesh kumar
Does Hadoop 0.20.205 support configuring HDFS backup nodes? Thanks, Praveenesh