Building Mahout Issue

2014-06-02 Thread Botelho, Andrew
I am trying to build Mahout version 0.9 and make it compatible with Hadoop 2.4.0. I unpacked mahout-distribution-0.9-src.tar.gz and then ran the following command: mvn -Phadoop-0.23 clean install -Dhadoop.version=2.4.0 -DskipTests Then I get the following error: [ERROR] Failed to execute goal

Getting HBaseStorage() to work in Pig

2013-08-23 Thread Botelho, Andrew
I am trying to use the function HBaseStorage() in my Pig code in order to load an HBase table into Pig. When I run my code, I get this error: ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable I believe the PIG_CLASSPATH needs to be extended to

RE: Getting HBaseStorage() to work in Pig

2013-08-23 Thread Botelho, Andrew
...@gmail.com] Sent: Friday, August 23, 2013 4:50 PM To: common-u...@hadoop.apache.org Subject: Re: Getting HBaseStorage() to work in Pig Please look at the example in 15.1.1 under http://hbase.apache.org/book.html#tools On Fri, Aug 23, 2013 at 1:41 PM, Botelho, Andrew andrew.bote

RE: DistributedCache incompatibility issue between 1.0 and 2.0

2013-07-19 Thread Botelho, Andrew
I have been using Job.addCacheFile() to cache files in the distributed cache. It has been working for me on Hadoop 2.0.5: public void addCacheFile(URI uri) Add a file to be localized Parameters: uri - The uri of the cache to be localized -Original Message- From: Edward J. Yoon
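For reference, a minimal driver-side sketch of the Hadoop 2.x call being discussed here; the file path and job name are placeholders, not taken from the thread:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cache-example");
        // Hadoop 2.x replacement for the older DistributedCache.addCacheFile(uri, conf)
        job.addCacheFile(new URI("/user/andrew/lookup.txt"));
        // ... set mapper, reducer, input/output paths, then job.waitForCompletion(true)
      }
    }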

Make job output be a comma separated file

2013-07-18 Thread Botelho, Andrew
What is the best way to make the output of my Hadoop job be comma separated? Basically, how can I have the keys and values be separated by a comma? My keys are Text objects, and some of them have actual commas within the field. Will this matter? Thanks, Andrew
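One way to do this with the new-API TextOutputFormat is to set the separator property in the driver, as in the sketch below (the mapreduce.* key name is the one discussed later in this thread, so check it against your Hadoop version). TextOutputFormat writes key, separator, value with no quoting, so commas inside the Text keys will be indistinguishable from the field separator unless you escape them yourself:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CsvOutputDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // New-API key; older releases read mapred.textoutputformat.separator instead.
        conf.set("mapreduce.output.textoutputformat.separator", ",");
        Job job = Job.getInstance(conf, "csv-output");
        // ... configure mapper/reducer and input/output paths as usual
      }
    }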

RE: Make job output be a comma separated file

2013-07-18 Thread Botelho, Andrew
is the value. Regards Ravi M. On Thu, Jul 18, 2013 at 10:46 PM, Botelho, Andrew andrew.bote...@emc.com wrote: What is the best way to make the output of my Hadoop job be comma separated? Basically, how can I have the keys and values be separated by a comma? My keys are Text

RE: Make job output be a comma separated file

2013-07-18 Thread Botelho, Andrew
.. I noticed that in Hadoop 1.0.4, the class org.apache.hadoop.mapreduce.lib.output.TextOutputFormat is looking for mapred.textoutputformat.separator. Regards Ravi M On Thu, Jul 18, 2013 at 11:32 PM, Botelho, Andrew andrew.bote...@emc.com wrote: I believe

RE: Make job output be a comma separated file

2013-07-18 Thread Botelho, Andrew
at 11:32 PM, Botelho, Andrew andrew.bote...@emc.com wrote: I believe that mapred.textoutputformat.separator is from the old API, but now the field is mapreduce.output.textoutputformat.separator in the new API. So I ran this code in my driver class, but it is making

RE: New Distributed Cache

2013-07-11 Thread Botelho, Andrew
JobContext.getCacheFiles()? Thanks, Omkar Joshi Hortonworks Inc. (http://www.hortonworks.com) On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I am trying to store a file in the Distributed Cache during my Hadoop job. In the driver class, I tell

RE: CompositeInputFormat

2013-07-11 Thread Botelho, Andrew
-in-hadoop-using-compositeinputformat/ the trick is to google for CompositeInputFormat.compose() :) On Thu, Jul 11, 2013 at 5:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I want to perform a JOIN on two sets of data with Hadoop. I read that the class
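A rough sketch of the CompositeInputFormat.compose() idea referenced above, using the new-API (org.apache.hadoop.mapreduce.lib.join) classes; the paths are placeholders, the join-expression key may differ between API generations (mapred.join.expr vs. mapreduce.join.expr), and both inputs must already be sorted and identically partitioned for the map-side join to work:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat;

    public class MapSideJoinDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Build an inner-join expression over two pre-sorted, equally partitioned inputs.
        String expr = CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class,
            new Path("/data/left"), new Path("/data/right"));
        conf.set("mapreduce.join.expr", expr);
        Job job = Job.getInstance(conf, "map-side-join");
        job.setInputFormatClass(CompositeInputFormat.class);
        // ... the mapper then receives one TupleWritable value per joined key
      }
    }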

RE: Distributed Cache

2013-07-10 Thread Botelho, Andrew
@hadoop.apache.org Subject: Re: Distributed Cache You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version

New Distributed Cache

2013-07-10 Thread Botelho, Andrew
Hi, I am trying to store a file in the Distributed Cache during my Hadoop job. In the driver class, I tell the job to store the file in the cache with this code: Job job = Job.getInstance(); job.addCacheFile(new URI(file name)); That all compiles fine. In the Mapper code, I try accessing the
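On the mapper side, one common pattern with the new API looks like the sketch below; the lookup-file handling and the key/value types are illustrative assumptions, not taken from the thread:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        // Returns the URIs registered in the driver via Job.addCacheFile().
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
          // Cached files are localized for the task and are typically reachable by base name.
          String localName = new Path(cacheFiles[0].getPath()).getName();
          try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = reader.readLine()) != null) {
              // build an in-memory lookup from the cached file here
            }
          }
        }
      }
    }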

RE: Distributed Cache

2013-07-10 Thread Botelho, Andrew
...@gmail.com] Sent: Tuesday, July 09, 2013 6:08 PM To: user@hadoop.apache.org Subject: Re: Distributed Cache You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com

Distributed Cache

2013-07-09 Thread Botelho, Andrew
Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5). In my driver class, I use this code to try and add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import