Re: Multiple Output Formats

2011-07-26 Thread Ayon Sinha
package com.shopkick.util;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
public class MultiFileOutput extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name)
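The body of generateFileNameForKeyValue is cut off in this snippet. As a rough illustration of what such an override does, here is a plain-Java simulation (no Hadoop classes) that routes each key/value record to an output file chosen per record. The naming rule `key + "/" + name` (one subdirectory per key, keeping the default leaf name such as part-00000) is a hypothetical choice, not the original author's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiFileOutputSketch {

    // Hypothetical naming rule: one subdirectory per key, keeping the leaf
    // name (e.g. "part-00000") that the framework passes in as `name`.
    static String generateFileNameForKeyValue(String key, String value, String name) {
        return key + "/" + name;
    }

    // Simulates the output format: every record is appended, as "key<TAB>value",
    // to whichever file the naming rule picks for it.
    static Map<String, List<String>> route(List<String[]> records, String leafName) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] kv : records) {
            String file = generateFileNameForKeyValue(kv[0], kv[1], leafName);
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(kv[0] + "\t" + kv[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"us", "a"},
            new String[] {"uk", "b"},
            new String[] {"us", "c"});
        route(records, "part-00000").forEach((file, lines) ->
            System.out.println(file + " -> " + lines));
    }
}
```

With this rule, both "us" records land in us/part-00000 and the "uk" record in uk/part-00000.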

Re: Exporting From Hive

2011-07-28 Thread Ayon Sinha
This is for the CLI. Use this:
set hive.cli.print.header=true;
Instead of doing this at the prompt every time, you can change your hive start command to:
hive -hiveconf hive.cli.print.header=true
But be careful with this setting, as quite a few commands stop working with an NPE when it is on. I thin

Re: Reducer to concatenate string values

2011-09-19 Thread Ayon Sinha
What are you using for your map/reduce? Streaming/Java/Pig/Hive? -Ayon See My Photos on Flickr Also check out my Blog for answers to commonly asked questions. From: Daniel Yehdego To: common-user@hadoop.apache.org Sent: Monday, September 19, 2011 10:43 PM Subj

Re: Reducer to concatenate string values

2011-09-20 Thread Ayon Sinha
Hi Daniel, There are ways to do what you are asking for, but are you sure you are using the right framework for this problem? Hadoop does not guarantee any particular order for the values passed to a reducer. If you want all values concatenated in sorted order, one way is to read in all values in the reducer (
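A minimal sketch of the "read in all values in the reducer" approach, written as plain Java so it runs without Hadoop. The method concatSorted is a hypothetical stand-in for the body of reduce() for a single key, and it assumes all values for one key fit in the reducer's memory. In a real Java reducer the values arrive as Text objects that Hadoop reuses between iterations, so each one would be converted with toString() before buffering; plain Strings stand in for them here. When the values do not fit in memory, the usual alternative is a secondary sort (a composite key plus custom partitioner and grouping comparator) so the framework delivers values already ordered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedConcatSketch {

    // Stand-in for one reduce() call: buffer every value for the key,
    // sort the buffer in memory, then join it into a single string.
    static String concatSorted(Iterable<String> values, String separator) {
        List<String> buffer = new ArrayList<>();
        for (String v : values) {
            buffer.add(v);
        }
        Collections.sort(buffer);
        return String.join(separator, buffer);
    }

    public static void main(String[] args) {
        List<String> valuesForKey = List.of("banana", "apple", "cherry");
        System.out.println(concatSorted(valuesForKey, ","));
    }
}
```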

Re: why one of the reducers it's always slower?

2011-10-22 Thread Ayon Sinha
Looks like that is the reducer that is actually doing the work, with 14M input records:
Reduce input groups: 1
Combine output records: 0
Reduce shuffle bytes: 5,135,004,496
Reduce output records: 14,232,592
Spilled Records: 14,232,592
Combine input records: 0
Reduce input records: 14,232,59
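"Reduce input groups: 1" suggests that every record reaching this reducer shares a single key. The plain-Java sketch below reproduces the default HashPartitioner formula, (hashCode & Integer.MAX_VALUE) % numReduceTasks, to show why a single key always lands on exactly one reducer no matter how many reducers are configured. It hashes java.lang.String rather than Hadoop's Text, whose hashCode differs, so the specific partition number is not meaningful; only the skew is. The 1000-record, one-key input is hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSkewSketch {

    // Same formula as Hadoop's default HashPartitioner.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records each reducer (partition) would receive.
    static Map<Integer, Integer> recordsPerReducer(String[] keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String k : keys) {
            counts.merge(getPartition(k, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: every record carries the same key.
        String[] keys = new String[1000];
        Arrays.fill(keys, "same-key");
        // Only one partition appears in the map, holding all 1000 records.
        System.out.println(recordsPerReducer(keys, 10));
    }
}
```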

Re: HBase Stack

2011-11-15 Thread Ayon Sinha
I believe one of the biggest problems you will face with HBase in a small setup is that MySQL is happy with a single-machine setup (less maintenance headache for small-scale projects), compared to HBase running in pseudo-distributed mode. In pseudo-distributed mode a single HBase machine will have too much

Re: HBase Stack

2011-11-15 Thread Ayon Sinha
don't know whether it would be good advice to take a refactoring
>> of the data management into consideration, if HBase is no choice for a
>> single-server project (not even in the beginning).
>>
>> Regards,
>> Em
>>
>> On 15.11.2011 19:08, Travis Camec wrote:

Re: simple question : where is conf/hadoop-env.sh ?

2011-11-17 Thread Ayon Sinha
$HADOOP_HOME/conf -Ayon From: Jay Vyas To: common-user@hadoop.apache.org Sent: Thursday, November 17, 2011 10:40 AM Subject: simple question : where is conf/hadoop-env.sh ?
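If you need to locate the file from a program rather than by hand, a small Java sketch can build the path from the HADOOP_HOME environment variable, following the $HADOOP_HOME/conf layout given in the reply. It assumes HADOOP_HOME is set in the environment; hadoopEnvPath is a hypothetical helper name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HadoopEnvLocator {

    // Builds $HADOOP_HOME/conf/hadoop-env.sh from the given install directory.
    static Path hadoopEnvPath(String hadoopHome) {
        return Paths.get(hadoopHome, "conf", "hadoop-env.sh");
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        System.out.println(hadoopEnvPath(home));
    }
}
```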

Re: Matrix multiplication in Hadoop

2011-11-18 Thread Ayon Sinha
I'd really be interested in a comparison of NumPy/Octave/MATLAB-style tools against a Hadoop implementation (let's say on 4-10 large cloud servers) as the size of the matrix grows. I want to know the scale at which Hadoop really starts to pull away. -Ayon
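For reference when making such a comparison, here is a plain-Java simulation (no Hadoop classes) of the common one-pass MapReduce scheme for C = A x B. The map phase keys every intermediate record by the output cell (i, k); the reduce phase joins the A and B entries for that cell on the shared index j and sums the products. It assumes dense matrices with compatible shapes and uses an in-memory map as a stand-in for the shuffle. Because each A entry is emitted p times and each B entry m times, the shuffle carries 2mnp records for an m x n by n x p product, which is one cost a single-machine library never pays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapReduceMatMulSketch {

    // Intermediate value: source matrix, shared index j, and the entry itself.
    static final class Tagged {
        final char matrix; final int j; final double v;
        Tagged(char matrix, int j, double v) { this.matrix = matrix; this.j = j; this.v = v; }
    }

    // A is m x n, B is n x p; assumes a[i].length == b.length.
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        Map<String, List<Tagged>> shuffle = new HashMap<>();

        // Map phase: a[i][j] is needed by every output cell (i, k) ...
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < p; k++)
                    shuffle.computeIfAbsent(i + "," + k, key -> new ArrayList<>())
                           .add(new Tagged('A', j, a[i][j]));
        // ... and b[j][k] is needed by every output cell (i, k).
        for (int j = 0; j < n; j++)
            for (int k = 0; k < p; k++)
                for (int i = 0; i < m; i++)
                    shuffle.computeIfAbsent(i + "," + k, key -> new ArrayList<>())
                           .add(new Tagged('B', j, b[j][k]));

        // Reduce phase: per key (i, k), join on j and sum a[i][j] * b[j][k].
        double[][] c = new double[m][p];
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            String[] ik = e.getKey().split(",");
            double[] fromA = new double[n];
            double[] fromB = new double[n];
            for (Tagged t : e.getValue()) {
                if (t.matrix == 'A') fromA[t.j] = t.v; else fromB[t.j] = t.v;
            }
            double sum = 0;
            for (int j = 0; j < n; j++) sum += fromA[j] * fromB[j];
            c[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = sum;
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```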

Re: Hadoop Cluster Quick Setup Script

2011-12-03 Thread Ayon Sinha
Some nice guys at Hortonworks told me yesterday about Apache Whirr. Do you think this will help you? http://whirr.apache.org/docs/0.6.0/quick-start-guide.html -Ayon

Re: problem with streaming and libjars

2011-06-16 Thread Ayon Sinha
From: Joey Echeverria To: common-user@hadoop.apache.org; Ayon Sinha Sent: Thursday, June 16, 2011 6:06 AM Subject: Re: problem with streaming and libjars I would try the following: hadoop -libjars /home/ayon/jars