About the performance of job execution on Amazon EMR

2012-05-05 Thread Bhavesh Shah
Hello, If we increase the number of mappers and decrease the number of reducers, does performance improve? I have never set the number of mappers and reducers, and I don't know how to set them. With multiple nodes, how many mappers and reducers do I need to set accord

Re: Rank implementation in hive

2012-05-05 Thread wd
Maybe someone can set up a site that accepts user-uploaded UDF or UDAF jars, like CPAN for Perl or the AUR for Arch Linux. :D On Sun, May 6, 2012 at 1:28 AM, Edward Capriolo wrote: > Hey all, > > We (m6d.com) have released an implementation of the rank feature to GitHub: > > https://github.com/edwardcapriolo/h

Re: Problem writing SerDe to read Nutch crawldb because Hive seems to ignore the Key and only reads the Value from SequenceFiles.

2012-05-05 Thread Edward Capriolo
STORED AS SEQUENCEFILE is syntactic sugar. It sets both the input format and the output format. CREATE TABLE x (thing INT) STORED AS INPUTFORMAT 'class.x' OUTPUTFORMAT 'class.y' For the input format you can use your custom class. For the output format you can stick with Hive's ignorekeytextoutputformat or ignorekeysequence

Re: '\N' is displayed in case of null column values in exporting hive query results to CSV file

2012-05-05 Thread Mark Grover
Hi Rinku, That's the expected behaviour; it differentiates between NULL, "NULL" (string) and "" (empty string). If you would like something else to be serialized, you can use the coalesce UDF (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF). Say you wanted to serialize empty stri
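
The distinction described in this reply can be illustrated with a minimal Python sketch. It is not Hive's code: it only assumes that NULL is written as \N in the exported text (as in the thread subject) and models coalesce as "first non-NULL argument". The names `coalesce` and `serialize_row` are hypothetical and chosen for illustration.

```python
NULL_MARKER = r"\N"  # how NULL shows up in the exported text, per the thread subject


def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def serialize_row(columns, delimiter=","):
    """Join columns into one line, writing None (NULL) as the \\N marker."""
    return delimiter.join(NULL_MARKER if c is None else str(c) for c in columns)


row = [None, "NULL", ""]  # NULL, the string "NULL", and the empty string

# Without coalesce the three values stay distinguishable in the output.
print(serialize_row(row))

# Replacing NULL with "" makes NULL and the empty string look the same.
print(serialize_row([coalesce(c, "") for c in row]))
```

The second call shows the trade-off: once NULL is replaced by an empty string, the exported file can no longer tell the two apart.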

Re: Hive supports scheduling?

2012-05-05 Thread Mark Grover
Hi Chandan, None that I know of. If I were you, I would schedule my client code to issue a query on Hive at a particular time. Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com - Original Message - From: "Chandan B.K" To: user@hive.apa
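
One way to read the suggestion above is a small client-side script that waits until a given time of day and then issues the query. The following is a rough Python sketch under that assumption. It assumes the `hive` command-line client is on the PATH and uses its `-e` option to run a single query string. The helper names `seconds_until`, `hive_command` and `run_query` are hypothetical.

```python
import datetime
import subprocess


def seconds_until(run_at, now):
    """Seconds from `now` until the next occurrence of the time of day `run_at`."""
    target = datetime.datetime.combine(now.date(), run_at)
    if target <= now:
        # The time has already passed today, so aim for tomorrow.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def hive_command(query):
    """Build the argument list that runs one query through the Hive CLI."""
    return ["hive", "-e", query]


def run_query(query):
    """Run the query and return the CLI's exit code (needs `hive` on the PATH)."""
    return subprocess.call(hive_command(query))
```

A caller could sleep for `seconds_until(datetime.time(2, 0), datetime.datetime.now())` seconds and then call `run_query(...)`. In practice an external scheduler such as cron would usually take the place of the sleep.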

Re: Problem writing SerDe to read Nutch crawldb because Hive seems to ignore the Key and only reads the Value from SequenceFiles.

2012-05-05 Thread Ali Safdar Kureishy
Thanks Edward... I feared this was going to be the case. If I define a new input format, how do I use it in a Hive table definition? For the SequenceFileInputFormat, the table definition would read as "...STORED AS SEQUENCEFILE". With the new one, how do I specify it in the definition? "STORED AS

Re: Problem writing SerDe to read Nutch crawldb because Hive seems to ignore the Key and only reads the Value from SequenceFiles.

2012-05-05 Thread Edward Capriolo
This is one of the things about Hive: the key is not easily available. You are going to need an input format that creates a new value that contains both the key and the value. Like this: -> new MyKeyValue< > On Sat, May 5, 2012 at 4:05 PM, Ali Safdar Kureishy wrote: > Hi, > > I have attached
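
The idea in this reply, folding the key into a new value so that a reader which only looks at values still sees it, can be sketched in a language-neutral way. The Python below is an illustration only, not a Hadoop InputFormat. `MyKeyValue` mirrors the name used in the message, `wrap_keys` is a hypothetical helper, and the sample record is made up.

```python
from collections import namedtuple

# Hypothetical wrapper type holding both halves of the original record.
MyKeyValue = namedtuple("MyKeyValue", ["key", "value"])


def wrap_keys(records):
    """Yield (None, MyKeyValue(key, value)) for each (key, value) pair.

    The outer key is dropped on purpose: a consumer that ignores keys
    still gets the original key through the wrapped value.
    """
    for key, value in records:
        yield None, MyKeyValue(key, value)


crawldb = [("http://example.com/", "status=fetched")]  # made-up sample data
for _, wrapped in wrap_keys(crawldb):
    print(wrapped.key, wrapped.value)
```

In a real setup the same wrapping would happen inside the custom input format's record reader, and the SerDe would then unpack both fields from the wrapped value.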

Rank implementation in hive

2012-05-05 Thread Edward Capriolo
Hey all, We (m6d.com) have released an implementation of the rank feature to GitHub: https://github.com/edwardcapriolo/hive-rank We have a nice description of how it works on our blog: http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive We are doing some extra-credit

Re: Hive supports scheduling?

2012-05-05 Thread Jagat
Have you checked the Oozie Hive action? --- Sent from Mobile On 05-May-2012 6:09 PM, "Chandan B.K" wrote: > Hi, > Does Hive internally have any scheduling feature? Do the latest releases > of Hive expose any APIs to schedule a query to fire at a particular time? > Thanks > > -- > > -Rega

Hive supports scheduling?

2012-05-05 Thread Chandan B.K
Hi, Does Hive internally have any scheduling feature? Do the latest releases of Hive expose any APIs to schedule a query to fire at a particular time? Thanks -- -Regards Chandan.B.K., == Cell: +91-9902382263 Al