Re: How to best decide mapper output/reducer input for a huge string?

2013-09-29 Thread Jens Scheidtmann
Dear Pavan, If it was working well, runtime would be shorter. What makes you sure this is HBase or Hadoop related? What percentage of time is spent in your algorithms? Use System.currentTimeMillis() and run your program on the first 100,000 records single-threaded and print to stdout. See where time
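
A minimal sketch of the measurement suggested above, using only the JDK. The class name `PhaseTimer`, the `timePhase` helper, and the two example phases are hypothetical stand-ins, not code from this thread; in a real job the input would be the first 100,000 rows read from the source tables and the steps would be the actual string-building and computation logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class PhaseTimer {
    // Sample size taken from the advice above.
    static final int SAMPLE_SIZE = 100_000;

    // Applies `step` to at most SAMPLE_SIZE records on the calling thread,
    // prints the elapsed wall-clock time to stdout, and returns it in milliseconds.
    static <T, R> long timePhase(String name, List<T> records, Function<T, R> step, List<R> out) {
        long start = System.currentTimeMillis();
        int n = Math.min(records.size(), SAMPLE_SIZE);
        for (int i = 0; i < n; i++) {
            out.add(step.apply(records.get(i)));
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(name + ": " + elapsed + " ms for " + n + " records");
        return elapsed;
    }

    public static void main(String[] args) {
        // Hypothetical input standing in for rows read from the source tables.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 150_000; i++) {
            rows.add("row-" + i);
        }

        // Phase 1: stand-in for building the mapper output string.
        List<String> built = new ArrayList<>();
        timePhase("build string", rows, r -> r + "|payload", built);

        // Phase 2: stand-in for the reducer-side computation.
        List<Integer> computed = new ArrayList<>();
        timePhase("compute", built, String::length, computed);
    }
}
```

Comparing the per-phase totals against the overall job time gives a rough idea of whether the time goes into the algorithm itself or into reading, shuffling, and writing.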

RE: How to best decide mapper output/reducer input for a huge string?

2013-09-23 Thread Pavan Sudheendra
John From: Pavan Sudheendra [mailto:pavan0...@gmail.com] Sent: Monday, September 23, 2013 3:31 AM To: user@hadoop.apache.org Subject: Re: How to best decide mapper output/reducer input for a huge string?

RE: How to best decide mapper output/reducer input for a huge string?

2013-09-23 Thread John Lilley
M To: user@hadoop.apache.org Subject: Re: How to best decide mapper output/reducer input for a huge string? @John, to be really frank, I don't know what the limiting factor is. It might be all of them or a subset of them. Cannot tell. On Mon, Sep 23, 2013 at 2:58 PM, Pavan Sudheendra mailto:pava

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-23 Thread Pavan Sudheendra
en you can recreate the tables with predefined splits to create more regions. Thanks, Rahul On Sun, Sep 22, 2013 at 4:38 AM, John Lilley wrote: Pavan,

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-23 Thread Pavan Sudheendra
013 at 4:38 AM, John Lilley wrote: Pavan, How large are the rows in HBase? 22 million rows is not very much but you mentioned “huge strings”. Can you tell which part of the processing is the limiting factor (read

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-22 Thread Pradeep Gollakota
u mentioned “huge strings”. Can you tell which part of the processing is the limiting factor (read from HBase, mapper output, reducers)? John From: Pavan Sudheendra [mailto:pavan0...@gmail.com]

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-22 Thread Rahul Bhattacharjee
rg Subject: Re: How to best decide mapper output/reducer input for a huge string? No, I don't have a combiner in place. Is it necessary? How do I make my map output compressed? Yes, the Tables in HBase are compressed. Although, there

RE: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread John Lilley
: Saturday, September 21, 2013 2:17 AM To: user@hadoop.apache.org Subject: Re: How to best decide mapper output/reducer input for a huge string? No, I don't have a combiner in place. Is it necessary? How do I make my map output compressed? Yes, the Tables in HBase are compressed. Although,

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pavan Sudheendra
No, I don't have a combiner in place. Is it necessary? How do I make my map output compressed? Yes, the tables in HBase are compressed. Although there's no real bottleneck, the time it takes to process the entire table is huge. I have to constantly check if I can optimize it somehow. Oh okay..

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pradeep Gollakota
One thing that comes to mind is that your keys are Strings which are highly inefficient. You might get a lot better performance if you write a custom writable for your Key object using the appropriate data types. For example, use a long (LongWritable) for timestamps. This should make (de)serializat
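
A rough sketch of the typed-key idea above, written against plain `java.io` so it runs without Hadoop on the classpath. The class name `HouseholdKey` and its two `long` fields are assumptions made for illustration; the thread does not show the real key layout. In an actual job the class would implement `org.apache.hadoop.io.WritableComparable`, whose `write(DataOutput)` and `readFields(DataInput)` methods have the same shape as the ones below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical composite key: two fixed-width longs instead of one delimited String.
class HouseholdKey implements Comparable<HouseholdKey> {
    long householdId;
    long timestamp;

    HouseholdKey() {}

    HouseholdKey(long householdId, long timestamp) {
        this.householdId = householdId;
        this.timestamp = timestamp;
    }

    // Serializes the fields as 8 bytes each, with no parsing or delimiters.
    void write(DataOutput out) throws IOException {
        out.writeLong(householdId);
        out.writeLong(timestamp);
    }

    // Reads the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        householdId = in.readLong();
        timestamp = in.readLong();
    }

    // Orders by household first, then by timestamp.
    @Override
    public int compareTo(HouseholdKey other) {
        int byHousehold = Long.compare(householdId, other.householdId);
        return byHousehold != 0 ? byHousehold : Long.compare(timestamp, other.timestamp);
    }

    static byte[] toBytes(HouseholdKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static HouseholdKey fromBytes(byte[] bytes) {
        try {
            HouseholdKey key = new HouseholdKey();
            key.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HouseholdKey key = new HouseholdKey(42L, 1379635200000L);
        int binarySize = toBytes(key).length;
        int stringSize = "42|1379635200000".getBytes(StandardCharsets.UTF_8).length;
        System.out.println("binary key bytes: " + binarySize);
        System.out.println("string key bytes: " + stringSize);
    }
}
```

For short values the binary form is not necessarily smaller than the string; the gain is mostly that the fields are fixed-width and compare numerically without any parsing. A key used with Hadoop's default partitioner would also need consistent `hashCode()` and `equals()` implementations, which are omitted here.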

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-21 Thread Pavan Sudheendra
Hi Pradeep, Yes. Basically I'm only writing the key part as the map output. The V of <K, V> is not of much use to me. But I'm hoping to change that if it leads to faster execution. I'm kind of a newbie so looking to make the map/reduce job run a lot faster. Also, yes. It gets sorted by the HouseHol

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-20 Thread Pradeep Gollakota
I'm sorry but I don't understand your question. Is the output of the mapper you're describing the key portion? If it is the key, then your data should already be sorted by HouseHoldId since it occurs first in your key. The SortComparator will tell Hadoop how to sort your data. So you use this if y
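
To illustrate what a sort comparator does, here is a small JDK-only sketch that orders serialized keys directly from their bytes. The 16-byte layout (household id followed by timestamp), the class name `RawKeyOrder`, and the choice of newest-first ordering within a household are all assumptions for illustration. The `compare` signature mirrors Hadoop's `RawComparator`; in a real job such a comparator would be registered with `Job.setSortComparatorClass`.

```java
import java.nio.ByteBuffer;

class RawKeyOrder {
    // Assumed layout: 8-byte household id followed by 8-byte timestamp.
    static final int KEY_BYTES = 16;

    // Builds a serialized key in the assumed layout (big-endian longs).
    static byte[] encode(long householdId, long timestamp) {
        return ByteBuffer.allocate(KEY_BYTES).putLong(householdId).putLong(timestamp).array();
    }

    // Compares two serialized keys in place: household id ascending,
    // then timestamp descending so the newest record of a household comes first.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        ByteBuffer a = ByteBuffer.wrap(b1, s1, l1);
        ByteBuffer b = ByteBuffer.wrap(b2, s2, l2);
        int byHousehold = Long.compare(a.getLong(), b.getLong());
        if (byHousehold != 0) {
            return byHousehold;
        }
        long timestampA = a.getLong();
        long timestampB = b.getLong();
        return Long.compare(timestampB, timestampA);
    }

    public static void main(String[] args) {
        byte[] older = encode(7L, 1000L);
        byte[] newer = encode(7L, 2000L);
        int result = compare(newer, 0, KEY_BYTES, older, 0, KEY_BYTES);
        System.out.println("newer vs older in same household: " + result);
    }
}
```

If the natural ordering of the key (household id first) is already the order the reducer needs, no custom comparator is required.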

How to best decide mapper output/reducer input for a huge string?

2013-09-20 Thread Pavan Sudheendra
I need to improve my MR job, which uses HBase as source as well as sink. Basically, I'm reading data from 3 HBase tables in the mapper, writing them out as one huge string for the reducer to do some computation and dump into an HBase table. Table1 ~ 19 million rows. Table2 ~ 2 million rows. Table3