Dear Pavan,
If it was working well, runtime would be shorter. What makes you sure this
is HBase or Hadoop related? What percentage of time is spent in your
algorithms?
Use System.currentTimeMillis() and run your program on the first 100,000
records single threaded and print to stdout. See where the time goes.
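A quick way to run that experiment, sketched in plain Java with System.currentTimeMillis(); processRecord is a hypothetical stand-in for your actual per-row logic:

```java
import java.util.ArrayList;
import java.util.List;

public class TimingProbe {
    // Hypothetical stand-in for the real per-record work done in the mapper.
    static String processRecord(String record) {
        return record.toUpperCase();
    }

    // Time the per-record work over a batch, single threaded.
    static long timeRecords(List<String> records) {
        long start = System.currentTimeMillis();
        for (String r : records) {
            processRecord(r);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            sample.add("record-" + i);
        }
        long elapsed = timeRecords(sample);
        System.out.println("Processed " + sample.size() + " records in " + elapsed + " ms");
    }
}
```

If that loop is already slow on 100,000 records, the bottleneck is the algorithm itself, not HBase or Hadoop.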
***
>
> John
>
> *From:* Pavan Sudheendra [mailto:pavan0...@gmail.com]
> *Sent:* Monday, September 23, 2013 3:31 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to best decide mapper output/reducer input for a huge
> string?
>
To: user@hadoop.apache.org
Subject: Re: How to best decide mapper output/reducer input for a huge string?
@John, to be really frank I don't know what the limiting factor is.. It might
be all of them or a subset of them.. Cannot tell..
On Mon, Sep 23, 2013 at 2:58 PM, Pavan Sudheendra wrote:
>>> ...then you can recreate the tables with predefined
>>> splits to create more regions.
>>>
>>> Thanks,
>>> Rahul
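For reference, split points can be given when a table is (re)created; in the HBase shell that looks like the line below, where the table name, column family, and split keys are all illustrative:

```
create 'Table1', 'cf', SPLITS => ['0500000', '1000000', '1500000']
```

Each split key starts a new region, so scans and writes spread across more region servers instead of hammering one.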
>>>
>>>
>>> On Sun, Sep 22, 2013 at 4:38 AM, John Lilley
>>> wrote:
>>>
>>
>>> Pavan,
>>>
>>> How large are the rows in HBase? 22 million rows is not very much but
>>> you mentioned “huge strings”. Can you tell which part of the processing is
>>> the limiting factor (read from HBase, mapper output, reducers)?
>>
>> John
>>
>> *From:* Pavan Sudheendra [mailto:pavan0...@gmail.com]
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to best decide mapper output/reducer input for a huge
> string?
>
>
> No, I don't have a combiner in place. Is it necessary? How do I make my
> map output compressed? Yes, the Tables in HBase are compressed.
>
> Although, there's no real bottleneck, the time it takes to process the
> entire table is huge.
Sent: Saturday, September 21, 2013 2:17 AM
To: user@hadoop.apache.org
Subject: Re: How to best decide mapper output/reducer input for a huge string?
No, I don't have a combiner in place. Is it necessary? How do I make my map
output compressed? Yes, the Tables in HBase are compressed.
Although, there's no real bottleneck, the time it takes to process the
entire table is huge. I have to constantly check if I can optimize it
somehow..
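On the "how do I make my map output compressed" question: a sketch of the relevant job setup, assuming Hadoop 2.x property names (1.x used mapred.compress.map.output) and a Snappy codec being installed; MyReducer is a placeholder for your own reducer class:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Compress intermediate map output to cut shuffle I/O.
conf.setBoolean("mapreduce.map.output.compress", true);
conf.setClass("mapreduce.map.output.compress.codec",
              SnappyCodec.class, CompressionCodec.class);
Job job = Job.getInstance(conf, "hbase-aggregation");
// A combiner is only safe when the reduce logic is associative and
// commutative (sums, counts, max); MyReducer is hypothetical here.
job.setCombinerClass(MyReducer.class);
```

A combiner is not required, but when the reduce function allows one, it shrinks the data shuffled between map and reduce.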
Oh okay..
One thing that comes to mind is that your keys are Strings which are highly
inefficient. You might get a lot better performance if you write a custom
writable for your Key object using the appropriate data types. For example,
use a long (LongWritable) for timestamps. This should make
(de)serialization faster.
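To make the size difference concrete, a stdlib-only sketch comparing a timestamp serialized as text (roughly what a String/Text key carries) versus as a raw long (what a LongWritable field would write):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class KeySizes {
    // Size of the timestamp serialized as text: a 2-byte length header
    // plus one byte per digit, roughly what a String/Text key costs.
    static int stringKeyBytes(long timestamp) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(Long.toString(timestamp));
        return buf.size();
    }

    // Size of the same timestamp as a raw long, which is what a
    // LongWritable field in a custom key would write.
    static int longKeyBytes(long timestamp) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeLong(timestamp);
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        long ts = 1379921460000L; // a Sep 2013 timestamp in millis
        System.out.println("as text: " + stringKeyBytes(ts) + " bytes"); // 15
        System.out.println("as long: " + longKeyBytes(ts) + " bytes");   // 8
    }
}
```

And the long never has to be parsed back from characters on the reduce side, which is where most of the (de)serialization saving comes from.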
Hi Pradeep,
Yes.. Basically I'm only writing the key part as the map output.. The value
(V) is not of much use to me.. But I'm hoping to change that if it leads
to faster execution.. I'm kind of a newbie so looking to make the
map/reduce job run a lot faster..
Also, yes. It gets sorted by the HouseHoldId..
I'm sorry but I don't understand your question. Is the output of the mapper
you're describing the key portion? If it is the key, then your data should
already be sorted by HouseHoldId since it occurs first in your key.
The SortComparator will tell Hadoop how to sort your data. So you use this
if you want the data sorted in some order other than the key's natural one.
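If you ever do need a different order, a comparator can be registered on the job. A sketch (the class name is hypothetical, and reversing the natural Text order is just a demonstration):

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class ReverseTextComparator extends WritableComparator {
    protected ReverseTextComparator() {
        super(Text.class, true); // true: deserialize keys for compare()
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        // Swap the arguments to invert the natural lexicographic order.
        return super.compare(b, a);
    }
}

// During job setup:
// job.setSortComparatorClass(ReverseTextComparator.class);
```

Since your HouseHoldId already leads the key, the default sort gives you the grouping you described and no custom comparator is needed.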
I need to improve my MR jobs which use HBase as source as well as sink..
Basically, I'm reading data from 3 HBase Tables in the mapper, writing them
out as one huge string for the reducer to do some computation and dump into
an HBase Table..
Table1 ~ 19 million rows. Table2 ~ 2 million rows. Table3