The reason I asked about HBaseURLsDaysAggregator.java was that I see no
HBase (client) code in the call stack.
I have little clue about the problem you experienced.

There may be more than one connection to ZooKeeper from one map task,
so it doesn't hurt to increase hbase.zookeeper.property.maxClientCnxns.
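
For example, in hbase-site.xml (a sketch: 1000 is just an example value,
and the hbase.zookeeper.property.* settings only take effect when HBase
manages the ZooKeeper quorum - with an external quorum, set
maxClientCnxns in zoo.cfg instead):

  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>1000</value>
  </property>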

Cheers

On Mon, Jul 4, 2011 at 9:47 AM, Lior Schachter <li...@infolinks.com> wrote:

> 1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131 are
> not important, since the tasks got stuck even when I removed all my map code
> (the thread dumps, though, were generated after I restored the code). If you
> think it's important I'll remove the map code again and re-generate the
> thread dumps...
>
> 2. 82 maps were launched but only 36 ran simultaneously.
>
> 3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it ?
>
> Thanks,
> Lior
>
>
> On Mon, Jul 4, 2011 at 7:33 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > In the future, please provide the full dump via pastebin.com and include
> > just a snippet of the log in the email.
> >
> > Can you tell us what the following lines are about ?
> > HBaseURLsDaysAggregator.java:124
> > HBaseURLsDaysAggregator.java:131
> >
> > How many mappers were launched ?
> >
> > What value is used for hbase.zookeeper.property.maxClientCnxns ?
> > You may need to increase the value for the above setting.
> >
> > On Mon, Jul 4, 2011 at 9:26 AM, Lior Schachter <li...@infolinks.com> wrote:
> >
> > > I used kill -3; the thread dump follows:
> > >
> > > ...
> > >
> > >
> > > On Mon, Jul 4, 2011 at 6:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > I wasn't clear in my previous email.
> > > > That was not an answer to why the map tasks got stuck.
> > > > TableInputFormatBase.getSplits() is being called already.
> > > >
> > > > Can you try getting a jstack of one of the map tasks before the task
> > > > tracker kills it ?
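> > > >
> > > > Something like this (a sketch; assumes the task's child JVM is
> > > > visible via jps on the task tracker node):
> > > >
> > > >   jps -l                          # find the org.apache.hadoop.mapred.Child pid
> > > >   jstack <pid> > map-task.jstack
> > > >
> > > > kill -3 <pid> works too; that dump goes to the task's stdout log.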
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > >
> > > > > 1. Currently every map gets one region, so I don't understand what
> > > > > difference using the splits will make.
> > > > > 2. How should I use TableInputFormatBase.getSplits() ? I could not
> > > > > find examples of that.
> > > > >
> > > > > Thanks,
> > > > > Lior
> > > > >
> > > > >
> > > > > On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >
> > > > > > For #2, see TableInputFormatBase.getSplits():
> > > > > >   * Calculates the splits that will serve as input for the map tasks.
> > > > > >   * The number of splits matches the number of regions in a table.
> > > > > >
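> > > > > > If you stick with initTableMapperJob, TableInputFormat (a subclass
> > > > > > of TableInputFormatBase) is already set as the job's input format,
> > > > > > so getSplits() is invoked by the framework for you. You would only
> > > > > > subclass TableInputFormatBase for custom behavior; a minimal sketch
> > > > > > (the table name here is a placeholder) might look like:
> > > > > >
> > > > > >   import java.io.IOException;
> > > > > >   import org.apache.hadoop.conf.Configurable;
> > > > > >   import org.apache.hadoop.conf.Configuration;
> > > > > >   import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > >   import org.apache.hadoop.hbase.client.HTable;
> > > > > >   import org.apache.hadoop.hbase.client.Scan;
> > > > > >   import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
> > > > > >
> > > > > >   public class MyTableInputFormat extends TableInputFormatBase
> > > > > >       implements Configurable {
> > > > > >     private Configuration conf;
> > > > > >
> > > > > >     @Override
> > > > > >     public Configuration getConf() {
> > > > > >       return conf;
> > > > > >     }
> > > > > >
> > > > > >     @Override
> > > > > >     public void setConf(Configuration configuration) {
> > > > > >       this.conf = HBaseConfiguration.create(configuration);
> > > > > >       try {
> > > > > >         // getSplits() computes one split per region of this table
> > > > > >         setHTable(new HTable(conf, "urls"));
> > > > > >       } catch (IOException e) {
> > > > > >         throw new RuntimeException(e);
> > > > > >       }
> > > > > >       setScan(new Scan()); // restrict with start/stop row as needed
> > > > > >     }
> > > > > >   }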
> > > > > >
> > > > > > On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > > > >
> > > > > > > 1. yes - I configure my job using this line:
> > > > > > >
> > > > > > >   TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME,
> > > > > > >       scan, ScanMapper.class, Text.class, MapWritable.class, job)
> > > > > > >
> > > > > > > which internally uses TableInputFormat.class
> > > > > > >
> > > > > > > 2. One split per region ? What do you mean ? How do I do that ?
> > > > > > >
> > > > > > > 3. HBase version 0.90.2
> > > > > > >
> > > > > > > 4. No exceptions. The logs are very clean.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jul 4, 2011 at 5:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Do you use TableInputFormat ?
> > > > > > > > To scan a large number of rows, it would be better to produce one
> > > > > > > > split per region.
> > > > > > > >
> > > > > > > > What HBase version do you use ?
> > > > > > > > Do you find any exceptions in the master / region server logs
> > > > > > > > around the moment of the timeout ?
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > > I'm running a scan using the M/R framework.
> > > > > > > > > My table contains hundreds of millions of rows, and I'm scanning
> > > > > > > > > about 50 million of them using a start/stop key.
> > > > > > > > >
> > > > > > > > > The problem is that some map tasks get stuck and the task
> > > > > > > > > tracker kills these maps after 600 seconds. When the task is
> > > > > > > > > retried, everything works fine (sometimes).
> > > > > > > > >
> > > > > > > > > To verify that the problem is in HBase (and not in the map
> > > > > > > > > code) I removed all the code from my map function, so it looks
> > > > > > > > > like this:
> > > > > > > > >
> > > > > > > > > public void map(ImmutableBytesWritable key, Result value,
> > > > > > > > >     Context context) throws IOException, InterruptedException {
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Also, when a map got stuck on a region, I tried to scan that
> > > > > > > > > region (using a simple scan from a Java main) and it worked
> > > > > > > > > fine.
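> > > > > > > > >
> > > > > > > > > For reference, the standalone check was roughly this (a sketch;
> > > > > > > > > the table name and keys here are placeholders):
> > > > > > > > >
> > > > > > > > > import org.apache.hadoop.conf.Configuration;
> > > > > > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > > > > import org.apache.hadoop.hbase.client.HTable;
> > > > > > > > > import org.apache.hadoop.hbase.client.Result;
> > > > > > > > > import org.apache.hadoop.hbase.client.ResultScanner;
> > > > > > > > > import org.apache.hadoop.hbase.client.Scan;
> > > > > > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > > > > > >
> > > > > > > > > public class RegionScanCheck {
> > > > > > > > >   public static void main(String[] args) throws Exception {
> > > > > > > > >     Configuration conf = HBaseConfiguration.create();
> > > > > > > > >     HTable table = new HTable(conf, "urls"); // placeholder name
> > > > > > > > >     // start/stop keys of the stuck region (placeholders)
> > > > > > > > >     Scan scan = new Scan(Bytes.toBytes("start"),
> > > > > > > > >         Bytes.toBytes("stop"));
> > > > > > > > >     ResultScanner scanner = table.getScanner(scan);
> > > > > > > > >     int rows = 0;
> > > > > > > > >     for (Result r : scanner) {
> > > > > > > > >       rows++; // just count; we only care that the scan completes
> > > > > > > > >     }
> > > > > > > > >     scanner.close();
> > > > > > > > >     table.close();
> > > > > > > > >     System.out.println("scanned " + rows + " rows");
> > > > > > > > >   }
> > > > > > > > > }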
> > > > > > > > >
> > > > > > > > > Any ideas ?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Lior
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
