Although connection count may not be the root cause, please read
http://zhihongyu.blogspot.com/2011/04/managing-connections-in-hbase-090-and.html
if you have time.
0.92.0 would do a much better job of managing connections.
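
For reference, a minimal sketch of raising the ZooKeeper connection limit discussed in this thread. It assumes ZooKeeper is managed by HBase, so the setting lives in hbase-site.xml on the ZooKeeper hosts; the value 1000 is the one proposed in the quoted message below. With a standalone ZooKeeper ensemble, the equivalent setting would be maxClientCnxns in zoo.cfg instead.

```xml
<!-- hbase-site.xml: per-client-IP connection limit passed through to the
     HBase-managed ZooKeeper (assumption: HBase starts ZooKeeper itself) -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```

The ZooKeeper quorum most likely has to be restarted before the new limit takes effect.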

On Mon, Jul 4, 2011 at 10:14 AM, Lior Schachter <li...@infolinks.com> wrote:

> I will increase the number of connections to 1000.
>
> Thanks !
>
> Lior
>
>
>
>
> On Mon, Jul 4, 2011 at 8:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > The reason I asked about HBaseURLsDaysAggregator.java was that I see no
> > HBase (client) code in the call stack.
> > I have little clue about the problem you experienced.
> >
> > There may be more than one connection to zookeeper from one map task.
> > So it doesn't hurt if you increase hbase.zookeeper.property.maxClientCnxns
> >
> > Cheers
> >
> > On Mon, Jul 4, 2011 at 9:47 AM, Lior Schachter <li...@infolinks.com> wrote:
> >
> > > 1. HBaseURLsDaysAggregator.java:124, HBaseURLsDaysAggregator.java:131 are
> > > not important since even when I removed all my map code the tasks got stuck
> > > (but the thread dumps were generated after I revived the code). If you think
> > > it's important I'll remove the map code again and re-generate the thread
> > > dumps...
> > >
> > > 2. 82 maps were launched but only 36 ran simultaneously.
> > >
> > > 3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it ?
> > >
> > > Thanks,
> > > Lior
> > >
> > >
> > > On Mon, Jul 4, 2011 at 7:33 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > In the future, please provide the full dump using pastebin.com
> > > > and include only a snippet of the log in the email.
> > > >
> > > > Can you tell us what the following lines are about ?
> > > > HBaseURLsDaysAggregator.java:124
> > > > HBaseURLsDaysAggregator.java:131
> > > >
> > > > How many mappers were launched ?
> > > >
> > > > What value is used for hbase.zookeeper.property.maxClientCnxns ?
> > > > You may need to increase the value for above setting.
> > > >
> > > > On Mon, Jul 4, 2011 at 9:26 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > >
> > > > > I used kill -3, following the thread dump:
> > > > >
> > > > > ...
> > > > >
> > > > >
> > > > > On Mon, Jul 4, 2011 at 6:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >
> > > > > > I wasn't clear in my previous email.
> > > > > > It was not an answer to why map tasks got stuck.
> > > > > > TableInputFormatBase.getSplits() is being called already.
> > > > > >
> > > > > > Can you try getting jstack of one of the map tasks before task tracker
> > > > > > kills it ?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > > > >
> > > > > > > 1. Currently every map gets one region. So I don't understand what
> > > > > > > difference it will make using the splits.
> > > > > > > 2. How should I use the TableInputFormatBase.getSplits() ? Could not find
> > > > > > > examples for that.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Lior
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > >
> > > > > > > > For #2, see TableInputFormatBase.getSplits():
> > > > > > > >   * Calculates the splits that will serve as input for the map tasks. The
> > > > > > > >   * number of splits matches the number of regions in a table.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > > > > > >
> > > > > > > > > 1. yes - I configure my job using this line:
> > > > > > > > >
> > > > > > > > > TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME, scan,
> > > > > > > > > ScanMapper.class, Text.class, MapWritable.class, job)
> > > > > > > > >
> > > > > > > > > which internally uses TableInputFormat.class
> > > > > > > > >
> > > > > > > > > 2. One split per region ? What do you mean ? How do I do that ?
> > > > > > > > >
> > > > > > > > > 3. hbase version 0.90.2
> > > > > > > > >
> > > > > > > > > 4. no exceptions. the logs are very clean.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Jul 4, 2011 at 5:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Do you use TableInputFormat ?
> > > > > > > > > > To scan a large number of rows, it would be better to produce one Split per
> > > > > > > > > > region.
> > > > > > > > > >
> > > > > > > > > > What HBase version do you use ?
> > > > > > > > > > Do you find any exception in master / region server logs around the moment
> > > > > > > > > > of timeout ?
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > > On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter <li...@infolinks.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > > I'm running a scan using the M/R framework.
> > > > > > > > > > > My table contains hundreds of millions of rows and I'm scanning using
> > > > > > > > > > > start/stop key about 50 million rows.
> > > > > > > > > > >
> > > > > > > > > > > The problem is that some map tasks get stuck and the task manager kills
> > > > > > > > > > > these maps after 600 seconds. When retrying the task everything works fine
> > > > > > > > > > > (sometimes).
> > > > > > > > > > >
> > > > > > > > > > > To verify that the problem is in hbase (and not in the map code) I removed
> > > > > > > > > > > all the code from my map function, so it looks like this:
> > > > > > > > > > > public void map(ImmutableBytesWritable key, Result value, Context context)
> > > > > > > > > > > throws IOException, InterruptedException {
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > Also, when the map got stuck on a region, I tried to scan this region (using
> > > > > > > > > > > simple scan from a Java main) and it worked fine.
> > > > > > > > > > >
> > > > > > > > > > > Any ideas ?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Lior
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
