Thanks to all of you; your answers helped me a lot.
2018-03-19 22:31 GMT+08:00 Saad Mufti :
Another option, if you have enough disk space / off-heap memory, is to
enable the bucket cache to cache even more of your data, and set the
PREFETCH_ON_OPEN => true option on the column families you want always
cached. That way HBase will prefetch your data into the bucket cache and
your scan won't have to go to disk for those reads.
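As a concrete sketch of the advice above (table and family names 'mytable' / 'cf' are my assumptions; the exact attribute name varies by HBase version, and newer shells spell it PREFETCH_BLOCKS_ON_OPEN):

```ruby
# hbase shell: enable prefetch-on-open for one column family
disable 'mytable'
alter 'mytable', {NAME => 'cf', PREFETCH_BLOCKS_ON_OPEN => 'true'}
enable 'mytable'
# (the bucket cache itself is sized in hbase-site.xml via
#  hbase.bucketcache.ioengine and hbase.bucketcache.size)
```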
Hi
First, regarding the scans:
Generally the data resides in the store files, which are in HDFS. So the
first scan you do is probably reading from HDFS, which involves disk
reads. Once the blocks are read, they are cached in the block cache of
HBase, so your subsequent reads are served from there.
Hello everyone
I run many Scans using RegionScanner in a coprocessor, and every
time the first Scan costs about 10 times as much as the others.
I don't know why this happens:
OneBucket Scan cost is : 8794 ms Num is : 710
OneBucket Scan cost is : 91 ms Num is : 776
OneBucket Scan cost is :
Did a quick trim...
Sorry to jump in on the tail end of this...
Two things you may want to look at...
Are you timing out because you haven't updated your status within the task,
or are you taking 600 seconds to complete a single map() iteration?
You can test this by tracking how long you spend in each map() call.
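The first case above (an idle-timer kill rather than genuinely slow work) has the usual fix of reporting progress periodically inside long map() loops. A minimal plain-Java sketch of the pattern, using a hypothetical Reporter stand-in rather than Hadoop's real classes:

```java
public class ProgressHeartbeat {
    /** Hypothetical stand-in for Hadoop's progress callback, not the real API. */
    interface Reporter { void progress(); }

    /** Long-running work that heartbeats once every `every` iterations. */
    static void longLoop(int iterations, int every, Reporter reporter) {
        for (int i = 0; i < iterations; i++) {
            // ... real per-row work would go here ...
            if (i % every == 0) {
                reporter.progress();  // resets the framework's idle timer
            }
        }
    }

    public static void main(String[] args) {
        final int[] beats = {0};
        longLoop(10_000, 1_000, () -> beats[0]++);
        System.out.println("heartbeats=" + beats[0]);  // prints heartbeats=10
    }
}
```

In a real mapper the equivalent call is the framework's own status/progress update made from inside the loop.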
Although connection count may not be the root cause, please read
http://zhihongyu.blogspot.com/2011/04/managing-connections-in-hbase-090-and.html
if you have time.
0.92.0 would do a much better job of managing connections.
On Mon, Jul 4, 2011 at 10:14 AM, Lior Schachter wrote:
I will increase the number of connections to 1000.
Thanks !
Lior
On Mon, Jul 4, 2011 at 8:12 PM, Ted Yu wrote:
From master UI, click 'zk dump'
:60010/zk.jsp would show you the active connections. See if the count
reaches 300 when map tasks run.
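To count those connections from the command line instead of the UI, ZooKeeper's four-letter-word commands can be used as well (the host name and the grepped task-node IP are placeholders; 2181 is the default client port):

```shell
# list every client connection on this ZooKeeper server
echo cons | nc zk-host.example.com 2181

# quick count of connections coming from one task-tracker node
echo cons | nc zk-host.example.com 2181 | grep -c '10.0.0.5'
```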
On Mon, Jul 4, 2011 at 10:12 AM, Ted Yu wrote:
The reason I asked about HBaseURLsDaysAggregator.java was that I see no
HBase (client) code in the call stack.
I have little clue about the problem you experienced.
There may be more than one connection to zookeeper from one map task.
So it doesn't hurt if you increase hbase.zookeeper.property.maxClient
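The property name is cut off above; in hbase-site.xml the ZooKeeper-side limit is set as below (the full property name is my completion, and the value 1000 is taken from the reply earlier in the thread):

```xml
<!-- hbase-site.xml on the nodes running ZooKeeper -->
<property>
  <!-- max concurrent connections ZooKeeper accepts from a single client IP -->
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```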
1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131 are
not important, since even when I removed all my map code the tasks got stuck
(but the thread dumps were generated after I revived the code). If you think
it's important I'll remove the map code again and re-generate the thread dumps.
In the future, provide the full dump using pastebin.com and write a
snippet of the log in the email.
Can you tell us what the following lines are about ?
HBaseURLsDaysAggregator.java:124
HBaseURLsDaysAggregator.java:131
How many mappers were launched ?
What value is used for hbase.zookeeper.property.maxClientCnxn
I used kill -3; the thread dump follows:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode):
"IPC Client (47) connection to /127.0.0.1:59759 from hadoop" daemon
prio=10 tid=0x2aaab05ca800 nid=0x4eaf in Object.wait()
[0x403c1000]
java.lang.Thread.State: TIMED
I wasn't clear in my previous email.
It was not an answer to why the map tasks got stuck.
TableInputFormatBase.getSplits() is being called already.
Can you try getting jstack of one of the map tasks before task tracker kills
it ?
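A sketch of getting that jstack on the task node before the 600-second kill (pid discovery via jps; the pid and output path are placeholders):

```shell
jps -l                               # find the Child JVM running the map task
jstack 12345 > /tmp/maptask.jstack   # 12345 = the task pid
# or, as used later in this thread:
kill -3 12345                        # dump goes to the task's stdout log
```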
Thanks
On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter wrote:
1. Currently every map gets one region, so I don't understand what
difference using the splits will make.
2. How should I use the TableInputFormatBase.getSplits() ? Could not find
examples for that.
Thanks,
Lior
On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu wrote:
For #2, see TableInputFormatBase.getSplits():
* Calculates the splits that will serve as input for the map tasks. The
* number of splits matches the number of regions in a table.
On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter wrote:
1. yes - I configure my job using this line:
TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME, scan,
ScanMapper.class, Text.class, MapWritable.class, job)
which internally uses TableInputFormat.class
2. One split per region ? What do you mean ? How do I do that ?
3. hbase versio
Do you use TableInputFormat ?
To scan a large number of rows, it would be better to produce one split per
region.
What HBase version do you use ?
Do you find any exception in master / region server logs around the moment
of timeout ?
Cheers
On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter wrote:
Hi all,
I'm running a scan using the M/R framework.
My table contains hundreds of millions of rows, and I'm scanning about
50 million rows using start/stop keys.
The problem is that some map tasks get stuck and the task manager kills
these maps after 600 seconds. When retrying the task everything works fine.
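The 600-second figure is Hadoop's task timeout; if each map() call legitimately needs longer between progress reports, the timeout can be raised (property name from the Hadoop 0.20-era config in use at the time; the value is only an example):

```xml
<!-- mapred-site.xml: kill tasks only after 20 minutes without progress -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value> <!-- milliseconds; default 600000 -->
</property>
```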
Thank you very much St. Ack.
It sounds like we have to create another filter.
Iulia
On 05/12/2011 08:07 PM, Stack wrote:
On Thu, May 12, 2011 at 6:42 AM, Iulia Zidaru wrote:
Hi,
Thank you for your answer St. Ack.
Yes, both coordinates are the same. It is impossible for the filter to
decide that a value is old. I still don't understand why the HBase
server has both values or how long it keeps both. The same thing
happens if puts have different timestamps.
Re
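One way to see why the filter can observe both cells: until a flush and major compaction, the store may physically hold several versions at the same coordinates, and a raw scan shows them all (the table name 't1' is an assumption):

```ruby
# hbase shell: show every physical cell version, including deleted
# or superseded ones a normal scan would hide
scan 't1', {RAW => true, VERSIONS => 10}
```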
On Wed, May 11, 2011 at 2:05 AM, Iulia Zidaru wrote:
Hi,
I'll try to rephrase the problem...
We have a table where we add an empty value. (The same thing happens
also if we have a value.)
Afterwards we put a value inside (same put, just another value). When
scanning for empty values (the first values inserted), the result is wrong
because the filter gets
Hi all,
I want to do a scan over a number of rows, each row having multiple columns, and
I want to filter out some of these columns based on their values. For example, if
I have the following rows:
plainRow:col:value1 column=T:19, timestamp=19,
value=
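A hedged sketch of the value-based column filtering being asked about, using the shell's ValueFilter (the table name and comparator value are assumptions; in Java the equivalent is the ValueFilter class):

```ruby
# hbase shell: keep only cells whose value equals 'value1'
scan 't1', {FILTER => "ValueFilter(=, 'binary:value1')"}

# or exclude those cells instead
scan 't1', {FILTER => "ValueFilter(!=, 'binary:value1')"}
```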