Re: Scan problem

2018-03-21 Thread Yang Zhang
Thanks to all of you; your answers helped me a lot. 2018-03-19 22:31 GMT+08:00 Saad Mufti: > Another option if you have enough disk space/off-heap memory space is to > enable bucket cache to cache even more of your data, and set the > PREFETCH_ON_OPEN => true option on the column families you wan

Re: Scan problem

2018-03-19 Thread Saad Mufti
Another option, if you have enough disk space/off-heap memory space, is to enable the bucket cache to cache even more of your data, and set the PREFETCH_ON_OPEN => true option on the column families you want always cached. That way HBase will prefetch your data into the bucket cache and your scan won't ha
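
What that suggestion looks like in code, as a minimal sketch against the HBase 1.x admin API (the table and family names here are hypothetical; the same attribute can also be set from the HBase shell):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class EnablePrefetch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          HColumnDescriptor family = new HColumnDescriptor("cf"); // hypothetical family
          family.setPrefetchBlocksOnOpen(true); // PREFETCH_ON_OPEN => true
          // Apply to a hypothetical table; its blocks are then prefetched into
          // the cache when store files are opened, warming scans up front.
          admin.modifyColumn(TableName.valueOf("my_table"), family);
        }
      }
    }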

Re: Scan problem

2018-03-18 Thread ramkrishna vasudevan
Hi, First, regarding the scans: generally the data resides in the store files, which are in HDFS. So the first scan that you are doing is probably reading from HDFS, which involves disk reads. Once the blocks are read, they are cached in the block cache of HBase, so your further reads go through that
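
A quick way to observe the behaviour described here is to run the same scan twice from a client and compare timings; the second pass is served largely from the block cache. A minimal sketch, with a hypothetical table name:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class ScanTwice {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("my_table"))) {
          for (int pass = 1; pass <= 2; pass++) {
            long start = System.currentTimeMillis();
            long rows = 0;
            try (ResultScanner scanner = table.getScanner(new Scan())) {
              for (Result r : scanner) rows++; // pass 1 hits HDFS, pass 2 the cache
            }
            System.out.println("pass " + pass + ": " + rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
          }
        }
      }
    }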

Scan problem

2018-03-17 Thread Yang Zhang
Hello everyone, I try to do many scans using RegionScanner in a coprocessor, and every time the first scan costs about 10 times more than the others. I don't know why this happens. OneBucket Scan cost is: 8794 ms, Num is: 710. OneBucket Scan cost is: 91 ms, Num is: 776. OneBucket Scan cost is:
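
For context, the pattern in question looks roughly like the sketch below: a RegionScanner opened inside a coprocessor and timed per bucket. This is an illustration assuming an HBase 1.x RegionObserver environment; the class and helper names are hypothetical:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;

    public class BucketScanObserver extends BaseRegionObserver {
      // Hypothetical helper invoked from a coprocessor hook; it scans the
      // region directly and reports cost the way the numbers above do.
      void scanOneBucket(RegionCoprocessorEnvironment env, Scan scan) throws IOException {
        List<Cell> row = new ArrayList<Cell>();
        long start = System.currentTimeMillis();
        long num = 0;
        RegionScanner scanner = env.getRegion().getScanner(scan);
        try {
          boolean more;
          do {
            row.clear();
            more = scanner.next(row); // fetches one row's cells per call
            if (!row.isEmpty()) num++;
          } while (more);
        } finally {
          scanner.close();
        }
        System.out.println("OneBucket Scan cost is: "
            + (System.currentTimeMillis() - start) + " ms, Num is: " + num);
      }
    }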

Re: M/R scan problem

2011-07-04 Thread Michel Segel
Did a quick trim... Sorry to jump in on the tail end of this... Two things you may want to look at: are you timing out because you haven't updated your status within the task, or are you taking 600 seconds to complete a single map() iteration? You can test this by tracking to see how long you a
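
The first case (status not updated) is usually addressed by reporting progress from inside map(). A hedged sketch of the standard Hadoop pattern, with hypothetical mapper types:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;

    public class ScanMapper extends TableMapper<Text, MapWritable> {
      @Override
      protected void map(ImmutableBytesWritable key, Result value, Context context)
          throws IOException, InterruptedException {
        // ... expensive per-row work would go here ...
        // Reporting progress resets the task timeout (600 seconds by default),
        // so a slow but alive mapper is not killed by the framework.
        context.progress();
      }
    }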

Re: M/R scan problem

2011-07-04 Thread Ted Yu
Although connection count may not be the root cause, please read http://zhihongyu.blogspot.com/2011/04/managing-connections-in-hbase-090-and.html if you have time. 0.92.0 would do a much better job of managing connections. On Mon, Jul 4, 2011 at 10:14 AM, Lior Schachter wrote: > I will increase t

Re: M/R scan problem

2011-07-04 Thread Lior Schachter
I will increase the number of connections to 1000. Thanks! Lior On Mon, Jul 4, 2011 at 8:12 PM, Ted Yu wrote: > The reason I asked about HBaseURLsDaysAggregator.java was that I see no > HBase (client) code in the call stack. > I have little clue about the problem you experienced. > > There may b

Re: M/R scan problem

2011-07-04 Thread Ted Yu
From the master UI, click 'zk dump'; :60010/zk.jsp would show you the active connections. See if the count reaches 300 when the map tasks run. On Mon, Jul 4, 2011 at 10:12 AM, Ted Yu wrote: > The reason I asked about HBaseURLsDaysAggregator.java was that I see no > HBase (client) code in the call stack. > I

Re: M/R scan problem

2011-07-04 Thread Ted Yu
The reason I asked about HBaseURLsDaysAggregator.java was that I see no HBase (client) code in the call stack. I have little clue about the problem you experienced. There may be more than one connection to ZooKeeper from one map task. So it doesn't hurt if you increase hbase.zookeeper.property.maxClient
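
For reference, that setting lives in hbase-site.xml on the nodes running the HBase-managed ZooKeeper; a sketch using the value of 1000 that Lior proposes above:

    <property>
      <name>hbase.zookeeper.property.maxClientCnxns</name>
      <value>1000</value>
    </property>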

Re: M/R scan problem

2011-07-04 Thread Lior Schachter
1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131 are not important, since even when I removed all my map code the tasks got stuck (but the thread dumps were generated after I revived the code). If you think it's important I'll remove the map code again and re-generate the threa

Re: M/R scan problem

2011-07-04 Thread Ted Yu
In the future, provide the full dump using pastebin.com and write a snippet of the log in the email. Can you tell us what the following lines are about? HBaseURLsDaysAggregator.java:124 HBaseURLsDaysAggregator.java:131 How many mappers were launched? What value is used for hbase.zookeeper.property.maxClientCnxn

Re: M/R scan problem

2011-07-04 Thread Lior Schachter
I used kill -3; the thread dump follows: Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode): "IPC Client (47) connection to /127.0.0.1:59759 from hadoop" daemon prio=10 tid=0x2aaab05ca800 nid=0x4eaf in Object.wait() [0x403c1000] java.lang.Thread.State: TIMED

Re: M/R scan problem

2011-07-04 Thread Ted Yu
I wasn't clear in my previous email. That was not an answer to why the map tasks got stuck. TableInputFormatBase.getSplits() is being called already. Can you try getting a jstack of one of the map tasks before the task tracker kills it? Thanks On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter wrote: > 1. Curre

Re: M/R scan problem

2011-07-04 Thread Lior Schachter
1. Currently every map gets one region, so I don't understand what difference using the splits will make. 2. How should I use TableInputFormatBase.getSplits()? I could not find examples for that. Thanks, Lior On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu wrote: > For #2, see TableInputFormatBas

Re: M/R scan problem

2011-07-04 Thread Ted Yu
For #2, see the javadoc of TableInputFormatBase.getSplits(): "Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table." On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter wrote: > 1. yes - I configure my job using this line: > TableMa

Re: M/R scan problem

2011-07-04 Thread Lior Schachter
1. Yes - I configure my job using this line: TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME, scan, ScanMapper.class, Text.class, MapWritable.class, job), which internally uses TableInputFormat.class. 2. One split per region? What do you mean? How do I do that? 3. hbase versio

Re: M/R scan problem

2011-07-04 Thread Ted Yu
Do you use TableInputFormat? To scan a large number of rows, it would be better to produce one split per region. What HBase version do you use? Do you find any exceptions in the master / region server logs around the moment of the timeout? Cheers On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter wrote: >

M/R scan problem

2011-07-04 Thread Lior Schachter
Hi all, I'm running a scan using the M/R framework. My table contains hundreds of millions of rows, and I'm scanning about 50 million of them using a start/stop key. The problem is that some map tasks get stuck, and the task tracker kills these maps after 600 seconds. When retrying the task everything wo
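
For reference, a job of this shape is typically wired up as in the sketch below (HBase 0.90-era mapreduce API; the table name, key boundaries, and mapper class are hypothetical). Raising scan caching and disabling block caching for the bulk scan are common companions to this setup:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class ScanJobSetup {
      public static void configure(Job job) throws Exception {
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("startKey")); // hypothetical boundaries
        scan.setStopRow(Bytes.toBytes("stopKey"));
        scan.setCaching(500);       // batch rows per RPC instead of one at a time
        scan.setCacheBlocks(false); // don't churn the block cache during a bulk scan
        TableMapReduceUtil.initTableMapperJob(
            "urls",                 // hypothetical table name
            scan, ScanMapper.class, Text.class, MapWritable.class, job);
      }
    }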

Re: HBase filtered scan problem

2011-05-23 Thread Iulia Zidaru
Thank you very much, St. Ack. It sounds like we have to create another filter. Iulia On 05/12/2011 08:07 PM, Stack wrote: On Thu, May 12, 2011 at 6:42 AM, Iulia Zidaru wrote: Hi, Thank you for your answer St. Ack. Yes, both coordinates are the same. It is impossible for the filter to decide th

Re: HBase filtered scan problem

2011-05-12 Thread Stack
On Thu, May 12, 2011 at 6:42 AM, Iulia Zidaru wrote: > Hi, > > Thank you for your answer, St. Ack. > Yes, both coordinates are the same. It is impossible for the filter to > decide that a value is old. I still don't understand why the HBase server > has both values, or how long it keeps both.

Re: HBase filtered scan problem

2011-05-12 Thread Iulia Zidaru
Hi, Thank you for your answer, St. Ack. Yes, both coordinates are the same. It is impossible for the filter to decide that a value is old. I still don't understand why the HBase server has both values, or how long it keeps both. The same thing happens if the puts have different timestamps. Re

Re: HBase filtered scan problem

2011-05-11 Thread Stack
On Wed, May 11, 2011 at 2:05 AM, Iulia Zidaru wrote: > Hi, > I'll try to rephrase the problem... > We have a table where we add an empty value. (The same thing happens also if > we have a value.) > Afterwards we put a value inside. (Same put, just another value.) When scanning > for empty values (first

Re: HBase filtered scan problem

2011-05-11 Thread Iulia Zidaru
Hi, I'll try to rephrase the problem... We have a table where we add an empty value. (The same thing happens also if we have a value.) Afterwards we put a value inside. (Same put, just another value.) When scanning for empty values (the first values inserted), the result is wrong because the filter gets
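
A minimal sketch of the scenario being described, written against the HBase 1.x client API for readability (table and cell names are hypothetical): two puts land on the same row/family/qualifier, first with an empty value and then with a real one, and a scan filtering for the empty value can still match the older cell version until it is compacted away:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.BinaryComparator;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.ValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EmptyValueScan {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("t"))) {
          byte[] row = Bytes.toBytes("r1");
          byte[] cf = Bytes.toBytes("cf");
          byte[] q = Bytes.toBytes("q");
          table.put(new Put(row).addColumn(cf, q, new byte[0]));        // empty value first
          table.put(new Put(row).addColumn(cf, q, Bytes.toBytes("v"))); // then a real value
          Scan scan = new Scan();
          scan.setFilter(new ValueFilter(CompareOp.EQUAL, new BinaryComparator(new byte[0])));
          try (ResultScanner rs = table.getScanner(scan)) {
            // May still return the row: the filter is applied per cell version,
            // and the stale empty version can survive until compaction.
            for (Result r : rs) System.out.println(r);
          }
        }
      }
    }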

HBase filtered scan problem

2011-05-10 Thread Stefan Comanita
Hi all, I want to do a scan on a number of rows, each row having multiple columns, and I want to filter out some of these columns based on their values. For example, if I have the following rows: plainRow:col:value1 column=T:19, timestamp=19, value=