Re: Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
Yes, I just want to limit the table scan by row key. I don't pass startRow and endRow to the GetList method; I only pass them to the coprocessorExec method. My code is below: > results = table.coprocessorExec(IEndPoint_SA.class, startrow, endrow,

Re: RefGuide schema design examples

2013-04-19 Thread Viral Bajaria
+1! On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda < marcosluis2...@gmail.com> wrote: > Wow, great work, Doug. > > > 2013/4/19 Doug Meil > > > Hi folks, > > > > I reorganized the Schema Design case studies 2 weeks ago and consolidated > > them into here, plus added several cases c

Re: Overwrite a row

2013-04-19 Thread Ted Yu
I don't know details about Kristoffer's schema. If all the column qualifiers are known a priori, mutateRow() should serve his needs. HBase allows arbitrary number of columns in a column family. If the schema is dynamic, mutateRow() wouldn't suffice. If the column qualifiers are known but the row i

Re: Overwrite a row

2013-04-19 Thread Mohamed Ibrahim
Hello Kristoffer, HBase row mutations, including put, are atomic ( http://hbase.apache.org/acid-semantics.html ). So when you overwrite a row, it is not possible for another process to read half old / half new data. It will read either all old or all new data if the put succeeds. It is also

Re: Overwrite a row

2013-04-19 Thread Mohamed Ibrahim
It seems that 0.95 is not released yet, so mutateRow won't be a solution for now. I saw it in the downloads and thought it was released. On Fri, Apr 19, 2013 at 4:18 PM, Mohamed Ibrahim wrote: > Just noticed you want to delete as well. I think that's supported since > 0.95 in mutateRow ( > http:/

Re: Overwrite a row

2013-04-19 Thread Mohamed Ibrahim
Actually, I do see it in the 0.94 JavaDocs ( http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations) ), so maybe it was added in 0.94.6 even though the JIRA says fixed in 0.95. I haven't used it, though, but it seems tha

Re: RefGuide schema design examples

2013-04-19 Thread Marcos Luis Ortiz Valmaseda
Wow, great work, Doug. 2013/4/19 Doug Meil > Hi folks, > > I reorganized the Schema Design case studies 2 weeks ago and consolidated > them into here, plus added several cases common on the dist-list. > > http://hbase.apache.org/book.html#schema.casestudies > > Comments/suggestions welcome. Th

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
Thanks a lot. Best Regards 郭伟 Guo Wei > Please upgrade to 0.94.6.1, which is more stable. > > Cheers > > On Apr 19, 2013, at 4:58 AM, GuoWei wrote: > >> >> We use HBase 0.94.1 in our production environment. >> >> >> Best Regards >> 郭伟 Guo Wei >> >> On 2013-4-19, at 6:01 PM, Ted Yu wrote

RefGuide schema design examples

2013-04-19 Thread Doug Meil
Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them into here, plus added several cases common on the dist-list. http://hbase.apache.org/book.html#schema.casestudies Comments/suggestions welcome. Thanks! Doug Meil Chief Software Architect, Explorys doug.m

Re: Overwrite a row

2013-04-19 Thread Ted Yu
If the maximum number of versions is set to 1 for your table, you would already have what you wanted. Normally max versions being 1 is not desired, that was why I asked about your use case. Cheers On Fri, Apr 19, 2013 at 12:44 PM, Kristoffer Sjögren wrote: > What would you suggest? I want the o

Slow region server recoveries due to lease recovery going to stale data node

2013-04-19 Thread Ted Yu
I think this issue would be more appropriate for the hdfs-dev@ mailing list. Putting user@hbase on Bcc. -- Forwarded message -- From: Varun Sharma Date: Fri, Apr 19, 2013 at 1:10 PM Subject: Re: Slow region server recoveries To: user@hbase.apache.org This is 0.94.3 hbase... On Fri,

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
This is 0.94.3 hbase... On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma wrote: > Hi Ted, > > I had a long offline discussion with Nicholas on this. Looks like the last > block, which was still being written to, took an enormous amount of time to recover. > Here's what happened. > a) Master split tasks and

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Hi Ted, I had a long offline discussion with Nicholas on this. Looks like the last block, which was still being written to, took an enormous amount of time to recover. Here's what happened. a) Master split tasks and region servers process them b) Region server tries to recover lease for each WAL log - most

Re: Inconsistent performance numbers with increased nodes

2013-04-19 Thread Marcos Luis Ortiz Valmaseda
Just a question, Alex: why are you using OpenJDK? The first recommendation for a Hadoop cluster is to use the Java SDK from Oracle, because OpenJDK has some performance issues. These should be fixed in upcoming releases, but I encourage you to use Java 1.6 from Oracle. - Which

Re: Overwrite a row

2013-04-19 Thread Kristoffer Sjögren
What would you suggest? I want the operation to be atomic. On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu wrote: > What is the maximum number of versions you allow for the underlying > table? > > Thanks > > On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren >wrote: > > > Hi > > > > Is it possib

Re: Overwrite a row

2013-04-19 Thread Ted Yu
What is the maximum number of versions you allow for the underlying table? Thanks On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren wrote: > Hi > > Is it possible to completely overwrite/replace a row in a single _atomic_ > action? Already existing columns and qualifiers should be removed

Re: zookeeper taking 15GB RAM

2013-04-19 Thread Rohit Kelkar
Thanks for the reply. I checked the default max heap size for Java on the nodes, and it turns out it's 16G. So now I have to start ZooKeeper with a reasonable heap size. What factors would impact the heap size of ZooKeeper? Is it more tables in HBase, or is it the number of regio

Re: Speeding up the row count

2013-04-19 Thread lars hofhansl
You should expect to be able to scan about 1-2m small rows/s/core if everything is in cache. Something is definitely wrong in your setup. Can you post your config files (HBase and HDFS) via pastebin? -- Lars From: Omkar Joshi To: "user@hbase.apache.org" S

Overwrite a row

2013-04-19 Thread Kristoffer Sjögren
Hi Is it possible to completely overwrite/replace a row in a single _atomic_ action? Already existing columns and qualifiers should be removed if they do not exist in the data inserted into the row. The only way to do this is to first delete the row then insert new data in its place, correct? Or

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
here is the snippet 2013-04-19 00:27:38,337 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-696828882-10.168.7.226-1364886167971:blk_40107897639761277_174072 2013-04-19 00:27:38,337 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDataset

Re: Slow region server recoveries

2013-04-19 Thread Ted Yu
Can you show the snippet from the DN log that mentions UNDER_RECOVERY? Here are the criteria for stale node checking to kick in (from https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch ): + * Check if the datanode is in stale state. Here if + * the namenode ha
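
The quoted patch comment is cut off above, so the rule sketched here is an assumption rather than a copy of the HDFS code: a datanode counts as stale once the namenode has gone at least a configured interval without a heartbeat from it. The class name, method signature, and the 30-second interval are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Toy model of a heartbeat-based staleness check; not the HDFS source. */
class StaleNodeCheck {
    /** Hypothetical stale interval used by this sketch: 30 seconds. */
    static final long STALE_INTERVAL_MS = TimeUnit.SECONDS.toMillis(30);

    /** Returns true when the last heartbeat is at least staleIntervalMs old. */
    static boolean isStale(long lastHeartbeatMs, long nowMs, long staleIntervalMs) {
        return (nowMs - lastHeartbeatMs) >= staleIntervalMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("heartbeat 5s ago stale?  " + isStale(now - 5_000, now, STALE_INTERVAL_MS));
        System.out.println("heartbeat 45s ago stale? " + isStale(now - 45_000, now, STALE_INTERVAL_MS));
    }
}
```

Under this model a node that stopped responding is deprioritized only after the interval elapses, so the interval needs to be short relative to the recovery timeouts discussed in this thread.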

Re: Starting Region Server with HBase API

2013-04-19 Thread Ted Yu
Can you tell us a bit more about your requirement ? Looks like you want to control the cluster size (number of region servers in particular). Once the requirement is outlined, we can think of formal way to address it. Thanks On Fri, Apr 19, 2013 at 9:34 AM, Mehmet Simsek wrote: > Security isn't

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Is there a place to upload these logs? On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma wrote: > Hi Nicholas, > > Attached are the namenode, dn logs (of one of the healthy replicas of the > WAL block) and the rs logs which got stuck doing the log split. Action > begins at 2013-04-19 00:27*. > >

Re: Starting Region Server with HBase API

2013-04-19 Thread Mehmet Simsek
Security isn't necessary for us. We want to start a region server from a Java application. How can we do that? Mehmet Şimşek On 19 Apr 2013, at 19:08, Ted Yu wrote: > By Java application I assume you mean an HBase client. > Is your HBase cluster secure? > > Have you thought about the security implication of al

Re: zookeeper taking 15GB RAM

2013-04-19 Thread Arpit Gupta
Take a look at this: https://issues.apache.org/jira/browse/ZOOKEEPER-1670 When no -Xmx was set, we noticed that ZooKeeper could take up to 1/4 of the memory available on the system with JDK 1.6. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Apr 19, 2013, at 9:15 AM, Rohit Kelkar wrote:
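
To see which heap ceiling a JVM picks on a given node when no -Xmx is supplied, a small check like the one below can help. This is a minimal sketch: the class name HeapReport is hypothetical, and the number it prints depends on the JVM version and the machine.

```java
/** Prints the maximum heap this JVM is willing to use. */
class HeapReport {
    /** Maximum heap in MiB as reported by the running JVM. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once with no flags and once with an explicit limit, for example `java -Xmx1g HeapReport`, shows the difference between the default and a bounded heap.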

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Gary Helmling
As others mentioned, HBASE-6870 is about coprocessorExec() always scanning the full .META. table to determine region locations. Is this what you mean, or are you talking about your coprocessor always scanning your full user table? If you want to limit the scan within regions in your user table, you'l
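
The distinction between selecting regions and bounding the scan inside a region can be illustrated with a toy model. As far as the 0.94 client API documents it, the start and end keys given to coprocessorExec choose which regions the endpoint runs on; what each invocation reads is up to the endpoint's own scan. The sketch below is not the HBase API: the class, the sample row keys, and the sorted map standing in for a region are all hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one region's sorted rows; not the HBase API. */
class RegionRangeSketch {
    /** Counts rows in [startRow, stopRow): start inclusive, stop exclusive. */
    static int countInRange(NavigableMap<String, String> region, String startRow, String stopRow) {
        return region.subMap(startRow, true, stopRow, false).size();
    }

    /** Counts every row in the region, as an endpoint would if it ignored the range. */
    static int countAll(NavigableMap<String, String> region) {
        return region.size();
    }

    /** Builds a small region with five hypothetical row keys. */
    static NavigableMap<String, String> sampleRegion() {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (String key : new String[] {"row-01", "row-02", "row-03", "row-04", "row-05"}) {
            rows.put(key, "value-of-" + key);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> region = sampleRegion();
        System.out.println("unbounded count: " + countAll(region));
        System.out.println("bounded count:   " + countInRange(region, "row-02", "row-04"));
    }
}
```

In this model the bounded count covers only row-02 and row-03, while the unbounded count covers the whole region, which is the behavior an endpoint shows when the range never reaches its scan.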

Re: zookeeper taking 15GB RAM

2013-04-19 Thread Rohit Kelkar
Hi, any inputs on this issue? Is there some periodic cleanup that we need to do? - Rohit Kelkar On Thu, Apr 18, 2013 at 10:33 AM, Rohit Kelkar wrote: > No. Just using the "bin/zkServer.sh start" command. Also each node has 48 > GB RAM > > - Rohit Kelkar > > > On Thu, Apr 18, 2013 at 10:28 AM, J

Re: Starting Region Server with HBase API

2013-04-19 Thread Ted Yu
By Java application I assume you mean an HBase client. Is your HBase cluster secure? Have you thought about the security implications of allowing a client app to start a region server? Cheers On Fri, Apr 19, 2013 at 7:52 AM, Mehmet Simsek wrote: > Can I use script to start region server in java applicatio

Re: Speeding up the row count

2013-04-19 Thread James Taylor
Phoenix will parallelize within a region: SELECT count(1) FROM orders I agree with Ted, though: even serially, 100,000 rows shouldn't take anywhere near 6 mins. You say > 100,000 rows. Can you tell us what it's < ? Thanks, James On Apr 19, 2013, at 2:37 AM, "Ted Yu" wrote: > Since there is

Re: Starting Region Server with HBase API

2013-04-19 Thread Mehmet Simsek
Can I use a script to start a region server from a Java application running on the Windows platform?

Re: Speeding up the row count

2013-04-19 Thread Ted Yu
The stack trace was from your HBase client. Can you check server log ? Thanks On Apr 19, 2013, at 2:55 AM, Omkar Joshi wrote: > Hi Ted, > > 6 minutes is too long :( > Will this decrease to seconds if more nodes are added in the cluster? > > I got this exception finally(I recall faintly abou

Re: Starting Region Server with HBase API

2013-04-19 Thread Ted Yu
startRegionServer creates a new Thread, wrapping the passed in HRegionServer. Can you use script to start region server ? Cheers On Fri, Apr 19, 2013 at 5:15 AM, Mehmet Simsek wrote: > Hi, I can stop region server by using HBaseAdmin class but cannot start. > > How can I start region server by

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Ted Yu
Please upgrade to 0.94.6.1, which is more stable. Cheers On Apr 19, 2013, at 4:58 AM, GuoWei wrote: > > We use HBase 0.94.1 in our production environment. > > > Best Regards > 郭伟 Guo Wei > > On 2013-4-19, at 6:01 PM, Ted Yu wrote: > >> Which hbase version are you using ? >> >> Thanks >> >> On

Starting Region Server with HBase API

2013-04-19 Thread Mehmet Simsek
Hi, I can stop a region server using the HBaseAdmin class, but I cannot start one. How can I start a region server using the HBase API? The HRegionServer class has a startRegionServer method. Can I use this class? -- M. Nurettin ŞİMŞEK

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Jean-Marc Spaggiari
Then https://issues.apache.org/jira/browse/HBASE-6870 is most probably impacting you. Take a look at the link. It's not yet fixed, but it's coming. You might want to upgrade to a release which will include this fix. JM 2013/4/19 GuoWei > > We use HBase 0.94.1 in our production environment. > > >

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
We use HBase 0.94.1 in our production environment. Best Regards 郭伟 Guo Wei On 2013-4-19, at 6:01 PM, Ted Yu wrote: > Which hbase version are you using ? > > Thanks > > On Apr 19, 2013, at 2:49 AM, GuoWei wrote: > >> Hello, >> >> We use an HBase coprocessor endpoint to process realtime data. B

Re: Slow region server recoveries

2013-04-19 Thread Nicolas Liochon
Thanks for the detailed scenario and analysis. I'm going to have a look. I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com times out); could you please send them directly to me? Thanks, Nicolas On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma wrote: > Hi Nicholas, > > Here is th

Re: Slow region server recoveries

2013-04-19 Thread Varun Sharma
Hi Nicholas, Here is the failure scenario, I have dug up the logs. A machine fails and stops accepting/transmitting traffic. The HMaster starts the distributed split for 13 tasks. There are 12 region servers. 12 tasks succeed but the 13th one takes a looong time. Zookeeper timeout is set to 30 s

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Ted Yu
Which hbase version are you using ? Thanks On Apr 19, 2013, at 2:49 AM, GuoWei wrote: > Hello, > > We use an HBase coprocessor endpoint to process realtime data. But when I > use the coprocessorExec method to scan a table and pass startRow and endRow, it > always scans the whole table instead of the r

Re: Coreprocessor always scan the whole table.

2013-04-19 Thread ramkrishna vasudevan
HBASE-6870 deals with it. It is not yet committed. We can review the patch and take it to closure. Regards Ram On Fri, Apr 19, 2013 at 3:19 PM, GuoWei wrote: > Hello, > > We use an HBase coprocessor endpoint to process realtime data. But when I > use the coprocessorExec method to scan a table an

RE: Problem in filters

2013-04-19 Thread Omkar Joshi
Hi, There was a small issue with the data (the delimiters were messed up); the filters seem to work correctly. I'm now working on Hive+HBase integration; Phoenix will be taken up later. Regards, Omkar Joshi -Original Message- From: Ian Varley [mailto:ivar...@salesforce.com] Sent: Wednesday,

RE: Speeding up the row count

2013-04-19 Thread Omkar Joshi
Hi Ted, 6 minutes is too long :( Will this decrease to seconds if more nodes are added to the cluster? I finally got this exception (I faintly recall something about increasing a timeout parameter while querying, but I didn't want to increase it to a high value): Apr 19, 2013 1:05:43 PM org.apache.had

Coreprocessor always scan the whole table.

2013-04-19 Thread GuoWei
Hello, We use an HBase coprocessor endpoint to process realtime data. But when I use the coprocessorExec method to scan a table and pass startRow and endRow, it always scans the whole table instead of only the rows between startRow and endRow. My code: results = table.coprocessorExec(IEndPoint_SA.clas

Re: Speeding up the row count

2013-04-19 Thread Ted Yu
Since there is only one region in your table, using aggregation coprocessor has no advantage. I think there may be some issue with your cluster - row count should finish within 6 minutes. Have you checked server logs ? Thanks On Apr 19, 2013, at 12:33 AM, Omkar Joshi wrote: > Hi, > > I'm h

Re: Slow region server recoveries

2013-04-19 Thread Nicolas Liochon
Hey Varun, Could you please share the logs and the configuration (hdfs / hbase settings + cluster description)? What's the failure scenario? From an HDFS point of view, HDFS-3703 does not change the dead node status. But these nodes will be given the lowest priority when reading. Cheers, Nicolas On Fri

RE: Speeding up the row count

2013-04-19 Thread Omkar Joshi
Hi, I have a 2-node (VMs) Hadoop cluster atop which HBase is running in distributed mode. I have a table named ORDERS with >100,000 rows. NOTE: Since my cluster is ultra-small, I didn't pre-split the table. ORDERS rowkey :ORDER_ID column family : ORDER_DETAILS