About Filter hold private Variable values

2012-01-20 Thread 魏超
Hello, I have a self-defined Filter like this: class FilterA { private int count; ... } and it is used as: Scan scan = new Scan(...); scan.addFilter(new FilterA()); ... while (someCondition) { ResultScanner scanner = htable.getScanner(scan); ... } the value o

Is HBase.Client.Result.getValue(...) and Result.getColumn(...) fetch actual value from TABLE everytime

2012-01-20 Thread Alok Kumar
Hi all, I would like to know whether HBase.Client.Result.getValue(...) and Result.getColumn(...) fetch the actual value from the table every time, or whether it is already available in the Result/ResultScanner. -- Alok

strange read delays from hbase

2012-01-20 Thread Pavel Dvorin
Hi! I have noticed strange behavior in HBase when reading and writing simultaneously. About my cluster: master and 16 regionservers, a quorum of 3 ZooKeepers, gigabit Ethernet, all nodes in the same subnet. Data is read from a table containing ~100 million binary records (images). I use Cloudera-c

Re: strange read delays from hbase

2012-01-20 Thread T Vinod Gupta
Did you check your GC logs from around the time you are seeing the delay? On Fri, Jan 20, 2012 at 12:54 AM, Pavel Dvorin wrote: > Hi! > > I notice the strange behavior of HBase during reading and writing > simultaneously. > > About my cluster: > Master and 16 regionservers, quorum of 3 zookeepers,

Re: Java api to apply limit in scan

2012-01-20 Thread Doug Meil
Also, if you know you only want 2 rows, for example, make sure the caching is set to 2 so that it only reads that many on the RegionServer. On 1/20/12 12:31 AM, "Harsh J" wrote: >Hi Stuti, > >The way the Shell does it is by iterating over the ResultScanner iterator >for only LIMIT number of
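The "iterate only LIMIT times" approach described above can be sketched without an HBase dependency. This is a minimal illustration, assuming the scanner behaves like any `Iterable` (HBase's `ResultScanner` is an `Iterable<Result>`); the class and method names here are hypothetical, and plain strings stand in for rows. In real client code the scanner would come from `htable.getScanner(scan)`, with the caching set on the `Scan` (for example via `Scan.setCaching(int)`) as suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: stops pulling from the source once `limit` items are collected.
final class LimitedIteration {

    /** Returns at most `limit` items from `source`, without reading past the limit. */
    static <T> List<T> firstN(Iterable<T> source, int limit) {
        List<T> out = new ArrayList<>();
        if (limit <= 0) {
            return out; // nothing requested, so do not touch the source at all
        }
        for (T item : source) {
            out.add(item);
            if (out.size() >= limit) {
                break; // stop before asking the iterator for another element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a ResultScanner: any Iterable works for this sketch.
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4");
        System.out.println(firstN(rows, 2));
    }
}
```

Breaking out of the loop right after the last wanted item matters with a real scanner, because asking for one more element may trigger another fetch from the RegionServer.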

Slides from meetup @ EBay

2012-01-20 Thread Ted Yu
Hi, We had a nice meetup with high quality presentations last night. I have uploaded the slides onto: http://www.meetup.com/hbaseusergroup/files/ Thanks to the presenters for sharing their experiences. Looking forward to HBaseCon 2012.

Fetching of the data from HFile

2012-01-20 Thread Praveen Sripati
Hi, Each HFile has multiple Data Blocks and each block has multiple K/V pairs. So, effectively a given HFile has many K/V pairs. When a client searches for a particular row, is the entire HFile scanned for data, or is some sort of index maintained in the HFile? Also, is the data sorted in the HFile? T

Re: RegionServer dying every two or three days

2012-01-20 Thread Leonardo Gamas
Thanks Neil for sharing your experience with AWS! Could you tell us what instance type you are using? We are using m1.xlarge, which has 4 virtual cores, but I normally see recommendations for machines with 8 cores like c1.xlarge, m2.4xlarge, etc. In principle these 8-core machines don't suffer too much

Hbase out of memory error

2012-01-20 Thread Royston Sellman
Trying to run my code (a test of Aggregation Protocol and an MR HBase table loader) on latest build of 0.92.0 (r1232715) I get an 'old server' warning (I've seen this before and it's always been non-fatal) then an out of memory exception then job hangs: [sshexec] 12/01/20 16:56:48 WARN zookeepe

Re: Slides from meetup @ EBay

2012-01-20 Thread Jean-Daniel Cryans
Thanks to everyone who came and special thanks to eBay for hosting, Ted for organizing and Stack for gluing it all together. Just like releasing new versions, we should have meetups more often! J-D On Fri, Jan 20, 2012 at 7:16 AM, Ted Yu wrote: > Hi, > We had a nice meetup with high quality pre

Re: Slides from meetup @ EBay

2012-01-20 Thread Ted Yu
Definitely. I am hoping this event gives positive feedback to EBay's management. In the future, the hosting process should be more streamlined. Cheers On Fri, Jan 20, 2012 at 9:48 AM, Jean-Daniel Cryans wrote: > Thanks to everyone who came and special thanks to eBay for hosting, > Ted for organ

Regions in Transition?

2012-01-20 Thread Mark
I recently bumped up the region size memory configuration on our HBase cluster and after doing a rolling restart of our 5 nodes I saw the following in the HBase status page: Regions in Transition Region State 32cb0e36cfa326d0a431734ba93a16df items,869239091/es-LA,1323971864141.32cb0e36cf

Re: Hbase out of memory error

2012-01-20 Thread Ted Yu
Royston: I guess you have seen HBASE-5204. In particular: >> when a 0.92 server fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer because it doesn't read fields of 0.90-style RPCs properly. Was your client code compiled with the same version of HBase as what was runnin

HBase map/scan performance

2012-01-20 Thread kfarmer
I'm doing a POC on HBase and wanted to see if someone could verify that my map/scan performance is reasonable. I have one 170 million row table. My cluster setup is 1 master node and 4 slave nodes, all w/ 8GB RAM, one 500GB SATA disk, and one quad-core hyperthreaded CPU. I'm running a MapReduce job ov

Re: RegionServer dying every two or three days

2012-01-20 Thread Matt Corgan
I run c1.xlarge servers and have found them very stable. I see 100 Mbit/s sustained bi-directional network throughput (200Mbit/s total), sometimes up to 150 * 2 Mbit/s. Here's a pretty thorough examination of the underlying hardware: http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardw

Re: 0.92 Max Row Size

2012-01-20 Thread Stack
On Fri, Jan 20, 2012 at 11:43 AM, Wayne wrote: > Does 0.92 support a significant increase in row size over 0.90.x? With > 0.90.4 we have seen writes start choking at 30 million cols/row and reads > start choking at 10 million cols/row. Can we assume these numbers will go > up with .92 and if yes

Re: HBase map/scan performance

2012-01-20 Thread Stack
On Fri, Jan 20, 2012 at 11:36 AM, kfarmer wrote: > This job completes in about 8 minutes. That's 354K rows/second for the > cluster, 88K rows/second for the node, and 22K rows/second (or 22 > rows/millisecond) for each map task. > > It's not too bad? What do you need? > Is this performance reas
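The throughput figures quoted in this thread can be re-derived from the numbers in the original post (170 million rows, 4 slave nodes, about 8 minutes). The sketch below is only an arithmetic check; the count of 16 concurrent map tasks is an assumption inferred from the ratio of the quoted cluster and per-task rates, not something stated in the visible excerpt, and the class name is hypothetical.

```java
// Re-derives the scan throughput figures quoted in the thread.
final class ScanThroughput {

    /** Rows per second when `rows` are processed in `seconds`, split evenly across `workers`. */
    static double rowsPerSecond(long rows, long seconds, int workers) {
        return (double) rows / seconds / workers;
    }

    public static void main(String[] args) {
        long rows = 170_000_000L; // table size from the original post
        long seconds = 8 * 60;    // "about 8 minutes"
        int nodes = 4;            // slave nodes from the original post
        int mapTasks = 16;        // ASSUMPTION: 4 concurrent map tasks per node

        System.out.printf("cluster:  %.0f rows/s%n", rowsPerSecond(rows, seconds, 1));
        System.out.printf("per node: %.0f rows/s%n", rowsPerSecond(rows, seconds, nodes));
        System.out.printf("per task: %.0f rows/s%n", rowsPerSecond(rows, seconds, mapTasks));
    }
}
```

Under these assumptions the three values come out near the quoted 354K, 88K and 22K rows/second.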

Re: Regions in Transition?

2012-01-20 Thread Stack
On Fri, Jan 20, 2012 at 10:41 AM, Mark wrote: > I recently bumped up the region size memory configuration on our HBase > cluster and after doing a rolling restart of our 5 nodes I saw the > following in the HBase status page: > > > Regions in Transition > > Region State > 32cb0e36cfa326d0a4317

Re: Fetching of the data from HFile

2012-01-20 Thread Stack
On Fri, Jan 20, 2012 at 7:56 AM, Praveen Sripati wrote: > Hi, > > Each HFile has multiple Data Blocks and each block has multiple K/V pairs. > So, effectively a given HFile has many K/V pairs. When a client searches > for a particular row, is the entire HFile scanned for data or some sort of > ind
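For readers following this thread: an HFile keeps its key/values sorted and carries an index of its data blocks, so a lookup does not need to scan the whole file. The sketch below is a simplified model of that idea and not HBase's actual classes; the class name, method name, and string keys are all hypothetical. It assumes the index holds the first key of each data block in sorted order.

```java
import java.util.Arrays;

// Simplified model of a block index: one "first key" per data block, kept sorted.
final class BlockIndexSketch {

    /**
     * Returns the position of the only block that could contain `key`,
     * or -1 if `key` sorts before the first key of the first block.
     */
    static int findBlock(String[] firstKeys, String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        if (pos >= 0) {
            return pos; // key is exactly the first key of a block
        }
        int insertionPoint = -pos - 1; // index of the first entry greater than key
        return insertionPoint - 1;     // the block starting just before key, or -1
    }

    public static void main(String[] args) {
        String[] firstKeys = {"row-000", "row-100", "row-200"};
        System.out.println(findBlock(firstKeys, "row-150"));
        System.out.println(findBlock(firstKeys, "aaa"));
    }
}
```

Only the selected block then has to be loaded and scanned for the requested row, which is why the size of a data block affects the cost of each lookup.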

HBase schema question

2012-01-20 Thread Amit Gupta
Hi, I am trying to figure out if HBase is the right candidate for my use case, which is as follows: I have a users table containing millions of users and for each user I have a bunch of data points for each day in the past 2 years. Some of these data points are number of clicks in different parts o

Re: HBase schema question

2012-01-20 Thread T Vinod Gupta
From the little I have used HBase, it is really good for the use case you mentioned below. HBase takes care of scale and you can use MapReduce to do the kind of task you mentioned below. But please remember that it is super important how you design the schema. The schema should allow for your

Re: Fetching of the data from HFile

2012-01-20 Thread Ted Yu
Please also refer to Mikhail's presentation last night (Optimizing_HBase_scanner_performance.pptx): http://www.meetup.com/hbaseusergroup/files/ Cheers On Fri, Jan 20, 2012 at 1:49 PM, Stack wrote: > On Fri, Jan 20, 2012 at 7:56 AM, Praveen Sripati > wrote: > > > Hi, > > > > Each HFile has multi

Re: Is HBase.Client.Result.getValue(...) and Result.getColumn(...) fetch actual value from TABLE everytime

2012-01-20 Thread lars hofhansl
The values are fetched by the operation that returned the Result object and subsequently cached in the Result object. Is that what you were asking? -- Lars From: Alok Kumar To: user@hbase.apache.org Sent: Friday, January 20, 2012 12:48 AM Subject: Is HBase.

Re: Regions in Transition?

2012-01-20 Thread lars hofhansl
Also, you probably do not want to set your blocksize to 512 MB. The default is 64 KB. HBase has to load (from either HDFS or the cache) and scan this amount of data for each key value lookup. -- Lars From: Stack To: user@hbase.apache.org Sent: Friday, January 20, 2012
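To put the two block sizes in proportion, here is a small arithmetic sketch. It assumes, as a simplification, that a single key value lookup loads and scans exactly one block of the HBase table (column family) block size; the class and method names are hypothetical.

```java
// Compares the amount of data touched per lookup for two block sizes.
final class BlockSizeRatio {

    /** How many times larger `largeBytes` is than `smallBytes` (integer division). */
    static long ratio(long largeBytes, long smallBytes) {
        return largeBytes / smallBytes;
    }

    public static void main(String[] args) {
        long defaultBlock = 64L * 1024;          // 64 KB, the default mentioned above
        long hugeBlock = 512L * 1024 * 1024;     // 512 MB, the value being discussed
        System.out.println("bytes per lookup, default: " + defaultBlock);
        System.out.println("bytes per lookup, 512 MB:  " + hugeBlock);
        System.out.println("ratio: " + ratio(hugeBlock, defaultBlock));
    }
}
```

Under that simplification, each lookup with a 512 MB block would touch several thousand times more data than with the default.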

Re: Regions in Transition?

2012-01-20 Thread Harsh J
(Just to avoid confusion, Lars is talking about HBase Table CF's blocksize, unrelated to HDFS block sizes, which would be fine if set to 512m.) On 21-Jan-2012, at 9:49 AM, lars hofhansl wrote: > Also, you probably do not want to set your blocksize to 512mb. The default is > 64k. > HBase has to

Hbase 0.90.5 and Hadoop 1.0.0 beta working?

2012-01-20 Thread Invisible.Trust
Hi, I have Debian 6.03 and a problem getting the best friends HBase and Hadoop to work together. Step by step, I want a working configuration of HBase (standalone as the first step) and Hadoop: wget http://www.sai.msu.su/apache//hbase/hbase-0.90.5/hbase-0.90.5.tar.gz tar xzfv hbase-0.90.5.tar.gz sudo mv hbase-0.90.5 /usr/lo

RE: Regions in Transition?

2012-01-20 Thread Ramakrishna s vasudevan
Hi, Check the master-side logs, and in the RS logs check what the RS is doing w.r.t. the region mentioned in the trace below. I hope you are trying this on 0.90.x and not 0.92? Regards Ram From: Harsh J [ha...@cloudera.com] Sent: Saturday, January 21, 201

Disk Seeks and Column families

2012-01-20 Thread Praveen Sripati
Hi, 1) According to this URL (1), HBase performs well with two or three column families. Why is that so? 2) A dump of an HFile looks like the one below. The contents of a row stay together, as in a regular row-oriented database. If the column family has 100 column family qualifiers and is dense then the dat

Re: HBase schema question

2012-01-20 Thread Amit Gupta
I am not sure how I can do joins using HBase, which is essentially what I am trying to do. Based on what I have read, it looks like HBase is really good for scans or row key lookups. Please correct me if I am wrong. I can have an HBase table for users with {userid + timestamp} as the rowkey. Using thi

Re: HBase schema question

2012-01-20 Thread Invisible.Trust
I think you need to design your schema with as many tables as the number of indexes you want. For example: tbl1 {user_id_timestamp} tbl2 {md5(email)} [user_id_timestamp] You may also want to search Google for "design patterns hbase". There are also some examples here: "Oreilly.HBase.The.Definitive.Guide.Aug.2011"
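The two-table layout suggested here can be illustrated by showing how the row keys might be built. This is only a sketch under assumptions: the `_` separator, the string form of the timestamp, and the class and method names are hypothetical, and no HBase calls are made. tbl1 is keyed by user id plus timestamp, and tbl2 is keyed by the MD5 of the email and stores the tbl1 row key as its value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Builds row keys for a main table and a secondary index table.
final class IndexRowKeys {

    /** Row key for tbl1: user id and timestamp joined by '_' (separator is an assumption). */
    static String userRowKey(String userId, long timestampMillis) {
        return userId + "_" + timestampMillis;
    }

    /** Row key for tbl2: lowercase hex MD5 of the email address. */
    static String emailIndexKey(String email) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(email.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String mainKey = userRowKey("user42", 1327017600000L);  // hypothetical user and time
        String indexKey = emailIndexKey("someone@example.com"); // hypothetical email
        System.out.println("tbl1 row key: " + mainKey);
        System.out.println("tbl2 row key: " + indexKey + " -> " + mainKey);
    }
}
```

A lookup by email would then read tbl2 with the hashed key, take the stored tbl1 row key, and fetch the user row from tbl1. Because HBase orders row keys lexicographically by bytes, a real schema would likely use a fixed-width timestamp encoding so that rows for one user sort in time order.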