Stargate perf and troubleshooting tips

2014-08-06 Thread SiMaYunRui
Hi all, I am encountering a performance issue when building an application to scan data through Stargate (the HBase RESTful service). Stargate requires two steps to fetch data from an HBase table: the first step is to PUT a scanner resource, and the second is to GET the next batch through another HTTP request. My observation is that it
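
A minimal sketch of the two-step scan described above, assuming the standard HBase REST scanner endpoints (PUT /<table>/scanner with a Scanner XML body, then GET the scanner URL from the Location header until HTTP 204); the host, port, table name, and batch size are placeholders:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of the two-step Stargate scan: create a scanner resource,
// then GET the returned scanner URL until the server answers 204.
public class StargateScanSketch {

  static String readAll(InputStream in) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n);
    }
    return out.toString("UTF-8");
  }

  public static void main(String[] args) throws Exception {
    String base = "http://localhost:8080";   // REST gateway address (assumed)
    String table = "mytable";                // placeholder table name

    // Step 1: create the scanner resource; the batch size is an example value.
    HttpURLConnection create =
        (HttpURLConnection) new URL(base + "/" + table + "/scanner").openConnection();
    create.setRequestMethod("PUT");
    create.setDoOutput(true);
    create.setRequestProperty("Content-Type", "text/xml");
    OutputStream os = create.getOutputStream();
    os.write("<Scanner batch=\"100\"/>".getBytes("UTF-8"));
    os.close();
    String scannerUrl = create.getHeaderField("Location"); // URL of the new scanner
    create.disconnect();

    // Step 2: fetch batches from the scanner until the server returns 204.
    while (true) {
      HttpURLConnection next = (HttpURLConnection) new URL(scannerUrl).openConnection();
      next.setRequestProperty("Accept", "application/json");
      if (next.getResponseCode() == 204) {   // scanner exhausted
        next.disconnect();
        break;
      }
      InputStream in = next.getInputStream();
      System.out.println(readAll(in));       // one batch of cells as JSON
      in.close();
      next.disconnect();
    }

    // Release the scanner resource on the server side.
    HttpURLConnection del = (HttpURLConnection) new URL(scannerUrl).openConnection();
    del.setRequestMethod("DELETE");
    del.getResponseCode();
    del.disconnect();
  }
}
```

Since each "next" GET is one HTTP round trip, the batch size set on the Scanner element has a large effect on scan throughput.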

Re: Guava version incompatible

2014-08-06 Thread Deepa Jayaveer
Are there any tutorials available on the net for connecting the Spark Java API with HBase? Thanks and Regards Deepa From: "Dai, Kevin" To: "user@hbase.apache.org" Date: 08/07/2014 11:11 AM Subject: Guava version incompatible Hi, all I am now using Spark to manipulate HBase. But
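
A hedged sketch of one common way to read an HBase table into a Spark RDD from Java, via newAPIHadoopRDD and TableInputFormat (class names as of Spark 1.x and HBase 0.98; "mytable" is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Read an HBase table into a Spark RDD through TableInputFormat.
public class SparkHBaseRead {
  public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("hbase-read");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "mytable");  // table to scan

    // Each record is (row key, Result) for one HBase row.
    JavaPairRDD<ImmutableBytesWritable, Result> rows =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class);

    System.out.println("row count: " + rows.count());
    sc.stop();
  }
}
```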

Guava version incompatible

2014-08-06 Thread Dai, Kevin
Hi, all I am now using Spark to manipulate HBase. But I can't use HBaseTestingUtility to do unit tests, because Spark needs Guava 15.0 and above while HBase needs Guava 14.0.1. These two versions are incompatible. Is there any way to solve this conflict with Maven? Thanks, Kevin.

Re: Question on the number of column families

2014-08-06 Thread Qiang Tian
Hi TaeYun, thanks for explaining. On Thu, Aug 7, 2014 at 12:50 PM, innowireless TaeYun Kim < taeyun@innowireless.co.kr> wrote: > Hi Qiang, > thank you for your help. > > 1. Regarding HBASE-5416, I think its purpose is simple: > > "Avoid loading column families that are irrelevant to filtering

RE: What is in a HBase block index entry?

2014-08-06 Thread innowireless TaeYun Kim
Thank you Anoop. Though it's a bit strange to include the CF in the index, since the entire block index is contained in an HFile for a specific CF, I'm sure there is a good reason (maybe for the performance of the comparison). Anyway, it should be almost no issue, since the length of the CF should

Re: Question on the number of column families

2014-08-06 Thread Ted Yu
bq. no built-in filter intelligently determines which column family is essential, except for SingleColumnValueFilter Mostly right - don't forget about SingleColumnValueExcludeFilter which extends SingleColumnValueFilter. Cheers On Wed, Aug 6, 2014 at 9:34 PM, innowireless TaeYun Kim < taeyun...

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Hi Qiang, thank you for your help. 1. Regarding HBASE-5416, I think its purpose is simple: "Avoid loading column families that are irrelevant to filtering while scanning." So, it can be applied to my 'dummy CF' case. That is, a dummy CF can act as the 'relevant' CF for filtering, provided that H

Re: What is in a HBase block index entry?

2014-08-06 Thread Anoop John
It will be the key of the KeyValue. The key includes rk + cf + qualifier + ts + type, so all of these are part of the key. Your answer #1 is correct (but with the addition of the type as well). Hope this makes it clear for you. -Anoop- On Tue, Aug 5, 2014 at 9:43 AM, innowireless TaeYun Kim < taeyun@innowireless.c
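
To make the key layout above concrete, here is a small sketch using the 0.94/0.98-era KeyValue constructor; the "key" portion is row + family + qualifier + timestamp + type, while the value is stored in the same cell but is not part of the key (and so not in the block index). Row, family, qualifier, and value below are placeholders.

```java
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

// Illustration of the key layout: rk + cf + qualifier + ts + type.
public class KeyLayoutExample {
  public static void main(String[] args) {
    KeyValue kv = new KeyValue(
        Bytes.toBytes("row1"),        // row key (rk)
        Bytes.toBytes("cf"),          // column family
        Bytes.toBytes("q1"),          // qualifier
        1407300000000L,               // timestamp
        KeyValue.Type.Put,            // type
        Bytes.toBytes("value"));      // value (not part of the key)

    System.out.println("key length: " + kv.getKeyLength());
    System.out.println("row:        " + Bytes.toString(kv.getRow()));
    System.out.println("family:     " + Bytes.toString(kv.getFamily()));
    System.out.println("qualifier:  " + Bytes.toString(kv.getQualifier()));
    System.out.println("timestamp:  " + kv.getTimestamp());
  }
}
```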

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Thank you Ted. But the RowFilter class has no method that can be used to set which column family is essential. (Actually, no built-in filter class provides such a method.) So, if I (ever) want to apply the 'dummy' column family technique(?), it seems that I must do as follows: - Write my own filter
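
A heavily hedged sketch of the "write my own filter" idea from the message above: extend RowFilter and override isFamilyEssential (the hook added by HBASE-5416) so that only the small "dummy" family is treated as essential. The family name is an assumed placeholder, and a real deployment would also need the filter class on the region servers' classpath plus (on 0.96+) protobuf serialization, which this sketch omits.

```java
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Row-key filter that marks only the dummy family as essential, so the wide
// families are fetched lazily for rows that pass the row-key check.
public class EssentialFamilyRowFilter extends RowFilter {
  private static final byte[] DUMMY_CF = Bytes.toBytes("d");  // assumed dummy family

  public EssentialFamilyRowFilter(CompareOp op, byte[] rowKey) {
    super(op, new BinaryComparator(rowKey));
  }

  @Override
  public boolean isFamilyEssential(byte[] name) {
    // Only the dummy family must be loaded to evaluate the filter.
    return Bytes.equals(name, DUMMY_CF);
  }
}
```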

Re: hbase attack scenarios?

2014-08-06 Thread Wilm Schumacher
On 06.08.2014 at 19:07, Andrew Purtell wrote: > We have no known vulnerabilities that equate to a SQL injection attack > vulnerability. However, as Esteban says you'd want to treat HBase like any > other datastore underpinning a production service and out of an abundance > of caution deploy it i

Re: Question on the number of column families

2014-08-06 Thread Ted Yu
bq. While scanning, an entire row will be read even for a rowkey filtering. If you specify an essential column family in your filter, the above would not be true - only the essential column family would be loaded into memory first. Once the filter passes, the other family would be loaded. Cheers On
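
The built-in case this thread points at is SingleColumnValueFilter (and its Exclude variant), which already implements the essential-family check. A sketch, with placeholder table, family, and column names and the 0.94/0.98-era HTable API: the filter tests a column in a small family, so the wide families are only loaded for rows that pass.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Scan that filters on a column in a small "meta" family; other families are
// fetched only for rows whose filter check passes.
public class EssentialFamilyScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable");
    try {
      SingleColumnValueFilter filter = new SingleColumnValueFilter(
          Bytes.toBytes("meta"),        // small, "essential" family
          Bytes.toBytes("flag"),        // column to test
          CompareOp.EQUAL,
          Bytes.toBytes("1"));
      filter.setFilterIfMissing(true);  // drop rows without the tested column

      Scan scan = new Scan();
      scan.setFilter(filter);

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}
```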

Re: Question on the number of column families

2014-08-06 Thread Qiang Tian
Hi, the description of HBASE-5416 states why it was introduced. If you only have 1 CF, a dummy CF does not help; it is helpful for the multi-CF case, e.g. "putting them in one column family. And "Non frequently" ones in another." bq. "Field name will be included in rowkey." Please read chapter 9 "A

RE: Why hbase need manual split?

2014-08-06 Thread Rendon, Carlos (KBB)
You are just starting up a service and want the load split between multiple region servers from the start, instead of waiting for manual splitting. Say you have 5 region servers; one way to create your table via the HBase shell is like this: create 'tablename', 'f', {NUMREGIONS => 5, SPLITALGO =>
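
The same pre-split-at-creation idea can be done from Java with the 0.94/0.98-era admin API; the table name, family name, and key range below are placeholders matching the (truncated) shell example above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Create a table with 5 regions up front so load is spread across region
// servers from the start instead of growing one region and splitting later.
public class PreSplitCreate {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("tablename"));
      desc.addFamily(new HColumnDescriptor("f"));

      // 5 regions with evenly spaced boundaries over an assumed key range.
      admin.createTable(desc,
          Bytes.toBytes("00000000"),   // first region start key
          Bytes.toBytes("ffffffff"),   // last region end key
          5);                          // number of regions
    } finally {
      admin.close();
    }
  }
}
```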

Re: hbase attack scenarios?

2014-08-06 Thread Andrew Purtell
We have no known vulnerabilities that equate to a SQL injection attack vulnerability. However, as Esteban says you'd want to treat HBase like any other datastore underpinning a production service and out of an abundance of caution deploy it into a secure enclave behind an internal service API, so r

Re: hbase memstore size

2014-08-06 Thread Ted Yu
bq. HBase will first check if the data exists in the memstore; if not, it will check the disk. For the read path, don't forget the block cache / bucket cache. Cheers On Wed, Aug 6, 2014 at 7:54 AM, yonghu wrote: > I did not quite understand your problem. You store your data in HBase, and > I guess later yo
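
For reference, a small sketch of the knobs that touch the read-path pieces mentioned here (memstore, block cache, HFiles on disk); the family name is a placeholder and these are the standard client/descriptor settings rather than anything specific to the poster's setup.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.Scan;

// Per-family and per-scan caching knobs on the read path.
public class ReadPathKnobs {
  public static void main(String[] args) {
    // Per-family settings, applied at table creation or via alter.
    HColumnDescriptor cf = new HColumnDescriptor("f");
    cf.setBlockCacheEnabled(true);   // cache data blocks read from HFiles
    cf.setInMemory(false);           // true = prefer keeping this family's blocks cached

    // Per-scan setting: skip the block cache for one-off full scans so they
    // do not evict hot blocks used by other readers.
    Scan scan = new Scan();
    scan.setCacheBlocks(false);

    System.out.println(cf);
    System.out.println(scan);
  }
}
```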

Re: hbase memstore size

2014-08-06 Thread yonghu
I did not quite understand your problem. You store your data in HBase, and I guess later you will also read data from it. Generally, HBase will first check if the data exists in the memstore; if not, it will check the disk. If you set the memstore to 0, every read will go directly to dis

Problem starting HBase0.98/Hadoop2 minicluster : Metrics source RetryCache/NameNodeRetryCache already exists!

2014-08-06 Thread anil gupta
Hi All, I am trying to run the JUnit test for SortingCoprocessor (HBASE-7474) in HBase 0.98. I am getting this error: *14/08/06 07:06:09 ERROR namenode.FSNamesystem: FSNamesystem initialization failed. org.apache.hadoop.metrics2.MetricsException: Metrics source RetryCache/NameNodeRetryCache already exists!*
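
One workaround sometimes used for this exact "Metrics source ... already exists" error in tests is to put the Hadoop metrics system into mini-cluster mode and share a single mini cluster per test class instead of starting several in one JVM. A hedged sketch, assuming HBase 0.98 / Hadoop 2 test APIs; class and test names are placeholders.

```java
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

// One shared mini cluster per test class, with the Hadoop metrics system in
// mini-cluster mode to avoid duplicate metrics-source registrations.
public class MiniClusterTestSketch {
  private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setUp() throws Exception {
    DefaultMetricsSystem.setMiniClusterMode(true);  // avoid duplicate metrics sources
    UTIL.startMiniCluster(1);                       // one region server
  }

  @AfterClass
  public static void tearDown() throws Exception {
    UTIL.shutdownMiniCluster();
  }

  @Test
  public void testSomething() throws Exception {
    // test body against UTIL.getConfiguration() / UTIL.createTable(...)
  }
}
```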

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Hi Ted, Now I have finished reading the filtering section and the source code of TestJoinedScanners (0.94). Facts learned: - While scanning, an entire row will be read even for a rowkey filtering. (Since a rowkey is not a physically separate entity and is stored in the KeyValue object, it's natural. Am I

RE: Why hbase need manual split?

2014-08-06 Thread Liu, Ming (HPIT-GADSC)
Thanks John, This is a very good answer; now I understand why you use manual splits, thanks. And I had a typo in my previous post: C is very close to A, not to B-A/2. So every split in the middle of the key range will result in a big region and a small region, which is very bad. So HBase only does auto split

Re: Why hbase need manual split?

2014-08-06 Thread john guthrie
To be honest, we were doing manual splits for the main reason that we wanted to make sure it was done on our schedule. But it also occurred to me that the automatic splits, at least by default, split the region in half. Normally the idea is that both new halves continue to grow, but with a sequent

RE: Why hbase need manual split?

2014-08-06 Thread Liu, Ming (HPIT-GADSC)
Thanks Arun, and John, Both of your scenarios make a lot of sense to me. But for the "sequence-based key" case, I am still confused. It is like an append-only operation, so new data are always written into the same region, but that region will eventually reach the hbase.hregion.max.filesize and

Re: Why hbase need manual split?

2014-08-06 Thread john guthrie
I had a customer with a sequence-based key (yes, he knew all the downsides of that). Being able to split manually meant he could split a region that got too big at a point near the end rather than right down the middle. With a sequentially increasing key, splitting the region in half left one region half the desired
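
A sketch of that kind of manual split at a chosen key, using the 0.94/0.98-era admin API; the table name and split point are placeholders.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Split the hot, last region at an explicit key near the current end of the
// sequence instead of letting the automatic split cut it in half.
public class ManualSplitSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    try {
      // Table name and split point are illustrative only.
      admin.split(Bytes.toBytes("tablename"), Bytes.toBytes("0000009999"));
    } finally {
      admin.close();
    }
  }
}
```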