RE: Hbase and Phoenix Performance improvement

2015-07-01 Thread Puneet Kumar Ojha
· Need to know the salt buckets you have created.
· Compression used or not.
· Number set for returning results ... it should be increased.
· Keep deleted cells should be false. If you delete records, it slows the response time.
Please provide the query which y
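The settings listed above can all be expressed as Phoenix table options. A minimal sketch, with a hypothetical table name and placeholder values (the right bucket count depends on the cluster):

```sql
-- Hypothetical table illustrating the tuning points above.
CREATE TABLE IF NOT EXISTS metrics (
    id BIGINT NOT NULL PRIMARY KEY,
    val VARCHAR
)
SALT_BUCKETS = 10,            -- pre-splits the table into 10 salted buckets
COMPRESSION = 'SNAPPY',       -- HBase block compression for the column family
KEEP_DELETED_CELLS = false;   -- avoid scanning past deleted cells
```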

Re: Hbase and Phoenix Performance improvement

2015-07-01 Thread Martin Pernollet
It sounds like you are scanning rather than getting rows by a known row id. Am I wrong? One thing I am currently trying is to keep indexed columns and "hot" content in one column family and leave "cold" content in another family. It speeds up scanning the table when you need to On Wed. 1 Jul.
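The hot/cold split described here maps directly onto Phoenix's family-qualified column syntax. A sketch with hypothetical names, where family A holds the frequently scanned columns and family B the rarely read ones:

```sql
-- "Hot" columns in family A, "cold" payload in family B (names hypothetical).
CREATE TABLE IF NOT EXISTS events (
    pk VARCHAR NOT NULL PRIMARY KEY,
    A.status   VARCHAR,   -- frequently filtered/scanned
    A.category VARCHAR,
    B.payload  VARCHAR    -- large, rarely read
);
-- A scan that references only A.status and A.category does not have to
-- read family B's store files, which is where the speedup comes from.
```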

OR versus IN

2015-07-01 Thread Anirudha Khanna
Hi, I have a simple query like: SELECT tb1."id" AS "id" FROM "ACCOUNTS" tb1 WHERE tb1."id" = 87 OR tb1."id" = 89 LIMIT 1 which, when run through Java using the JDBC driver, returns only one row, whereas there are two rows in the table with those ids. The explain plan of this (fro
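For reference, the OR predicate above can be written equivalently with IN; note also that a LIMIT clause caps the result set regardless of how many ids match, so a LIMIT must be at least as large as the expected row count:

```sql
-- Equivalent IN form of the OR query from the message above.
SELECT tb1."id" AS "id"
FROM "ACCOUNTS" tb1
WHERE tb1."id" IN (87, 89)
LIMIT 2;  -- LIMIT 1 would return at most one row even if both ids exist
```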

Re: Hbase and Phoenix Performance improvement

2015-07-01 Thread Nishant Patel
Hi Puneet/Martin, Thanks for your responses. Please see my answers below. I have not specified any salt buckets. I have created a Phoenix view on an existing HBase table. Can I specify salt buckets for a Phoenix view? After loading the HBase data I alter the table to use SNAPPY compression. Are you talking abou
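On the salt-bucket question: salting changes the physical row key layout, so SALT_BUCKETS is a property of table creation and cannot be applied to a view over pre-existing HBase data. A sketch of the view mapping being described, with hypothetical column names:

```sql
-- Mapping an existing HBase table as a Phoenix view (names hypothetical).
-- SALT_BUCKETS would only take effect on a table Phoenix creates itself,
-- since salting prefixes every row key with a bucket byte.
CREATE VIEW "existing_table" (
    pk VARCHAR PRIMARY KEY,
    "cf"."col1" VARCHAR,
    "cf"."col2" VARCHAR
);
```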

Help interpreting CsvBulkLoader issues?

2015-07-01 Thread Riesland, Zack
After using the CsvBulkLoader successfully for a few days, I’m getting some strange behavior this morning. I ran the job on a fairly small ingest of data (around 1/2 billion rows). It seemed to complete successfully. I see this in the logs: Phoenix MapReduce Import Upser

RE: Hbase and Phoenix Performance improvement

2015-07-01 Thread Puneet Kumar Ojha
Yes …salting will improve the scan performance. Try with the numbers 5, 10, 20, as I do not know the cluster details. Increase scanner caching to 10. Check if SNAPPY is working … I hope you have put the jars on the classpath as well. Since the cardinality of the col1 and col2 fields is very sma

RE: Help interpreting CsvBulkLoader issues?

2015-07-01 Thread Riesland, Zack
After some investigation, I think this is a permissions issue. If I run as ‘hbase’, this works consistently. FYI From: Riesland, Zack Sent: Wednesday, July 01, 2015 7:25 AM To: user@phoenix.apache.org Subject: Help interpreting CsvBulkLoader issues? After using the CsvBulkLoader successfully f

EXPLAIN has similar output for filters on indexed and non-indexed column

2015-07-01 Thread Martin Pernollet
Hi, I want to perform: SELECT * FROM "table" WHERE "family"."column1" = 'value' Running an EXPLAIN on this request before creating an index on the column gives: CLIENT PARALLEL 1-WAY FULL SCAN OVER table SERVER FILTER BY family.column1 = 'value' Looks OK. Then I simply: CREATE INDEX "table_I
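The likely reason the plan does not change (as the later reply in this digest explains) is that SELECT * is not covered by an index on column1 alone. A sketch of making the index covered with INCLUDE, using hypothetical names:

```sql
-- An index on column1 alone cannot serve SELECT *, so Phoenix keeps the
-- full scan. Adding the other queried columns makes the index covered
-- (index and column names hypothetical):
CREATE INDEX "table_idx" ON "table" ("family"."column1")
    INCLUDE ("family"."column2", "family"."column3");
```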

Can't UPSERT into a VIEW?

2015-07-01 Thread Martin Pernollet
Hi, I have an existing HBase table, so I mapped it to Phoenix using a *view*. I can select and create an index, so I am happy. Now I want to add a row (I assume it is not compulsory to have all column values defined; it would be tedious otherwise with numerous columns): upsert into "table" ("family1"."column

Re: Can't UPSERT into a VIEW?

2015-07-01 Thread Martin Pernollet
It seems: - CREATE TABLE returns the error "Table already exists" if you earlier created and dropped a view for the HBase table (bug?); - but one can actually run CREATE TABLE for an existing HBase table according to the documentation. I can't be sure for the moment as the CREATE TABLE statement on a
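The approach being tested here can be sketched as follows: map the existing HBase table with CREATE TABLE rather than CREATE VIEW, since Phoenix views over existing HBase tables are read-only while tables accept upserts. Names and types are hypothetical placeholders:

```sql
-- Map the existing HBase table as a Phoenix TABLE so upserts are allowed
-- (column names hypothetical; only the PK and the columns being set need
-- to appear in the UPSERT column list).
CREATE TABLE "table" (
    pk VARCHAR PRIMARY KEY,
    "family1"."column1" VARCHAR
);
UPSERT INTO "table" (pk, "family1"."column1") VALUES ('row1', 'v1');
```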

Re: OR versus IN

2015-07-01 Thread James Taylor
Hi Anirudha, No, there's no difference between using sqlline and a Java client (sqlline is a Java client). If you can put together a test with your original OR query and some sample data that exhibits the behavior, please let us know. Thanks, James On Wed, Jul 1, 2015 at 3:40 AM, Anirudha Khanna

Re: Hbase and Phoenix Performance improvement

2015-07-01 Thread James Taylor
Also, try separating your columns into multiple column families to prevent having to scan past your 75+ column qualifiers for every query. On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha wrote: > Yes …Salting will improve the scan performance. Try with numbers 5,10,20 > . As I do not know abo

Re: StackOverflowError

2015-07-01 Thread James Taylor
Baahu, We're having a difficult time reproducing the StackOverflowError you encountered over on PHOENIX-2074. Do you think you could help us reproduce it? Maybe you can upload a test case and/or a CSV file with some sample data that reproduces it? Thanks, James On Tue, Jun 23, 2015 at 12:24 AM, Ba

RE: EXPLAIN has similar output for filters on indexed and non-indexed column

2015-07-01 Thread Gerber, Bryan W
Phoenix doesn’t support automatically using an index for a non-covered query. Check out https://phoenix.apache.org/secondary_indexing.html for some examples. Your query may have been faster due to more data being cached. The 2-step plan you’re looking for in this case looks more like this one where
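For a non-covered query, the index can still be forced with a hint, at the cost of per-row lookups back into the data table. A sketch with hypothetical table and index names:

```sql
-- Force use of a non-covered index via a hint (names hypothetical).
-- Phoenix will use the index to find matching row keys, then fetch the
-- remaining columns from the data table.
SELECT /*+ INDEX("table" "table_idx") */ *
FROM "table"
WHERE "family"."column1" = 'value';
```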

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread Yufan Liu
I have tried the query "SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1", but it still returns the same unexpected result. There seem to be some related internal problems. 2015-06-30 18:03 GMT-07:00 James Taylor : > Yes, reverse scan will be leveraged when possible. Make y

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread James Taylor
If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu wrote: > I have tried to use query "SELECT timestamp FROM t1 ORDER BY timestamp > DESC NULLS LAST LIMIT 1". But it still returns the

RE: Query Hints on Functional Index

2015-07-01 Thread Gerber, Bryan W
PHOENIX-2094 created. Covering the queries isn’t a viable option right now; the table is already 5 TB and we have multiple indexes where we are trying to optimize case-insensitive queries. Fortunately we can brute-force the plan with the JOIN & subselect to get what we need near-term. Bryan G.
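For context, the case-insensitive pattern under discussion is what Phoenix functional indexes (available since 4.3) are meant for: index an expression and query with the same expression. A minimal sketch with hypothetical names:

```sql
-- Functional index for case-insensitive lookups (names hypothetical).
CREATE INDEX upper_name_idx ON my_table (UPPER(name));

-- A query using the same expression can be served from the index:
SELECT name FROM my_table WHERE UPPER(name) = 'JOHN';
```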

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread Yufan Liu
After more testing, I found that this problem happens after the table gets split. Here is the DDL I use to create the table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY';
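Putting the DDL above together with the query quoted earlier in the thread gives a self-contained repro sketch (the index name and the choice of a plain index on timestamp are assumptions; the thread does not show the CREATE INDEX statement):

```sql
-- Repro sketch assembled from the thread; index name hypothetical.
CREATE TABLE IF NOT EXISTS t1 (
    uid BIGINT NOT NULL,
    timestamp BIGINT NOT NULL,
    eventName VARCHAR
    CONSTRAINT my_pk PRIMARY KEY (uid, timestamp))
    COMPRESSION = 'SNAPPY';

CREATE INDEX ts_idx ON t1 (timestamp);

-- The query that returns the unexpected result after the table splits:
SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1;
```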

Re: Hbase and Phoenix Performance improvement

2015-07-01 Thread Nishant Patel
Thanks Puneet and James for your responses. A date is not recommended as the first part of the rowkey; it will create issues during write operations. In the real production scenario we will have more data and more values for column1 and column2. Will try the other things today. Let's see how much I can achi

Re: Hbase and Phoenix Performance improvement

2015-07-01 Thread Anil Gupta
Hi Nishant, Refer to the HBase wiki for multiple column families. In my experience, don't try to have more than 2-3 column families. Also, group the columns into column families based on access pattern. If you don't have an access pattern where you can avoid reading a column family, then you would not ga

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread James Taylor
Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu wrote: > When I made more tests, I find that this problem happens after table got > split. > > Here is the DDL I use to create table and index: > CREATE TABLE IF NOT EXISTS t1 ( > uid BIGINT NOT

Re: OR versus IN

2015-07-01 Thread Anirudha Khanna
Hi James, Thanks for your reply, but further debugging showed that it was something I was doing wrong on the client side, and I was able to resolve it. Sorry for the inconvenience. Cheers, Anirudha On Wed, Jul 1, 2015 at 9:45 PM, James Taylor wrote: > Hi Anirudha, > No, there's no difference