Re: Phoenix for CDH5 compatibility

2016-04-18 Thread Swapna Swapna
Add the phoenix-[version]-server.jar to the classpath of all HBase region servers and the master, and remove any previous version. An easy way to do this is to copy it into the HBase lib directory. On Sun, Apr 17, 2016 at 1:20 AM, Viswanathan J wrote: > What are the required phoenix jars to be copied
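For reference, a minimal client-side sketch to confirm the region servers picked up the new server jar after a restart. It assumes the matching phoenix-[version]-client.jar is on the application classpath and uses "zk-host" as a placeholder for the ZooKeeper quorum:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixConnectivityCheck {
        public static void main(String[] args) throws Exception {
            // "zk-host" is a placeholder for your ZooKeeper quorum.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
                while (rs.next()) {
                    System.out.println("Found table: " + rs.getString(1));
                }
            }
        }
    }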

Phoenix-Spark: Number of partitions in PhoenixRDD

2016-04-18 Thread Fustes, Diego
Hi all, I'm working with the Phoenix spark plugin to process a HUGE table. The table is salted into 100 buckets and split into 400 regions. When I read it with phoenixTableAsRDD, I get an RDD with 150 partitions. These partitions are so big that I am getting OutOfMemory problems. Therefor

Re: Undefined column. columnName=IS_ROW_TIMESTAMP

2016-04-18 Thread Arun Kumaran Sabtharishi
To add details to the original problem that was mentioned in this email, we migrated to Phoenix-4.6.1 very recently and this problem started occurring only after that. 1. Checking SYSTEM.CATALOG for some older phoenix views in the same environment, some of the *phoenix views did not have the IS_RO

Re: Getting swamped with Phoenix *.tmp files on SELECT.

2016-04-18 Thread marks1900-post01
Maybe I am missing something. I followed your suggestion and decreased the "phoenix.query.spoolThresholdBytes" value to 10MB, but after some time I am still running out of disk space (350+GB).  Any suggestions? --Wildfly 10 Datasource Declaration--   jdbc:phoenix:server01:/hbase-unsec
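For reference, a minimal sketch of how this client-side spool threshold can be passed as a connection property (assuming a plain thick-client connection; "zk-host" is a placeholder, and the same property can also be set in the client's hbase-site.xml instead):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class SpoolThresholdExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Results buffered beyond this many bytes are spooled to a .tmp file on disk (10MB here).
            props.setProperty("phoenix.query.spoolThresholdBytes", "10485760");
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props)) {
                // run queries here; each spill writes a spool .tmp file once the threshold is exceeded
            }
        }
    }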

Re: Getting swamped with Phoenix *.tmp files on SELECT.

2016-04-18 Thread marks1900-post01
Currently I am running out of disk space as a direct result of these spool temp files (350GB+). Any ideas on how to address this?  These .tmp files never seem to be cleaned up after each query.  Is there any work-around? From: Samarth Jain To: "user@phoenix.apache.org" Sent: Friday, 1

Re: Undefined column. columnName=IS_ROW_TIMESTAMP

2016-04-18 Thread Samarth Jain
Arun, Older phoenix views, created pre-4.6, shouldn't have the ROW_TIMESTAMP column. Was the upgrade done correctly, i.e. was the server jar upgraded before the client jar? Is it possible to get the complete stack trace? It would be great if you could come up with a test case here to understand better whe

Re: Undefined column. columnName=IS_ROW_TIMESTAMP

2016-04-18 Thread James Taylor
Arun, If you're blocked by this, here's a potential way of getting around the issue you're hitting. This assumes you have all the DDL statements that were used to create tables on the cluster: - open an HBase shell and disable and drop the SYSTEM.CATALOG - open sqlline connected to your HBase clus

Re: Undefined column. columnName=IS_ROW_TIMESTAMP

2016-04-18 Thread Arun Kumaran Sabtharishi
James, We cannot afford to follow this workaround, since the names of the phoenix views that already exist are stored in MySQL as metadata, which is used by our APIs. We still haven't figured out a way to reproduce this issue as it is only happening in the production environment. Thanks, Arun Sa

Re: Phoenix-Spark: Number of partitions in PhoenixRDD

2016-04-18 Thread Josh Mahonin
Hi Diego, The phoenix-spark RDD partition count is equal to the number of splits that the query planner returns. Adjusting the HBase region splits, table salting [1], as well as the guidepost width [2] should help with the parallelization here. Using 'EXPLAIN' for the generated query in sqlline m
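For reference, a hedged sketch of the kind of statements being referred to, run over a plain Phoenix JDBC connection (the table name and "zk-host" are placeholders; exact guidepost behaviour depends on phoenix.stats.guidepost.width and the Phoenix version):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ParallelizationCheck {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 Statement stmt = conn.createStatement()) {
                // Recollect statistics so guideposts reflect the current data volume.
                stmt.execute("UPDATE STATISTICS MY_HUGE_TABLE ALL");
                // The chunk/scan count reported by EXPLAIN reflects the splits the query
                // planner returns, i.e. the PhoenixRDD partition count.
                try (ResultSet rs = stmt.executeQuery("EXPLAIN SELECT * FROM MY_HUGE_TABLE")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }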

Re: Getting swamped with Phoenix *.tmp files on SELECT.

2016-04-18 Thread Samarth Jain
Marks, FWIW, we had a problem with tmp files left over in case of failures - https://issues.apache.org/jira/browse/PHOENIX-1448. But this has been fixed since the 4.2.1 release. To help us, can you post a sample query where you are seeing tmp files left over? Are you sure the application is cleanly cl

Re: Getting swamped with Phoenix *.tmp files on SELECT.

2016-04-18 Thread marks1900-post01
I have narrowed this issue down to the select statement below.  When iterating through the query results of this select statement, I do ensure that the JDBC close methods on my ResultSet, Statement and Connection are called. For now, I will go with the suggested work-around and implement some
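For reference, a minimal try-with-resources sketch of the clean-close pattern being described here (the table, query and "zk-host" are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CleanCloseExample {
        public static void main(String[] args) throws Exception {
            String sql = "SELECT COL1, COL2 FROM MY_TABLE WHERE COL1 = ?";
            // try-with-resources guarantees the ResultSet, Statement and Connection are
            // closed in reverse order, even if iteration throws.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "some-key");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("COL1") + " / " + rs.getString("COL2"));
                    }
                }
            }
        }
    }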

Re: Query by region splits

2016-04-18 Thread James Taylor
Phoenix already does this (and to a finer, configurable granularity). See https://phoenix.apache.org/update_statistics.html Thanks, James On Mon, Apr 18, 2016 at 2:08 PM, Li Gao wrote: > Hi, > > In Phoenix is it possible to query the data by region splits? i.e. if > Table A has 10 regions on th

Query by region splits

2016-04-18 Thread Li Gao
Hi, In Phoenix is it possible to query the data by region splits? i.e. if Table A has 10 regions on the cluster, how can I issue 10 concurrent queries to Table A so that each query covers exactly 1 region of the table? This is helpful for us to split the queries across multiple processor machines

Re: Query by region splits

2016-04-18 Thread Li Gao
Hi James, Thanks for the quick reply. It is helpful, but I'm not sure it can solve the issue we have. Let me state the use case in another way to make it more obvious. Say Table A has 10 regions spread across 10 HBase nodes; in addition, I have 10 data processor machines (not the same as the hbase cluster)

Re: Query by region splits

2016-04-18 Thread James Taylor
Thanks for the clarification, Li. Are you essentially trying to make Phoenix multi-client node? Our idea for that is Drillix [1]. Short term, if you know the split points, you could use our row value constructor syntax [2] to do the above. Thanks, James [1] https://apurtell.s3.amazonaws.com/phoe
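For reference, a hedged sketch of the row value constructor approach mentioned here, with each worker bounding its query to one known split range (the table, key columns, split values and "zk-host" are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class RegionBoundedQuery {
        public static void main(String[] args) throws Exception {
            // Each processor machine runs the same query with its own [start, end) key range,
            // taken from the table's known split points.
            String sql = "SELECT * FROM MY_TABLE WHERE (PK1, PK2) >= (?, ?) AND (PK1, PK2) < (?, ?)";
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "a"); ps.setInt(2, 0);   // start of this worker's range
                ps.setString(3, "f"); ps.setInt(4, 0);   // start of the next worker's range
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process the row
                    }
                }
            }
        }
    }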

Re: Query by region splits

2016-04-18 Thread Li Gao
Hi James. I see, [2] might work for my use case. Thanks, Li On Mon, Apr 18, 2016 at 2:54 PM, James Taylor wrote: > Thanks for the clarification, Li. Are you essentially trying to make > Phoenix multi-client node? Our idea for that is Drillix [1]. Short term, if > you know the split points, yo

Re: Phoenix for CDH5 compatibility

2016-04-18 Thread Swapna Swapna
Hi James, I've downloaded the phoenix-for-cloudera-4.6-HBase-1.0-cdh5.4.zip from the link provided below, and am using the branch "4.6-Hbase-1.0-cdh5.4" https://github.com/chiastic-security/phoenix-for-cloudera After copying the *phoenix-4.6.0-cdh5.4.5-server.jar* to */usr/lib/hbase/lib*, still get

Re: [HELP:]Save Spark Dataframe in Phoenix Table

2016-04-18 Thread Divya Gehlot
Hi Josh, I downloaded Apache Phoenix v4.4.0-HBase-1.1 and tried packaging it with *apache-maven-3.3.9*. When I try to build it using maven I am getting the following error. The maven command I used to build Phoenix: mvn package -Dskip

Re: apache phoenix json api

2016-04-18 Thread Plamen Paskov
Josh, I removed the quotation marks but the result is still the same. I still cannot see the newly added data, with either prepareAndExecute or prepareAndExecuteBatch. On 17.04.2016 22:45, Josh Elser wrote: statementId is an integer, not a string. Remove the quotation marks around the value "2". Pla

Re: apache phoenix json api

2016-04-18 Thread F21
Can you show the requests you are currently sending? This is what a PrepareAndExecute request should look like: https://calcite.apache.org/docs/avatica_json_reference.html#prepareandexecuterequest On 19/04/2016 4:47 PM, Plamen Paskov wrote: Josh, I removed the quotation but the result is still

Re: apache phoenix json api

2016-04-18 Thread Plamen Paskov
The requests are as follows: - open a connection { "request": "openConnection", "connectionId": 5 } - create statement { "request": "createStatement", "connectionId": 5 } - prepare and execute the upsert { "request": "prepareAndExecute", "connectionId": 5, "statementId": 12, "sql

Re: apache phoenix json api

2016-04-18 Thread F21
That looks fine to me! I think phoenix has AutoCommit set to false by default. So, you will need to issue a commit before selecting: https://calcite.apache.org/docs/avatica_json_reference.html#commitrequest Let me know if it works! :) On 19/04/2016 4:54 PM, Plamen Paskov wrote: The requests
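For reference, a hedged sketch of issuing that commit against the Phoenix Query Server's JSON endpoint from Java. The host, default port 8765 and the connectionId 5 are assumptions matching the requests shown earlier in this thread:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class AvaticaCommitExample {
        public static void main(String[] args) throws Exception {
            // Commit the open connection so earlier upserts become visible to subsequent selects.
            String body = "{\"request\": \"commit\", \"connectionId\": 5}";
            HttpURLConnection http =
                    (HttpURLConnection) new URL("http://localhost:8765/").openConnection();
            http.setRequestMethod("POST");
            http.setDoOutput(true);
            http.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = http.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP status: " + http.getResponseCode());
        }
    }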