Re: Phoenix drop view not working after 4.3.1 upgrade

2015-06-08 Thread James Taylor
That's the lock we take out on the server-side when you drop a view. The timeout is controlled by the hbase.rowlock.wait.duration config parameter. The default is 30 seconds which is already way more time than you'd need to drop the view (which amounts to deleting a handful of rows from the SYSTEM.
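The parameter James names is a standard HBase server-side setting; if it needed tuning, it would go in hbase-site.xml on the region servers. A sketch (the value shown is the 30-second default, in milliseconds):

```
<property>
  <name>hbase.rowlock.wait.duration</name>
  <value>30000</value>
</property>
```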

AWS AMI version

2015-06-08 Thread William Li
Hi, does anyone know which AMI version I should choose when creating an EMR cluster in AWS? The phoenix.apache.org site says to use 3.0.1, but that is no longer available as a choice for EMR. I used 3.7.0 but it errors out. Thanks, William.

Re: Salt bucket count recommendation

2015-06-08 Thread James Taylor
I assume you primarily access your data using a time range query then, so salting on the data tables makes sense. Is this the case for your secondary indexes as well? Do they lead their PK with a date/time column as well? Did you know you can turn salting off for an index over a salted table (by sp
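The option James alludes to for disabling salting on an index is, I believe, `SALT_BUCKETS = 0` in the index DDL. A sketch, with table and column names invented for illustration:

```sql
-- Salted data table whose PK leads with a time column
CREATE TABLE sensor_events (
    dt  DATE NOT NULL,
    sid VARCHAR NOT NULL,
    val DOUBLE
    CONSTRAINT pk PRIMARY KEY (dt, sid)
) SALT_BUCKETS = 16;

-- Secondary index with salting explicitly turned off
CREATE INDEX idx_sid_dt ON sensor_events (sid, dt) SALT_BUCKETS = 0;
```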

Re: Bulk loading through HFiles

2015-06-08 Thread Dawid
Yes, I did. I also tried to execute some upserts using sqlline after importing HFiles, and rows from upserts are visible both in sqlline and hbase shell, but the rows imported from HFile are only in hbase shell. On 08.06.2015 19:06, James Taylor wrote: Dawid, Perhaps a dumb question, but did

Re: Salt bucket count recommendation

2015-06-08 Thread Perko, Ralph J
James, Thanks for the response. There could be a dozen or so users accessing the system and the same portions of the tables. The motive for salting has been to eliminate hot spotting - our data is time-series based and that is what the PK is based on. Thanks, Ralph On 6/8/15, 10:00 AM

Re: Bulk loading through HFiles

2015-06-08 Thread James Taylor
Dawid, Perhaps a dumb question, but did you execute a CREATE TABLE statement in sqlline for the tables you're importing into? Phoenix needs to be told the schema of the table (i.e. it's not enough to just create the table in HBase). Thanks, James On Mon, Jun 8, 2015 at 10:02 AM, Dawid wrote: > An
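For context, mapping Phoenix onto a table that already exists in HBase means issuing a CREATE TABLE whose layout mirrors the HBase one. A sketch with hypothetical names (column family "cf", qualifier "data"; quoted identifiers preserve the case of the existing HBase names):

```sql
CREATE TABLE "my_hbase_table" (
    "pk"        VARCHAR PRIMARY KEY,  -- the HBase row key
    "cf"."data" VARCHAR               -- an existing column family/qualifier
);
```

Without this step Phoenix has no schema metadata for the table, which is why rows written directly as HFiles show up in the hbase shell but not in sqlline.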

Re: Bulk loading through HFiles

2015-06-08 Thread Dawid
Any suggestions? Some clues what to check? On 05.06.2015 23:21, Dawid wrote: Yes, I can see it in hbase-shell. Sorry for the bad links, I haven't used private repositories on GitHub. So I moved the files to a gist: https://gist.github.com/dawidwys/3aba8ba618140756da7c Hope this time it will w

Re: Salt bucket count recommendation

2015-06-08 Thread James Taylor
Hi Ralph, What kind of workload do you expect on your cluster? Will there be many users accessing many different parts of your table(s) simultaneously? Have you considered not salting your tables? Or do you have hot spotting issues at write time due to the layout of your PK that salting is preventi

Re: Phoenix drop view not working after 4.3.1 upgrade

2015-06-08 Thread Arun Kumaran Sabtharishi
Hello James, Thanks for the reply. Here are the answers to the questions you asked. *1.) What's different between the two environments (i.e. the working and not working ones)?* The not-working ones have more views than the working ones. *2.) Do you mean 1.3M views or 1.3M rows?*

Re: Schema and indexes for efficient time range queries

2015-06-08 Thread James Taylor
Both DATE and TIME have millisecond granularity (both are stored as 8-byte longs), so I'd recommend using either of those. Phoenix also supports date arithmetic, so you can do queries like this to get the last week's worth of data: SELECT * FROM SENSOR_DATA WHERE sid = 'ID1' AND dt > CURRENT_TIME()
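Completing the thought in James's truncated snippet: Phoenix date arithmetic treats an integer operand as a number of days, so one plausible form of the query (table and column names taken from the thread) is:

```sql
SELECT *
FROM SENSOR_DATA
WHERE sid = 'ID1'
  AND dt > CURRENT_DATE() - 7;  -- integer arithmetic on dates is in days
```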

Re: Salt bucket count recommendation

2015-06-08 Thread Perko, Ralph J
Hi – following up on this. Is it generally recommended to roughly match the salt bucket count to region server count? Or is it more arbitrary? Should I use something like 255 because the regions are going to split anyway? Thanks, Ralph From: "Perko, Ralph J" Reply-To: "user@phoenix.apache.o

Re: Schema and indexes for efficient time range queries

2015-06-08 Thread Vladimir Rodionov
There are several time data types natively supported by Phoenix: TIME is probably the most suitable for your case (it should have millisecond accuracy, but you'd better check that yourself.) -Vlad On Mon, Jun 8, 2015 at 9:02 AM, Yiannis Gkoufas wrote: > Hi Vladimir, > > thanks a lot for your input, ju

Re: Schema and indexes for efficient time range queries

2015-06-08 Thread Yiannis Gkoufas
Hi Vladimir, thanks a lot for your input, just some follow-up questions: 1) When you say "try to fit it in long" you mean UNSIGNED_LONG from https://phoenix.apache.org/language/datatypes.html, right? 2) Would a string format also be efficient? Like MMDDHHmm, right? Thanks a lot! On 8 June 2015 at

Re: Schema and indexes for efficient time range queries

2015-06-08 Thread Vladimir Rodionov
PRIMARY KEY(dt, sid) won't work well for your query. PRIMARY KEY(sid, dt) is much better for time range queries on a particular sensor. In the latter case the query will be translated into an efficient range scan. Do not use bigint for the timestamp; try to fit it into long or use a stringified version in
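The contrast Vladimir draws can be sketched as two DDL variants (table and column names assumed from the thread):

```sql
-- PRIMARY KEY (dt, sid): a query for one sensor's time range must
-- scan across all sensors, because dt leads the row key.

-- PRIMARY KEY (sid, dt): the same query becomes a contiguous range scan
-- over one sensor's rows.
CREATE TABLE sensor_data (
    sid VARCHAR NOT NULL,
    dt  DATE NOT NULL,
    val DOUBLE
    CONSTRAINT pk PRIMARY KEY (sid, dt)
);
```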

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-08 Thread Josh Mahonin
Hi Jeroen, Have you tried using the phoenix-client uber JAR in the Spark classpath? That strategy I think is the simplest and most straight-forward, although it may not be appropriate for all projects. With your setup though, my guess is that Spark is preferring to use its own versions of Hadoop
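One way to put the phoenix-client uber JAR on both the driver and executor classpaths is via spark-defaults.conf; the JAR path and version below are assumptions for illustration, so adjust them for your install:

```
# spark-defaults.conf -- paths are illustrative
spark.driver.extraClassPath    /opt/phoenix/phoenix-4.4.0-HBase-0.98-client.jar
spark.executor.extraClassPath  /opt/phoenix/phoenix-4.4.0-HBase-0.98-client.jar
```

Because these entries are prepended to the JVM classpath, they also sidestep the case where Spark prefers its own bundled Hadoop/HBase classes.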

Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-08 Thread Jeroen Vlek
Hi, I posted a question with regards to Phoenix and Spark Streaming on StackOverflow [1] and realized that I might have more luck trying it on here. I copied the complete question to this email as well (see below) If you guys deem it necessary, I can also try my luck on the Spark mailing list.

Schema and indexes for efficient time range queries

2015-06-08 Thread Yiannis Gkoufas
Hi there, I am investigating Phoenix as a potential data-store for time-series data on sensors. What I am really interested in, as a first milestone, is to have efficient time range queries for a particular sensor. From those queries the results would consist of 1 or 2 columns (so small rows). I w