Thanks, Vladimir - that's a very nice writeup. Would you mind adding it as a comment on https://issues.apache.org/jira/browse/PHOENIX-590? Also, would you be interested in contributing this functionality? Regards, James
On Sun, Apr 6, 2014 at 11:10 AM, Vladimir Rodionov <[email protected]>wrote: > James, > > for many applications one of the column of a table can be (and must be) > naturally mapped > > to HBase timestamp. What it gives us is the optimization on StoreScanner > where HFiles with timestamps out of range of > > a Scan operator will be omitted. Let us say that we have time-series type of > data (EVENTS) and custom compaction, where we create > series of HFiles with continuous non-overlapping timestamp ranges. > > CREATE TABLE IF NOT EXISTS ODS.EVENTS ( > METRICID VARCHAR NOT NULL, > METRICNAME VARCHAR, > SERVICENAME VARCHAR NOT NULL, > ORIGIN VARCHAR NOT NULL, > APPID VARCHAR, > IPID VARCHAR, > NVALUE DOUBLE, > TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/, > DATA VARCHAR, > SVALUE VARCHAR > CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, > TIME) > ) SALT_BUCKETS=40, > IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE'; > > Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to > Phoenix that the column > > TIME must be mapped to HBase timestamp. > > The Query: > > Select all events of type 'X' for last 7 days > > SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - > 7*24*3600000; (this may be not correct SQL syntax of course) > > These types of queries will be efficiently optimized if: > > 1. Phoenix maps TIME column to HBase timestamp > > 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan > timerange > > > Although this : > > Properties props = new Properties(); > props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts)); > Connection conn = DriverManager.connect(myUrl, props); > > conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')"); > conn.commit(); > > will work in my case- it may not be efficient from performance point of view > because for every INSERT/UPSERT > new Connection object and new Statement is created, beside this we still need > the optimization 2. (see above). > > > -Vladimir > > > > > > > > > > On Sun, Apr 6, 2014 at 8:45 AM, James Taylor <[email protected]>wrote: > >> Hi Vladimir, >> By default, Phoenix queries show you the "latest" data. You can override >> this at connection time and do "flashback" queries, DDL, and DML as >> described here: >> http://phoenix.incubator.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API >> >> What ever optimizations HBase does, Phoenix will see too, since at the >> end of the day, we're issuing a regular set of scans. >> >> Is this the kind of functionality for which you're looking? Or is it more >> along the lines of a query returning multiple versions of the same row: >> https://issues.apache.org/jira/browse/PHOENIX-590? This isn't supported >> yet. >> >> Or maybe something else? >> >> Thanks, >> James >> >> >> On Sat, Apr 5, 2014 at 10:42 AM, Vladimir Rodionov < >> [email protected]> wrote: >> >>> I am not sure it was implemented already, or it was not ... >>> It would be nice to have a way to tell Phoenix which column is naturally >>> mapped to HBase timestamp. This will greatly improve range queries on >>> this column by utilizing internal HBase optimizations (skipping store >>> files, which are completely out of the specified timestamp range). >>> >>> >>> -Vladimir Rodionov >>> >> >> >
