If the dynamic column approach with the cf.* feature (PHOENIX-374) meets your needs, that's good feedback. FWIW, you would not need to create all the views up front at schema creation time. You can create them on-the-fly. All views share the same, single underlying HBase table, so no HBase metadata operations would be performed when you create a new view. Creating a new view more or less correspond to doing a Put of a few rows in the SYSTEM.CATALOG table (the HBase table that holds your metadata), so it's not an expensive/heavy-weight operation to create and alter them (as it is with a typical RDBMS). Thanks to HBase, you can create an "infinite" set of them.
Thanks, James On Wed, Jan 7, 2015 at 12:58 AM, Sumanta Gh <[email protected]> wrote: > Thanks James for replying. > The below example is really a smart way to map dynamic columns into static > ones. I will implement the idea in another case. > But I can not create these views at schema creation time and I really have > an infinite set of values for event_type. > Keeping a column for all dynamic column names in a cell is perhaps better > solution for my case. > Finally, I insist the community to provide an API similar to cf.* to solve > this kind of issue with simplicity. > > Regards > Sumanta > > > -----James Taylor <[email protected]> wrote: ----- > To: user <[email protected]> > From: James Taylor <[email protected]> > Date: 01/07/2015 01:35PM > > Subject: Re: Select dynamic column content > > Hi Sumanta, > Another alternative option is to leverage support for VIEWs in Phoenix ( > http://phoenix.apache.org/views.html). In many use cases I've seen where > there are hundreds of sparse columns defined for a schema, there's a column > that determines *which* sparse columns are applicable for a given row. An > example would be a metric_event table in which an event_type column value > defines which columns are associated with a given event_type. This might be > modelled with a base table and a series of views like this: > > -- base table > CREATE TABLE metric_event ( > event_type CHAR(2), > event_id VARCHAR, > created_date DATE, > CONSTRAINT pk PRIMARY KEY (event_type, event_id)); > > -- updatable view for garbage collection event > CREATE VIEW gc_event( -- define columns specific to a GC event > duration BIGINT, > eden_space BIGINT, > survivor_space BIGINT, > ...) > AS SELECT * FROM event WHERE event_type = 'GC' > > -- updatable view for IO event > CREATE VIEW io_event( -- define columns specific to an IO event > io_wait_time BIGINT, > iops BIGINT, > read_workload INTEGER, > write_workload INTEGER, > ...) > AS SELECT * FROM event WHERE event_type = 'IO' > > Using this approach, Phoenix will keep track of the columns each view for > you so you don't have to. All views reside in the same HBase table (the one > corresponding to your base table). When you do a SELECT * FROM io_event, > you'll get back only those columns defined for that view plus the columns > from the base table. > > A further orthogonal level of granularity allows you to define your base > table as a MULTI_TENANT table and have each VIEW vary depending on the > particular user (identified at connection time through a property). Just as > with the scenario described above, each user may define different columns > that only exist for that particular user. See > http://phoenix.apache.org/multi-tenancy.html for more info. > > HTH, > > James > > On Tue, Jan 6, 2015 at 9:43 AM, Jesse Yates <[email protected]> > wrote: > >> And it looks like you already figured that out :) >> >> >> On Tue, Jan 6, 2015, 9:43 AM Jesse Yates <[email protected]> wrote: >> >>> You wouldn't even need another table, just a single VARCHAR[] column to >>> keep the column names. Its ideal to keep it in the same row (possibly in >>> another cf if you expect it to be large) since you get ACID compliance on >>> that row, which you wouldn't get from using another table. You then just >>> upsert the names column at the same time you upsert the dynamic columns. >>> >>> Phoenix does something similar for tracing where there is an unknown >>> number of annotations - we keep a trace annotation count column which then >>> can be used to figure out the dynamic annotation column names (which are >>> things like annotations.a0, .a1, .a2, etc) >>> >>> The downside is that you then need to do two queries to get all the >>> columns, but until we implement the cf.* logic for dynamic columns, that's >>> the best you can do. >>> >>> - jesse >>> >>> On Tue, Jan 6, 2015, 9:23 AM Sumanta Gh <[email protected]> wrote: >>> >>>> Thanks Nicolas for replying. >>>> I am already managing dynamic column names either putting them in a >>>> separate column or keeping the names in cache. >>>> But looking at the pace Phoenix is evolving, IMHO this cf.* query >>>> pattern would be very much helpful for users. >>>> The stock HBase client is capable of doing that. >>>> Because my table is extremely sparsed and I allow quite a random schema >>>> in every row of the table, getting the content of the dynamic column in a >>>> single query will save lot of time. >>>> Looking forward to your completion of the work... >>>> >>>> Regards >>>> Sumanta >>>> >>>> >>>> -----Nicolas Maillard <[email protected]> wrote: ----- >>>> To: [email protected] >>>> From: Nicolas Maillard <[email protected]> >>>> Date: 01/06/2015 03:08PM >>>> Subject: Re: Select dynamic column content >>>> >>>> >>>> Hello Sumanta >>>> >>>> This is a last bit missing in the select cf.* query pattern that would >>>> bring back not only known columns but all dynamic ones also. I never got >>>> around to finishing that bit for different reasons. The best way would be >>>> to finish that, other than that I do not see an easy way to retrieve >>>> dynamic columns of which you have lost the column name. I guess if there >>>> is a logic ti these column names you could try to reconstruct or keep a >>>> second table of keys to dynamic column names to find them in the after >>>> math. >>>> >>>> regards >>>> >>>> On Tue, Jan 6, 2015 at 8:01 AM, Sumanta Gh <[email protected]> wrote: >>>> >>>>> Hi, >>>>> It has been a puzzle for me to get the content of dynamic columns in a >>>>> single SELECT * FROM query. >>>>> Challenge is that I do not know the dynamic column names in advance so >>>>> I can not mention a dynamic column in the SELECT query. >>>>> Is there any way out? Please suggest. >>>>> >>>>> Regards >>>>> Sumanta >>>>> >>>>> =====-----=====-----===== >>>>> Notice: The information contained in this e-mail >>>>> message and/or attachments to it may contain >>>>> confidential or privileged information. If you are >>>>> not the intended recipient, any dissemination, use, >>>>> review, distribution, printing or copying of the >>>>> information contained in this e-mail message >>>>> and/or attachments to it are strictly prohibited. If >>>>> you have received this communication in error, >>>>> please notify us by reply e-mail or telephone and >>>>> immediately and permanently delete the message >>>>> and any attachments. Thank you >>>>> >>>>> >>>> >>>> >>>> -- >>>> Nicolas Maillard Solution Engineer >>>> Phone: +33 (0)6 68 17 66 05 >>>> Email: nmaillard*@hortonworks.com* <[email protected]> >>>> Website: http://www.hortonworks.com >>>> >>>> >>>> >>>> CONFIDENTIALITY NOTICE >>>> NOTICE: This message is intended for the use of the individual or >>>> entity to which it is addressed and may contain information that is >>>> confidential, privileged and exempt from disclosure under applicable law. >>>> If the reader of this message is not the intended recipient, you are hereby >>>> notified that any printing, copying, dissemination, distribution, >>>> disclosure or forwarding of this communication is strictly prohibited. If >>>> you have received this communication in error, please contact the sender >>>> immediately and delete it from your system. Thank You. >>>> >>> >
