James, I don't understand how to use IF NOT EXISTS to track the dynamic columns. Could you elaborate? Thanks!
On Fri, Jun 3, 2016 at 10:36 AM, James Taylor <[email protected]> wrote: > That's pretty slick, Steve. Another variant along the same lines would be > to create/alter a view over a base table where you use the IF NOT EXISTS. > In this way, you let Phoenix track the dynamic columns for you. We have a > new feature in 4.8 for declaring a table as an APPEND_ONLY_SCHEMA and we'll > minimize the RPCs for ensuring the meta data is up to date. It works well > for dynamically tracking schema as data is processed, a pretty common > pattern. > > Thanks, > James > > > On Friday, June 3, 2016, Steve Terrell <[email protected]> wrote: > >> I have a similar situation. I have records with varying fields that I >> wanted to access individually and also as a group. >> >> My actual records are JSON objects, so they look like like this: >> {"field1": value1, "field2": value2, …} >> To make matter harder, the fields are also varying types: ints, strings, >> boolean, floats. >> >> What I ended up doing that has worked really well is make each field in >> the JSON record a dynamic field in a Phoenix table. And, to preserve the >> integrity of the original record, I have one static string field in which >> the entire original JSON goes into. >> >> Because of the dynamic fields, I have to execute an UPSERT for each row >> instead of some kind of batch loading. But, I routinely load millions of >> rows like this, sometimes with 20+ fields, and it has never failed on an >> upsert and runs fast enough. >> >> A few of the fields are actually always in each record, so I use those >> for my primary keys, >> >> The end result is great! I can query by any field, quickly pull entire >> records from the key fields, and any JSON object can be loaded. >> >> In my code that translates the JSON into UPSERT statements I also keep a >> running tab of field names, their type, and number of occurrences. Then a >> the end of my millions of UPSERTS I populate a simple table that just lists >> all the fields/types/counts so that there is no mystery about what fields >> are available when I want to query the big table. I did this because I >> don't always know what field names are in the JSON, and I don't know how to >> get a list of dynamic field names from Phoenix SQL. >> >> Hope this helps, >> Steve >> >> On Thu, Jun 2, 2016 at 9:28 PM, Stephen Wilcoxon <[email protected]> >> wrote: >> >>> The simple solution is to transform the multi-valued field into a >>> separate linked table with multiple rows per "id". If this is what you >>> mean by "exploding the data", why do you not want to do so? >>> >>> On Thu, Jun 2, 2016 at 7:11 PM, Rahul Jain <[email protected]> wrote: >>> >>>> >>>> Folks, >>>> >>>> I used this question earlier on the wrong list; posting on apache >>>> phoenix user group on the advise of James Taylor: >>>> >>>> We have a use case of migration from legacy database to hbase/phoenix. >>>> I'd want to hear your thoughts on how to manage the given multi-valued >>>> field: >>>> >>>> Field name : namesList >>>> >>>> Contains: comma separated list of strings. >>>> >>>> e.g. "Jack, Richard, Fred" >>>> >>>> The field does not have an inherent limit on the number of strings, >>>> however for successful migration we need to support unto 100 strings in the >>>> field >>>> >>>> This field is currently indexed in our legacy DB technology where the >>>> user is able to query by any of the contained strings. >>>> >>>> e.g. query: >>>> >>>> select * from myTable where namesList LIKE '%Fred%'; >>>> >>>> Is there any way Phoenix supports generating multiple index values from >>>> a single column in a row ? We are looking for a method where we don't >>>> explicitly specify individual offsets for indexing, (nameList[0], >>>> nameList[1] etc) , rather say a custom functional index equivalent >>>> generates the indexes as a list and each of the index in the list is >>>> persisted individually. >>>> >>>> Effectively our goal is to avoid full table scans when doing queries >>>> that match values from multi valued fields such as example above. We >>>> do want to avoid exploding the data however possible. >>>> >>>> Any comments / suggestions are welcome. >>>> >>>> Thank you ! >>>> >>>> Rahul >>>> >>> >>> >>
