Re:

Steve Terrell Fri, 03 Jun 2016 08:42:12 -0700

James, I don't understand how to use IF NOT EXISTS to track the dynamic
columns.  Could you elaborate?  Thanks!


On Fri, Jun 3, 2016 at 10:36 AM, James Taylor <[email protected]>
wrote:

> That's pretty slick, Steve. Another variant along the same lines would be
> to create/alter a view over a base table where you use the IF NOT EXISTS.
> In this way, you let Phoenix track the dynamic columns for you. We have a
> new feature in 4.8 for declaring a table as an APPEND_ONLY_SCHEMA and we'll
> minimize the RPCs for ensuring the meta data is up to date. It works well
> for dynamically tracking schema as data is processed, a pretty common
> pattern.
>
> Thanks,
> James
>
>
> On Friday, June 3, 2016, Steve Terrell <[email protected]> wrote:
>
>> I have a similar situation.  I have records with varying fields that I
>> wanted to access individually and also as a group.
>>
>> My actual records are JSON objects, so they look like like this:
>> {"field1": value1, "field2": value2, …}
>> To make matter harder, the fields are also varying types:  ints, strings,
>> boolean, floats.
>>
>> What I ended up doing that has worked really well is make each field in
>> the JSON record a dynamic field in a Phoenix table.  And, to preserve the
>> integrity of the original record, I have one static string field in which
>> the entire original JSON goes into.
>>
>> Because of the dynamic fields, I have to execute an UPSERT for each row
>> instead of some kind of batch loading.  But, I routinely load millions of
>> rows like this, sometimes with 20+ fields, and it has never failed on an
>> upsert and runs fast enough.
>>
>> A few of the fields are actually always in each record, so I use those
>> for my primary keys,
>>
>> The end result is great!  I can query by any field, quickly pull entire
>> records from the key fields, and any JSON object can be loaded.
>>
>> In my code that translates the JSON into UPSERT statements I also keep a
>> running tab of field names, their type, and number of occurrences.  Then a
>> the end of my millions of UPSERTS I populate a simple table that just lists
>> all the fields/types/counts so that there is no mystery about what fields
>> are available when I want to query the big table.  I did this because I
>> don't always know what field names are in the JSON, and I don't know how to
>> get a list of dynamic field names from Phoenix SQL.
>>
>> Hope this helps,
>>     Steve
>>
>> On Thu, Jun 2, 2016 at 9:28 PM, Stephen Wilcoxon <[email protected]>
>> wrote:
>>
>>> The simple solution is to transform the multi-valued field into a
>>> separate linked table with multiple rows per "id".  If this is what you
>>> mean by "exploding the data", why do you not want to do so?
>>>
>>> On Thu, Jun 2, 2016 at 7:11 PM, Rahul Jain <[email protected]> wrote:
>>>
>>>>
>>>> Folks,
>>>>
>>>> I used this question earlier on the wrong list; posting on apache
>>>> phoenix user group on the advise of James Taylor:
>>>>
>>>> We have a use case of migration from legacy database to hbase/phoenix.
>>>> I'd want to hear your thoughts on how to manage the  given multi-valued
>>>>  field:
>>>>
>>>> Field name : namesList
>>>>
>>>> Contains: comma separated list of strings.
>>>>
>>>>  e.g. "Jack, Richard, Fred"
>>>>
>>>> The field does not have an inherent limit on the number of strings,
>>>> however for successful migration we need to support unto 100 strings in the
>>>> field
>>>>
>>>> This field is currently indexed in our legacy DB technology where the
>>>> user is able to query by any of the contained strings.
>>>>
>>>> e.g. query:
>>>>
>>>> select * from myTable where namesList LIKE '%Fred%';
>>>>
>>>> Is there any way Phoenix supports generating multiple index values from
>>>> a single column in a row ? We are looking for a method where we don't
>>>> explicitly specify individual offsets for indexing, (nameList[0],
>>>> nameList[1] etc) , rather say a custom functional index equivalent
>>>> generates the indexes as a list and each of the index in the list is
>>>> persisted individually.
>>>>
>>>> Effectively our goal is to avoid full table scans when doing queries
>>>> that match values from multi valued fields such as example above. We
>>>> do want to avoid exploding the data however possible.
>>>>
>>>> Any comments / suggestions are welcome.
>>>>
>>>> Thank you !
>>>>
>>>> Rahul
>>>>
>>>
>>>
>>

Re:

Reply via email to