Re:

Steve Terrell Fri, 03 Jun 2016 08:18:28 -0700

I have a similar situation.  I have records with varying fields that I
wanted to access individually and also as a group.

My actual records are JSON objects, so they look like like this:
{"field1": value1, "field2": value2, …}
To make matter harder, the fields are also varying types:  ints, strings,
boolean, floats.

What I ended up doing that has worked really well is make each field in the
JSON record a dynamic field in a Phoenix table.  And, to preserve the
integrity of the original record, I have one static string field in which
the entire original JSON goes into.

Because of the dynamic fields, I have to execute an UPSERT for each row
instead of some kind of batch loading.  But, I routinely load millions of
rows like this, sometimes with 20+ fields, and it has never failed on an
upsert and runs fast enough.

A few of the fields are actually always in each record, so I use those for
my primary keys,

The end result is great!  I can query by any field, quickly pull entire
records from the key fields, and any JSON object can be loaded.

In my code that translates the JSON into UPSERT statements I also keep a
running tab of field names, their type, and number of occurrences.  Then a
the end of my millions of UPSERTS I populate a simple table that just lists
all the fields/types/counts so that there is no mystery about what fields
are available when I want to query the big table.  I did this because I
don't always know what field names are in the JSON, and I don't know how to
get a list of dynamic field names from Phoenix SQL.

Hope this helps,
    Steve

On Thu, Jun 2, 2016 at 9:28 PM, Stephen Wilcoxon <[email protected]> wrote:

> The simple solution is to transform the multi-valued field into a separate
> linked table with multiple rows per "id".  If this is what you mean by
> "exploding the data", why do you not want to do so?
>
> On Thu, Jun 2, 2016 at 7:11 PM, Rahul Jain <[email protected]> wrote:
>
>>
>> Folks,
>>
>> I used this question earlier on the wrong list; posting on apache phoenix
>> user group on the advise of James Taylor:
>>
>> We have a use case of migration from legacy database to hbase/phoenix.
>> I'd want to hear your thoughts on how to manage the  given multi-valued
>>  field:
>>
>> Field name : namesList
>>
>> Contains: comma separated list of strings.
>>
>>  e.g. "Jack, Richard, Fred"
>>
>> The field does not have an inherent limit on the number of strings,
>> however for successful migration we need to support unto 100 strings in the
>> field
>>
>> This field is currently indexed in our legacy DB technology where the
>> user is able to query by any of the contained strings.
>>
>> e.g. query:
>>
>> select * from myTable where namesList LIKE '%Fred%';
>>
>> Is there any way Phoenix supports generating multiple index values from
>> a single column in a row ? We are looking for a method where we don't
>> explicitly specify individual offsets for indexing, (nameList[0],
>> nameList[1] etc) , rather say a custom functional index equivalent
>> generates the indexes as a list and each of the index in the list is
>> persisted individually.
>>
>> Effectively our goal is to avoid full table scans when doing queries that
>> match values from multi valued fields such as example above. We do want
>> to avoid exploding the data however possible.
>>
>> Any comments / suggestions are welcome.
>>
>> Thank you !
>>
>> Rahul
>>
>
>

Re:

Reply via email to