I have a similar situation. I have records with varying fields that I
wanted to access individually and also as a group.
My actual records are JSON objects, so they look like like this:
{"field1": value1, "field2": value2, …}
To make matter harder, the fields are also varying types: ints, strings,
boolean, floats.
What I ended up doing that has worked really well is make each field in the
JSON record a dynamic field in a Phoenix table. And, to preserve the
integrity of the original record, I have one static string field in which
the entire original JSON goes into.
Because of the dynamic fields, I have to execute an UPSERT for each row
instead of some kind of batch loading. But, I routinely load millions of
rows like this, sometimes with 20+ fields, and it has never failed on an
upsert and runs fast enough.
A few of the fields are actually always in each record, so I use those for
my primary keys,
The end result is great! I can query by any field, quickly pull entire
records from the key fields, and any JSON object can be loaded.
In my code that translates the JSON into UPSERT statements I also keep a
running tab of field names, their type, and number of occurrences. Then a
the end of my millions of UPSERTS I populate a simple table that just lists
all the fields/types/counts so that there is no mystery about what fields
are available when I want to query the big table. I did this because I
don't always know what field names are in the JSON, and I don't know how to
get a list of dynamic field names from Phoenix SQL.
Hope this helps,
Steve
On Thu, Jun 2, 2016 at 9:28 PM, Stephen Wilcoxon <[email protected]> wrote:
> The simple solution is to transform the multi-valued field into a separate
> linked table with multiple rows per "id". If this is what you mean by
> "exploding the data", why do you not want to do so?
>
> On Thu, Jun 2, 2016 at 7:11 PM, Rahul Jain <[email protected]> wrote:
>
>>
>> Folks,
>>
>> I used this question earlier on the wrong list; posting on apache phoenix
>> user group on the advise of James Taylor:
>>
>> We have a use case of migration from legacy database to hbase/phoenix.
>> I'd want to hear your thoughts on how to manage the given multi-valued
>> field:
>>
>> Field name : namesList
>>
>> Contains: comma separated list of strings.
>>
>> e.g. "Jack, Richard, Fred"
>>
>> The field does not have an inherent limit on the number of strings,
>> however for successful migration we need to support unto 100 strings in the
>> field
>>
>> This field is currently indexed in our legacy DB technology where the
>> user is able to query by any of the contained strings.
>>
>> e.g. query:
>>
>> select * from myTable where namesList LIKE '%Fred%';
>>
>> Is there any way Phoenix supports generating multiple index values from
>> a single column in a row ? We are looking for a method where we don't
>> explicitly specify individual offsets for indexing, (nameList[0],
>> nameList[1] etc) , rather say a custom functional index equivalent
>> generates the indexes as a list and each of the index in the list is
>> persisted individually.
>>
>> Effectively our goal is to avoid full table scans when doing queries that
>> match values from multi valued fields such as example above. We do want
>> to avoid exploding the data however possible.
>>
>> Any comments / suggestions are welcome.
>>
>> Thank you !
>>
>> Rahul
>>
>
>