Hi Guodong,

Does the RAW type meet your requirements? For example, you could specify a map<varchar, raw> type, where the value of the map is the raw JsonNode parsed by Jackson. This is not supported yet; however, IMO it could be supported.
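To make that concrete, here is a hypothetical DDL sketch of what such a schema could look like once RAW values are supported in the json format (the RAW syntax below is illustrative only, not working syntax today):

```sql
-- Hypothetical sketch only: RAW values in the json format are not yet
-- supported (see the jira below); the RAW syntax here is illustrative.
CREATE TABLE json_events (
  top_level_key1 VARCHAR,
  nested_object  MAP<VARCHAR, RAW>  -- each value kept as a Jackson JsonNode
) WITH (
  'format.type' = 'json'
  -- plus your connector options
);
```

The user would then cast or inspect each JsonNode value in a UDF, rather than forcing every value into one SQL type up front.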
Guodong Wang <wangg...@gmail.com> 于2020年5月28日周四 下午9:43写道:

> Benchao,
>
> Thank you for your quick reply.
>
> As you mentioned, for my current scenario, approach 2 should work for me.
> But it is a bit annoying that I have to modify the schema to add new field
> types whenever the upstream app changes the json format or adds new
> fields. Otherwise, my users cannot refer to those fields in their SQL.
>
> Per the description in the jira, I think that after implementing this, all
> the json values will be converted to strings.
> I am wondering if Flink SQL can/will support flexible schemas in the
> future, for example, registering a table without defining a specific
> schema for each field, and letting the user define a generic map or array
> for one field where the values of the map/array can be any object. Then
> the type-conversion cost might be saved.
>
> Guodong
>
>
> On Thu, May 28, 2020 at 7:43 PM Benchao Li <libenc...@gmail.com> wrote:
>
>> Hi Guodong,
>>
>> I think you almost have the answer:
>> 1. map type: it does not work with the current implementation. For
>> example, with map<varchar, varchar>, if the value is a non-string json
>> object, then `JsonNode.asText()` may not work as you wish.
>> 2. list all the fields you care about. IMO, this fits your scenario. And
>> you can set format.fail-on-missing-field = false, so that non-existent
>> fields are set to null.
>>
>> For 1, I think maybe we can support it in the future, and I've created a
>> jira[1] to track this.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-18002
>>
>> Guodong Wang <wangg...@gmail.com> 于2020年5月28日周四 下午6:32写道:
>>
>>> Hi!
>>>
>>> I want to use Flink SQL to process some json events. It is quite
>>> challenging to define a schema for the Flink SQL table.
>>>
>>> My data source's format is json like this:
>>>
>>> {
>>>   "top_level_key1": "some value",
>>>   "nested_object": {
>>>     "nested_key1": "abc",
>>>     "nested_key2": 123,
>>>     "nested_key3": ["element1", "element2", "element3"]
>>>   }
>>> }
>>>
>>> The big challenges for me in defining a schema for this data source are:
>>> 1. The keys in nested_object are flexible; there might be 3 unique keys
>>> or more. If I enumerate all the keys in the schema, I think my code is
>>> fragile. How do I handle an event which contains more nested keys in
>>> nested_object?
>>> 2. I know the Table API supports the Map type, but I am not sure if I
>>> can put a generic object as the value of the map, because the values in
>>> nested_object are of different types: some of them are ints, some of
>>> them are strings or arrays.
>>>
>>> So, how can I expose this kind of json data as a table in Flink SQL
>>> without enumerating all the nested keys?
>>>
>>> Thanks.
>>>
>>> Guodong
>>>
>>
>>
>> --
>>
>> Best,
>> Benchao Li
>>

--

Best,
Benchao Li
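For comparison, the "list all the fields you care about" approach (option 2 in Benchao's reply) might look like the following DDL sketch. The field names come from the sample json in the thread; the WITH options are placeholders for whatever connector is actually in use:

```sql
-- Sketch of option 2: enumerate every known field explicitly.
-- This is the schema that must be revisited whenever the upstream
-- app adds new keys to nested_object.
CREATE TABLE json_events (
  top_level_key1 VARCHAR,
  nested_object  ROW<
    nested_key1 VARCHAR,
    nested_key2 INT,
    nested_key3 ARRAY<VARCHAR>
  >
) WITH (
  'format.type' = 'json',
  'format.fail-on-missing-field' = 'false'  -- missing fields become NULL
  -- plus your connector options
);
```

With fail-on-missing-field set to false, events that lack nested_key2 or nested_key3 still parse, and those columns are simply NULL; unknown extra keys in nested_object are silently dropped.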