Benchao,

Thank you for your quick reply.

As you mentioned, approach 2 should work for my current scenario. But it
is a little annoying that I have to modify the schema to add new field
types whenever the upstream app changes the JSON format or adds new fields.
Otherwise, my users cannot refer to those fields in their SQL.

Per the description in the JIRA, I think that after implementing this, all
the JSON values will be converted to strings.
I am wondering if Flink SQL can/will support a flexible schema in the
future, for example, registering the table without defining a specific type
for each field, and letting the user define a generic map or array for one
field where the values of the map/array can be any object. Then, the type
conversion cost might be saved.
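To make the trade-off concrete, here is a small sketch (in plain Python, not Flink) of what the behavior discussed in the JIRA would look like: exposing nested_object as a map of strings, where every non-string value is kept as its JSON text and the user must cast it back in SQL. The function name and shape are illustrative, not part of any Flink API.

```python
import json

# Sample event from this thread; nested_object values have mixed types.
event = json.loads("""
{
  "top_level_key1": "some value",
  "nested_object": {
    "nested_key1": "abc",
    "nested_key2": 123,
    "nested_key3": ["element1", "element2", "element3"]
  }
}
""")

def to_string_map(obj):
    """Mimic a MAP<VARCHAR, VARCHAR> view of a JSON object:
    string values pass through, every other value is kept as
    its serialized JSON text (illustrative only)."""
    return {k: v if isinstance(v, str) else json.dumps(v)
            for k, v in obj.items()}

string_map = to_string_map(event["nested_object"])
print(string_map["nested_key2"])  # the int 123 becomes the string "123"
print(string_map["nested_key3"])  # the array survives only as JSON text
```

New keys added upstream would show up in the map automatically, which is the flexibility I am after, at the cost of parsing/casting the string values on the query side.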

Guodong


On Thu, May 28, 2020 at 7:43 PM Benchao Li <libenc...@gmail.com> wrote:

> Hi Guodong,
>
> I think you almost have the answer:
> 1. Map type: it does not work with the current implementation. For example, with
> map<varchar, varchar>, if the value is a non-string JSON object, then
> `JsonNode.asText()` may not work as you wish.
> 2. List all the fields you care about. IMO, this fits your scenario. And you can
> set format.fail-on-missing-field = false to allow missing fields to be set
> to null.
>
> For 1, I think maybe we can support it in the future, and I've created a
> JIRA[1] to track this.
>
> [1] https://issues.apache.org/jira/browse/FLINK-18002
>
> Guodong Wang <wangg...@gmail.com> wrote on Thu, May 28, 2020 at 6:32 PM:
>
>> Hi !
>>
>> I want to use Flink SQL to process some json events. It is quite
>> challenging to define a schema for the Flink SQL table.
>>
>> My data source's format is JSON like this:
>> {
>>   "top_level_key1": "some value",
>>   "nested_object": {
>>     "nested_key1": "abc",
>>     "nested_key2": 123,
>>     "nested_key3": ["element1", "element2", "element3"]
>>   }
>> }
>>
>> The big challenges for me in defining a schema for this data source are:
>> 1. The keys in nested_object are flexible; there might be 3 unique keys
>> or more. If I enumerate all the keys in the schema, my code becomes
>> fragile: how do I handle an event that contains additional nested keys in
>> nested_object?
>> 2. I know the Table API supports the Map type, but I am not sure if I can
>> use a generic object as the value of the map, because the values in
>> nested_object are of different types: some are ints, some are strings or
>> arrays.
>>
>> So, how can I expose this kind of JSON data as a table in Flink SQL
>> without enumerating all the nested keys?
>>
>> Thanks.
>>
>> Guodong
>>
>
>
> --
>
> Best,
> Benchao Li
>