Hi Guodong,

Does the RAW type meet your requirements? For example, you could specify a map<varchar, raw> type, where the value of the map is the raw JsonNode parsed by Jackson. This is not supported yet; however, IMO it could be supported.
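To make that concrete, here is a hypothetical DDL sketch of what such a schema could look like once RAW values are supported in the json format (the RAW syntax below is illustrative only, not working syntax today):

```sql
-- Hypothetical sketch only: RAW values in the json format are not yet
-- supported (see the jira below); the RAW syntax here is illustrative.
CREATE TABLE json_events (
  top_level_key1 VARCHAR,
  nested_object  MAP<VARCHAR, RAW>  -- each value kept as a Jackson JsonNode
) WITH (
  'format.type' = 'json'
  -- plus your connector options
);
```

The user would then cast or inspect each JsonNode value in a UDF, rather than forcing every value into one SQL type up front.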
Guodong Wang <wangg...@gmail.com> 于2020年5月28日周四 下午9:43写道:

> Benchao,
>
> Thank you for your quick reply.
>
> As you mentioned, for my current scenario, approach 2 should work for me.
> But it is a bit annoying that I have to modify the schema to add new field
> types whenever the upstream app changes the json format or adds new
> fields. Otherwise, my users cannot refer to those fields in their SQL.
>
> Per the description in the jira, I think that after implementing this, all
> the json values will be converted to strings.
> I am wondering if Flink SQL can/will support flexible schemas in the
> future, for example, registering a table without defining a specific
> schema for each field, and letting the user define a generic map or array
> for one field where the values of the map/array can be any object. Then
> the type-conversion cost might be saved.
>
> Guodong
>
>
> On Thu, May 28, 2020 at 7:43 PM Benchao Li <libenc...@gmail.com> wrote:
>
>> Hi Guodong,
>>
>> I think you almost have the answer:
>> 1. map type: it does not work with the current implementation. For
>> example, with map<varchar, varchar>, if the value is a non-string json
>> object, then `JsonNode.asText()` may not work as you wish.
>> 2. list all the fields you care about. IMO, this fits your scenario. And
>> you can set format.fail-on-missing-field = false, so that non-existent
>> fields are set to null.
>>
>> For 1, I think maybe we can support it in the future, and I've created a
>> jira[1] to track this.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-18002
>>
>> Guodong Wang <wangg...@gmail.com> 于2020年5月28日周四 下午6:32写道:
>>
>>> Hi!
>>>
>>> I want to use Flink SQL to process some json events. It is quite
>>> challenging to define a schema for the Flink SQL table.
>>>
>>> My data source's format is json like this:
>>>
>>> {
>>>   "top_level_key1": "some value",
>>>   "nested_object": {
>>>     "nested_key1": "abc",
>>>     "nested_key2": 123,
>>>     "nested_key3": ["element1", "element2", "element3"]
>>>   }
>>> }
>>>
>>> The big challenges for me in defining a schema for this data source are:
>>> 1. The keys in nested_object are flexible; there might be 3 unique keys
>>> or more. If I enumerate all the keys in the schema, I think my code is
>>> fragile. How do I handle an event which contains more nested keys in
>>> nested_object?
>>> 2. I know the Table API supports the Map type, but I am not sure if I
>>> can put a generic object as the value of the map, because the values in
>>> nested_object are of different types: some of them are ints, some of
>>> them are strings or arrays.
>>>
>>> So, how can I expose this kind of json data as a table in Flink SQL
>>> without enumerating all the nested keys?
>>>
>>> Thanks.
>>>
>>> Guodong
>>>
>>
>>
>> --
>>
>> Best,
>> Benchao Li
>>

--

Best,
Benchao Li
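For comparison, the "list all the fields you care about" approach (option 2 in Benchao's reply) might look like the following DDL sketch. The field names come from the sample json in the thread; the WITH options are placeholders for whatever connector is actually in use:

```sql
-- Sketch of option 2: enumerate every known field explicitly.
-- This is the schema that must be revisited whenever the upstream
-- app adds new keys to nested_object.
CREATE TABLE json_events (
  top_level_key1 VARCHAR,
  nested_object  ROW<
    nested_key1 VARCHAR,
    nested_key2 INT,
    nested_key3 ARRAY<VARCHAR>
  >
) WITH (
  'format.type' = 'json',
  'format.fail-on-missing-field' = 'false'  -- missing fields become NULL
  -- plus your connector options
);
```

With fail-on-missing-field set to false, events that lack nested_key2 or nested_key3 still parse, and those columns are simply NULL; unknown extra keys in nested_object are silently dropped.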