Hi, Narayanan:
The current problem is that, for a generic solution, there is no way to know 
that a given element in the JSON is an array. Keep in mind that any element of 
a JSON document can be any valid structure: an array, another nested structure, 
a map, etc.
You know your data, so you can say that at this level it is an array. But the 
computer doesn't know that, which is why you need to provide a schema.
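To make the point concrete, here is a small Java sketch that reports what kind of value a piece of JSON text holds. The JsonKind class is hypothetical and is not a Hive API. It can only answer by looking at each record's text at runtime, because nothing in the query itself says that a path holds an array. Without a schema, a generic function therefore has to fall back to returning a string. The sketch inspects only the first character, so it classifies well-formed input and does not validate it.

```java
/**
 * Hypothetical helper (not part of Hive): classifies the raw JSON text of a
 * single value by its first non-blank character. The answer depends on the
 * data in each record, which is why a schema-free UDF cannot declare an
 * array return type up front.
 */
final class JsonKind {

    private JsonKind() {}

    static String of(String value) {
        if (value == null) {
            return "missing";   // e.g. the JSON path matched nothing
        }
        String s = value.trim();
        if (s.isEmpty()) {
            return "invalid";
        }
        char c = s.charAt(0);
        switch (c) {
            case '[': return "array";
            case '{': return "object";
            case '"': return "string";
            case 't': case 'f': return "boolean";
            case 'n': return "null";
            default:
                // Only a sign or digit can start a JSON number.
                return (c == '-' || Character.isDigit(c)) ? "number" : "invalid";
        }
    }
}
```

The same path can yield `[{"id":1}]` in one record and `{"id":1}` in another, and only a schema (or your own knowledge of the data) rules that out.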
Think about it: in programming we can cast that to an array, but normally that 
is NOT a good solution. So a generic solution, like any Hadoop JSON UDF, will 
and should ask for a schema.
In your case, if you know the data is always an array, then write your own UDF 
to cast it to an array without any schema. But I don't think any good, generic 
JSON UDF will support that for your case.
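The following is a rough sketch of the core of such a UDF, in Java. The class name JsonArraySplitter and its split method are hypothetical and are not part of Hive or any library. The method takes the text that get_json_object returns for an array-valued path, such as `[{"id":1},{"id":2}]`, and splits it into the raw JSON text of each top-level element. It tracks brackets and quoted strings so that commas inside nested values are not treated as separators. It returns null when the value is not an array, which is exactly the assumption about the data that a generic function cannot make.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hive): splits the text of a top-level JSON
 * array into the raw JSON text of each element. This is a tolerant splitter,
 * not a JSON validator.
 */
final class JsonArraySplitter {

    private JsonArraySplitter() {}

    /** Returns the element texts, or null if the input is not a JSON array. */
    static List<String> split(String json) {
        if (json == null) {
            return null;
        }
        String s = json.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            return null; // not an array; the caller decides what to do
        }
        List<String> elements = new ArrayList<>();
        int depth = 0;            // nesting of [] and {} inside the outer array
        boolean inString = false; // inside a quoted JSON string
        boolean escaped = false;  // previous char was a backslash in a string
        int start = 1;            // index where the current element begins
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"': inString = true; break;
                case '[': case '{': depth++; break;
                case ']': case '}': depth--; break;
                case ',':
                    if (depth == 0) { // separator between top-level elements
                        elements.add(s.substring(start, i).trim());
                        start = i + 1;
                    }
                    break;
                default: break;
            }
        }
        String last = s.substring(start, s.length() - 1).trim();
        // Skip the trailing piece only for an empty array "[]".
        if (!last.isEmpty() || !elements.isEmpty()) {
            elements.add(last);
        }
        return elements;
    }
}
```

A UDF wrapping this would return a List<String>, which should surface in Hive as array<string>. Check how your Hive version maps UDF return types. After explode, each element is still JSON text, so get_json_object(item, '$.id') can be applied to it.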
Yong

> Date: Mon, 7 Apr 2014 16:47:44 -0700
> Subject: Re: get_json_object for nested field returning a String instead of 
> an Array
> From: knarayana...@gmail.com
> To: user@hive.apache.org
> 
> Thanks Peyman.
> 
> Actually the problem with Hive-Json-Serde is that we need to provide
> the entire schema upfront while creating the table.
> 
> My requirement is that we just project/aggregate on the fields using
> get_json_object after creating the external table without a schema. This
> way the external table is agnostic to any new schema changes.
> 
> Would love to get a solution for converting get_json_object to return
> an Array instead of a string. Can we use any Hive UDFs to convert a
> string into an explodable Array object?
> 
> Thanks
> Narayanan
> 
> On Mon, Apr 7, 2014 at 4:14 PM, Peyman Mohajerian <mohaj...@gmail.com> wrote:
> > perhaps: https://github.com/rcongiu/Hive-JSON-Serde
> >
> >
> > On Mon, Apr 7, 2014 at 6:52 PM, Narayanan K <knarayana...@gmail.com> wrote:
> >>
> >> Hi all
> >>
> >> I am using get_json_object to read a json text file. I have created
> >> the external table as below :
> >>
> >> CREATE EXTERNAL TABLE EXT_TABLE ( json string)
> >> PARTITIONED BY (dt string)
> >> LOCATION '/users/abc/';
> >>
> >>
> >> The json data has some fields that are not simple fields but nested
> >> fields, like "field" : [{"id":1},{"id":2}.. ].
> >>
> >> While using the get_json_object to retrieve that field, it is
> >> returning back a string instead of an Array. Hence I am not able to
> >> explode the array as it is a string.
> >>
> >> Is there some way to get an array from get_json_object instead of a
> >> string, so that we can perform explode on this nested field? Or any way
> >> to convert the string into an array so that I can use explode?
> >>
> >> Thanks in advance,
> >> Narayanan
> >
> >