Ah, if that's the case then you might need to define the schema beforehand. Either that, or if you want to infer it, ensure a JSON file exists with the right schema so Spark infers the right columns,
essentially making both files one dataframe, if that makes sense.

On Tue, Feb 14, 2017 at 3:04 PM, Aseem Bansal <asmbans...@gmail.com> wrote:

> Sorry if I trivialized the example. It is the same kind of file, and
> sometimes it could have "a", sometimes "b", sometimes both. I just don't
> know. That is what I meant by missing columns.
>
> It would be good if, whichever of the JSON files I read, Spark SQL
> gave me
>
> for json1.json
>
> a | b
> 1 | null
>
> for json2.json
>
> a    | b
> null | 2
>
> On Tue, Feb 14, 2017 at 8:13 PM, Sam Elamin <hussam.ela...@gmail.com> wrote:
>
>> I may be missing something super obvious here, but can't you combine them
>> into a single dataframe? A left join, perhaps?
>>
>> Try writing it in SQL, "select a from json1 and b from json2", then run
>> explain to give you a hint on how to do it in code.
>>
>> Regards
>> Sam
>>
>> On Tue, 14 Feb 2017 at 14:30, Aseem Bansal <asmbans...@gmail.com> wrote:
>>
>>> Say I have two files, each containing a single row:
>>>
>>> json1.json
>>>
>>> {"a": 1}
>>>
>>> json2.json
>>>
>>> {"b": 2}
>>>
>>> I read each JSON file into a dataframe using Spark's API, one at a
>>> time, so I have
>>>
>>> Dataset<Row> json1DF
>>> and
>>> Dataset<Row> json2DF
>>>
>>> If I run "select a, b from __THIS__" in a SQLTransformer then I get
>>> an exception, as json1DF does not have "b" and json2DF does not have "a".
>>>
>>> How could I handle this situation with missing columns in JSON?