Ah, if that's the case then you might need to define the schema beforehand. Either that, or if you want to infer it, ensure a JSON file exists with the right schema so Spark infers the right columns,
essentially making both files one dataframe, if that makes sense.

On Tue, Feb 14, 2017 at 3:04 PM, Aseem Bansal <asmbans...@gmail.com> wrote:

> Sorry if I trivialized the example. It is the same kind of file, and
> sometimes it could have "a", sometimes "b", sometimes both. I just don't
> know. That is what I meant by missing columns.
>
> It would be good if, whichever of the JSON files I read, Spark SQL
> gave me
>
> for json1.json
>
> a | b
> 1 | null
>
> for json2.json
>
> a    | b
> null | 2
>
> On Tue, Feb 14, 2017 at 8:13 PM, Sam Elamin <hussam.ela...@gmail.com> wrote:
>
>> I may be missing something super obvious here, but can't you combine them
>> into a single dataframe? A left join, perhaps?
>>
>> Try writing it in SQL, "select a from json1 and b from json2", then run
>> explain to give you a hint on how to do it in code.
>>
>> Regards
>> Sam
>>
>> On Tue, 14 Feb 2017 at 14:30, Aseem Bansal <asmbans...@gmail.com> wrote:
>>
>>> Say I have two files, each containing a single row:
>>>
>>> json1.json
>>>
>>> {"a": 1}
>>>
>>> json2.json
>>>
>>> {"b": 2}
>>>
>>> I read each JSON file into a dataframe using Spark's API, one at a
>>> time, so I have
>>>
>>> Dataset<Row> json1DF
>>> and
>>> Dataset<Row> json2DF
>>>
>>> If I run "select a, b from __THIS__" in a SQLTransformer then I get
>>> an exception, as json1DF does not have "b" and json2DF does not have "a".
>>>
>>> How could I handle this situation with missing columns in JSON?