Yes, it is possible. Use the jsonFile method on the SQLContext: it reads the JSON into a DataFrame directly, inferring the schema. Then register the DataFrame as a temporary table and query it with SQL. It should be about three lines of code, thanks to Spark.
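A minimal PySpark sketch of those steps, assuming Spark 1.3+ and a hypothetical JSON path (/data/people.json) that exists on every worker node; in newer Spark versions, sqlContext.jsonFile is replaced by spark.read.json:

```python
# Sketch only: assumes Spark 1.3+ is installed and the hypothetical
# path /data/people.json exists on every worker node.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="DistributedJsonSQL")
sqlContext = SQLContext(sc)

# jsonFile reads the JSON into a DataFrame, inferring the schema
# (in Spark 1.4+ use sqlContext.read.json(...) instead)
df = sqlContext.jsonFile("/data/people.json")

# register the DataFrame as a temporary table so it can be queried with SQL
df.registerTempTable("people")

# the query runs in parallel across the executors on the cluster,
# not just on the master
results = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")
results.show()
```

Because the DataFrame is partitioned, the SQL query is executed by tasks on all worker nodes, which is the distributed behaviour asked about below.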
You may also find a few YouTube videos on this, especially on unifying pipelines.

On 3 May 2015 19:02, "Jai" <jai4l...@gmail.com> wrote:
> Hi,
>
> I am a noob to Spark and related technology.
>
> I have JSON stored at the same location on all worker clients (Spark cluster).
> I am looking to load this JSON data set on these clients and run SQL queries,
> like distributed SQL.
>
> Is it possible to achieve?
>
> Right now, the master submits the task to one node only.
>
> Thanks and regards
> Mrityunjay