If you are using EMR, please try their latest release. If you are using SQL, there will be very few reasons left to use Spark at all (particularly given that HiveContext relies heavily on Hive).
Just over regular CSV data I have seen Hive on Tez give a 100x performance gain (query over 64 million rows x 570 columns in 2.5 minutes). With ORC the same query is faster still (64 million rows x 570 columns in 54 seconds), and with proper partitioning and indexing in ORC it is blazing fast (64 million rows x 570 columns in 19 seconds). There is perhaps a reason why Spark makes things slow while using ORC :)

Regards,
Gourav

On Thu, Jul 21, 2016 at 12:40 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com> wrote:

> It works. Is it better to have Hive in this case for better performance?
>
> On Thu, Jul 21, 2016 at 12:30 PM, Simone <simone.mirag...@gmail.com> wrote:
>
>> If you have a folder with a bunch of JSON files inside it, yes, it
>> should work. Just set the path to something like "path/to/your/folder/*.json".
>> All files will be loaded into a dataframe, and the schema will be the union of
>> all the different schemas of your JSON files (only if you have different
>> schemas).
>> It should work - let me know.
>>
>> Simone Miraglia
>> ------------------------------
>> From: Ashutosh Kumar <kmr.ashutos...@gmail.com>
>> Sent: 21/07/2016 08:55
>> To: Simone <simone.mirag...@gmail.com>; user @spark <user@spark.apache.org>
>> Subject: Re: Reading multiple json files form nested folders for data frame
>>
>> That example points to a particular JSON file. Will it work the same way if I
>> point to the top-level folder containing all the JSON files?
>>
>> On Thu, Jul 21, 2016 at 12:04 PM, Simone <simone.mirag...@gmail.com> wrote:
>>
>>> Yes you can - have a look here:
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
>>>
>>> Hope it helps
>>>
>>> Simone Miraglia
>>> ------------------------------
>>> From: Ashutosh Kumar <kmr.ashutos...@gmail.com>
>>> Sent: 21/07/2016 08:19
>>> To: user @spark <user@spark.apache.org>
>>> Subject: Reading multiple json files form nested folders for data frame
>>>
>>> I need to read a bunch of JSON files kept in date-wise folders and perform
>>> SQL queries on them using a data frame. Is it possible to do so? Please
>>> provide some pointers.
>>>
>>> Thanks
>>> Ashutosh
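
For anyone finding this thread later, here is a rough sketch of the glob-path approach Simone describes, extended to date-wise subfolders. It is an illustration under assumptions, not something posted in the thread: the helper names `dated_json_glob` and `load_json_frames` are hypothetical, the layout is assumed to be one folder level per date under the root, and `spark` is assumed to be an existing SparkSession (or a SQLContext on Spark 1.4+, which also exposes `read.json`).

```python
def dated_json_glob(root, depth=1):
    """Build a glob that matches *.json files `depth` folder levels below `root`.

    Hypothetical helper: with depth=1 it assumes a layout such as
    root/<date>/<file>.json. Forward slashes are used because Hadoop-style
    paths use them regardless of the local operating system.
    """
    parts = [root.rstrip("/")] + ["*"] * depth + ["*.json"]
    return "/".join(parts)


def load_json_frames(spark, root, depth=1):
    """Load every matching JSON file into a single DataFrame.

    `spark` is assumed to be an already created SparkSession or SQLContext.
    Nothing Spark-related is imported here, so the path helper above can be
    used and checked without a Spark installation.
    """
    return spark.read.json(dated_json_glob(root, depth))
```

With a session at hand, `df = load_json_frames(spark, "path/to/your/folder")` should give one dataframe over all the date folders. Registering it for SQL would be `df.createOrReplaceTempView(...)` on Spark 2.x or `df.registerTempTable(...)` on 1.x, with a table name of your choosing.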