Hi, we are newbies learning Spark. We are running Scala queries against our Parquet table from Jupyter. Whenever we run a query, only part of the results is shown in the UI (it paginates), so we tried storing the full result set into a table. By default, Spark stores tables in Parquet format, and saving the query results into a Parquet table works, but we see a performance impact from the step of loading the data into the table/file.

We compared the time it takes to execute the same query in Spark vs. Hive. Spark is faster as long as we don't store the results into a file/table; but when we run the query and store the results into a Parquet table, the total time is longer than Hive's total execution time. To speed up the export, we would like to save the results as plain tab-delimited text or CSV instead. Is that possible in Spark today? We are using Spark version 1.6.
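One way to sketch this in Spark 1.6: the built-in CSV writer only arrived in Spark 2.0, so on 1.6 you would either use the third-party Databricks spark-csv package (added to the classpath, e.g. with `--packages com.databricks:spark-csv_2.10:1.5.0`) or fall back to `saveAsTextFile` on the underlying RDD. The table name and output paths below are placeholders, not anything from the original post:

```scala
// Sketch, assuming Spark 1.6 with the spark-csv package available.
// "my_parquet_table" and the output paths are hypothetical placeholders.
val results = sqlContext.sql("SELECT * FROM my_parquet_table")

// Option A: write tab-delimited text via the spark-csv data source
results.write
  .format("com.databricks.spark.csv")
  .option("delimiter", "\t")   // use "," for standard CSV
  .option("header", "true")
  .save("/tmp/results_tsv")

// Option B: no extra package needed -- join each row's fields with tabs
// and write plain text files
results.rdd
  .map(row => row.mkString("\t"))
  .saveAsTextFile("/tmp/results_txt")
```

Both options write one file per partition; `results.coalesce(1)` before writing would produce a single output file at the cost of parallelism. This skips the Parquet encoding work, which is likely where the extra write time is going.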
In Hive, we have Ambari to configure the HiveServer2 settings. Is there any UI for Spark configuration?

One more difference we have identified: whenever a Hive-on-Tez query is executed, it takes all of the cluster's available RAM, whereas Spark uses only about 30% of the available memory (say 30 GB out of 100 GB). Is it possible to increase Spark's memory usage so that the Spark query runs faster than Hive?

/Mahender
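On the memory question, a minimal sketch of how Spark's memory footprint is usually raised, assuming a YARN cluster (the numbers here are illustrative, not recommendations for your 100 GB cluster):

```shell
# Sketch: request more memory and executors at submit time.
# Values are example figures; tune to your cluster and YARN container limits.
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-memory 8G \
  --executor-cores 4 \
  --driver-memory 4G \
  your-app.jar
```

The same settings can be made permanent in `conf/spark-defaults.conf` (e.g. `spark.executor.memory 8g`). If you installed Spark through Ambari, its Spark service configuration page edits these files for you, though there is no live tuning UI comparable to the HiveServer2 panel. Note also that YARN's per-container maximum (`yarn.scheduler.maximum-allocation-mb`) caps how much an executor can actually request.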