Re: [Thriftserver2] Controlling number of tasks

2016-08-03 Thread Chanh Le
I believe there is no way to reduce the number of tasks from the Hive side using coalesce, because Hive simply reads the files, so the task count depends on the number of files you put there. What I did was coalesce at the ETL layer and write as few files as possible, which reduces the IO time for reading them. > On Aug 3,
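
A minimal PySpark-style sketch of the "coalesce at the ETL layer" idea described above. The names `target_file_count` and `write_compacted`, the 128 MiB target file size, the Parquet output format, and the append write mode are all assumptions made for illustration and do not come from the thread. `df` is expected to be a Spark DataFrame produced by the ETL job.

```python
import math


def target_file_count(total_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    """Pick how many output files to write for a batch.

    Aims for files of roughly target_file_bytes each (assumed default:
    128 MiB) and always returns at least 1.
    """
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and target_file_bytes > 0")
    return max(1, math.ceil(total_bytes / target_file_bytes))


def write_compacted(df, output_path: str, num_files: int) -> None:
    """Write a Spark DataFrame using at most num_files partitions.

    coalesce() merges existing partitions without a full shuffle, so each
    batch lands as a handful of larger files instead of many small ones.
    Parquet and append mode are assumptions; use whatever the table uses.
    """
    df.coalesce(num_files).write.mode("append").parquet(output_path)
```

With fewer, larger files on disk, any reader of the table, including one that cannot call coalesce itself, has fewer files to open and so fewer tasks to schedule.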

[Thriftserver2] Controlling number of tasks

2016-08-03 Thread Yana Kadiyska
Hi folks, I have an ETL pipeline that drops a file every half hour. When Spark reads these files, I end up with 315K tasks for a dataframe reading a few days' worth of data. I know that with a regular Spark job I can use coalesce to get down to a lower number of tasks. Is there a way to tell