Re: Optimize filter operations with sorted data

2016-07-21 Thread Chanh Le
You can check in the Spark UI or in the output of the Spark application: compare how many stages and tasks run before and after you partition the data, and also compare the run time.
Regards, Chanh
On Thu, Jul 7, 2016 at 6:40 PM, tan shai wrote:
> How can you verify that it is loading only the part of time

Re: Optimize filter operations with sorted data

2016-07-07 Thread tan shai
How can you verify that it loads only the time and network partitions named in the filter?
2016-07-07 11:58 GMT+02:00 Chanh Le :
> Hi Tan,
> It depends on how the data is organised and what your filter is.
> For example, in my case I store the data partitioned by the fields time and
> network_id.

Re: Optimize filter operations with sorted data

2016-07-07 Thread tan shai
Yes, it operates on the sorted column.
2016-07-07 11:43 GMT+02:00 Ted Yu :
> Does the filter under consideration operate on the sorted column(s)?
>
> Cheers
>
> > On Jul 7, 2016, at 2:25 AM, tan shai wrote:
> >
> > Hi,
> >
> > I have a sorted

Re: Optimize filter operations with sorted data

2016-07-07 Thread Chanh Le
Hi Tan,
It depends on how the data is organised and what your filter is. For example, in my case I store the data partitioned by the fields time and network_id. If I filter by time, by network_id, or by both, together with other fields, Spark loads only the time and network_id partitions that match the filter and then applies the remaining conditions to that data.
> On Jul 7,
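
The behaviour described in this message is usually called partition pruning. Below is a minimal plain-Python sketch of the idea only; it is not Spark code and not the poster's setup. The helper names (write_partitioned, read_with_pruning), the clicks column, and the sample rows are all made up for illustration. It also counts how many partitions were loaded, which is one rough way to see the effect of the filter.

```python
from collections import defaultdict

def write_partitioned(rows, keys):
    """Group rows by their partition-column values (think: one directory per combination)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[k] for k in keys)].append(row)
    return dict(partitions)

def read_with_pruning(partitions, keys, partition_filter, row_filter=lambda r: True):
    """Load only partitions whose key values pass partition_filter, then filter their rows."""
    loaded = 0
    result = []
    for key_values, rows in partitions.items():
        key = dict(zip(keys, key_values))
        if not partition_filter(key):
            continue  # pruned: the rows in this partition are never touched
        loaded += 1
        result.extend(r for r in rows if row_filter(r))
    return result, loaded

# Hypothetical sample data, partitioned by time and network_id as in the message.
rows = [
    {"time": "2016-07-06", "network_id": 1, "clicks": 5},
    {"time": "2016-07-06", "network_id": 2, "clicks": 0},
    {"time": "2016-07-07", "network_id": 1, "clicks": 3},
    {"time": "2016-07-07", "network_id": 2, "clicks": 9},
]
keys = ("time", "network_id")
partitions = write_partitioned(rows, keys)

# Filter on a partition column (time) plus another field (clicks).
result, loaded = read_with_pruning(
    partitions,
    keys,
    partition_filter=lambda k: k["time"] == "2016-07-07",
    row_filter=lambda r: r["clicks"] > 4,
)
print(loaded, "of", len(partitions), "partitions loaded")
print(result)
```

Only the two partitions for 2016-07-07 are read; the clicks condition is then applied to the rows inside them.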

Re: Optimize filter operations with sorted data

2016-07-07 Thread Ted Yu
Does the filter under consideration operate on the sorted column(s)?
Cheers
> On Jul 7, 2016, at 2:25 AM, tan shai wrote:
>
> Hi,
>
> I have a sorted dataframe and need to optimize the filter operations on it.
> How does Spark perform filter operations on a sorted dataframe?
>

Optimize filter operations with sorted data

2016-07-07 Thread tan shai
Hi, I have a sorted dataframe and need to optimize the filter operations on it. How does Spark perform filter operations on a sorted dataframe? Does it scan all the data? Many thanks.
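
The thread does not answer this question directly. As a general illustration of why sort order can matter, columnar formats such as Parquet keep min/max statistics per chunk of rows, and a reader that uses them can skip chunks whose range cannot match a filter. The plain-Python sketch below shows only that idea; it is not Spark code, and the names build_chunks and range_filter and the chunk size are assumptions made for the example.

```python
def build_chunks(sorted_values, chunk_size):
    """Split values into fixed-size chunks and record each chunk's min and max."""
    chunks = []
    for i in range(0, len(sorted_values), chunk_size):
        vals = sorted_values[i:i + chunk_size]
        chunks.append({"min": min(vals), "max": max(vals), "values": vals})
    return chunks

def range_filter(chunks, low, high):
    """Return values in [low, high], scanning only chunks whose min/max range overlaps it."""
    scanned = 0
    hits = []
    for chunk in chunks:
        if chunk["max"] < low or chunk["min"] > high:
            continue  # skipped using statistics alone, values never read
        scanned += 1
        hits.extend(v for v in chunk["values"] if low <= v <= high)
    return hits, scanned

values = list(range(100))          # hypothetical column, already sorted
chunks = build_chunks(values, 10)  # 10 chunks of 10 values each
hits, scanned = range_filter(chunks, 42, 57)
print(scanned, "of", len(chunks), "chunks scanned")
print(hits)
```

Because the values are sorted, each chunk covers a narrow range and most chunks are skipped. With unsorted data the ranges would overlap the filter far more often and little could be skipped.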