You can check in the Spark UI or in the output of the Spark application:
compare how many stages and tasks run before and after you partition the data.
Also compare the run times.
Regards,
Chanh
On Thu, Jul 7, 2016 at 6:40 PM, tan shai wrote:
> How can you verify that it loads only the time and network_id partitions
> named in the filter?
How can you verify that it loads only the time and network_id partitions
named in the filter?
2016-07-07 11:58 GMT+02:00 Chanh Le:
> Hi Tan,
> It depends on how the data is organised and what your filter is.
> For example, in my case I store the data partitioned by the fields time and
> network_id.
Yes, it is operating on the sorted column.
2016-07-07 11:43 GMT+02:00 Ted Yu:
> Does the filter under consideration operate on sorted column(s)?
>
> Cheers
Hi Tan,
It depends on how the data is organised and what your filter is.
For example, in my case I store the data partitioned by the fields time and network_id.
If I filter by time, by network_id, or by both together with other fields, Spark loads
only the time and network_id partitions that match the filter, and then applies the rest
of the filter to that data.
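
To make the behaviour described above concrete, here is a minimal, Spark-free sketch in plain Python. It is a toy model and not Spark's implementation: the helpers write_partitioned and read_pruned, the clicks column, and the sample values are all invented for illustration. It assumes a key=value directory layout (time=.../network_id=...), which is the layout DataFrameWriter.partitionBy writes. The point it shows is that a filter on the partition columns is decided from directory names alone, so non-matching files are never opened, while a filter on any other column still has to read the rows of the files that remain.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data; all values are strings because they go through CSV.
ROWS = [
    {"time": "2016-07-06", "network_id": "1", "clicks": "5"},
    {"time": "2016-07-06", "network_id": "2", "clicks": "12"},
    {"time": "2016-07-07", "network_id": "1", "clicks": "7"},
    {"time": "2016-07-07", "network_id": "1", "clicks": "20"},
    {"time": "2016-07-07", "network_id": "2", "clicks": "30"},
]


def write_partitioned(rows, root, keys):
    """Write each row to root/key1=value1/key2=value2/part.csv."""
    for row in rows:
        part_dir = Path(root).joinpath(*(f"{k}={row[k]}" for k in keys))
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part.csv"
        # Partition values live in the directory names, not inside the file.
        payload = {k: v for k, v in row.items() if k not in keys}
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=sorted(payload))
            if is_new:
                writer.writeheader()
            writer.writerow(payload)


def read_pruned(root, partition_filter, row_filter):
    """Return (matching rows, number of files actually opened)."""
    files_opened = 0
    result = []
    for path in sorted(Path(root).rglob("part.csv")):
        # Recover the partition values from the directory names.
        dirs = path.relative_to(root).parts[:-1]
        parts = dict(d.split("=", 1) for d in dirs)
        if any(parts.get(k) != v for k, v in partition_filter.items()):
            continue  # pruned: this file is never opened
        files_opened += 1
        with path.open(newline="") as f:
            for rec in csv.DictReader(f):
                row = {**parts, **rec}
                if row_filter(row):  # the rest of the filter, applied per row
                    result.append(row)
    return result, files_opened


def demo():
    with tempfile.TemporaryDirectory() as root:
        write_partitioned(ROWS, root, ["time", "network_id"])
        total_files = sum(1 for _ in Path(root).rglob("part.csv"))
        matched, opened = read_pruned(
            root,
            {"time": "2016-07-07", "network_id": "1"},
            lambda r: int(r["clicks"]) > 10,
        )
        return total_files, opened, matched


if __name__ == "__main__":
    total, opened, matched = demo()
    print(f"opened {opened} of {total} files, matched rows: {matched}")
```

The files-opened counter plays the role that the task and input-size figures play in the Spark UI. In Spark itself, calling explain() on the filtered DataFrame prints the physical plan, which is another place to look for evidence that the partition filter was applied at the scan.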
Does the filter under consideration operate on sorted column(s)?
Cheers
> On Jul 7, 2016, at 2:25 AM, tan shai wrote:
>
> Hi,
>
> I have a sorted DataFrame and I need to optimize the filter operations on it.
> How does Spark perform filter operations on a sorted DataFrame?
>
Hi,
I have a sorted DataFrame and I need to optimize the filter operations on it.
How does Spark perform filter operations on a sorted DataFrame?
Does it scan all the data?
Many thanks.