I need to filter out outliers from a dataframe by all columns. I can
manually list all columns like:

df.filter(x=>math.abs(x.get(0).toString().toDouble-means(0))<=3*stddevs(0))

    .filter(x=>math.abs(x.get(1).toString().toDouble-means(1))<=3*stddevs(1
))

    ...

But I want to turn it into a general function which can handle variable
number of columns. How could I do that? Thanks in advance!


Regards,

Shawn




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/filter-rows-by-all-columns-tp28309.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Reply via email to