Hello,

I'm building an ML Pipeline that extracts features from a DataFrame, and I'd
like it to behave as follows:

Log "Extracting feature 1"
Extract feature 1
Log "Extracting feature 2"
Extract feature 2
...
Log "Extracting feature n"
Extract feature n

The thing is, since transformations are lazy, I end up with this instead:

Log "Extracting feature 1"
Log "Extracting feature 2"
Log "Extracting feature n"
Extract feature 1
Extract feature 2
...
Extract feature n
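To make the ordering concrete, here is a minimal plain-Scala sketch (no Spark involved) that models a lazy plan. Building each step runs the log statement immediately, while the extraction itself is only recorded and runs when `run()` is called, which plays the role of a Spark action. `LazyPlan`, `build` and `LazyLoggingDemo` are hypothetical names for illustration, not Spark APIs.

```scala
import scala.collection.mutable.ListBuffer

// A toy lazy plan (hypothetical, not Spark's API): adding a step only records
// work to do; nothing runs until run(), the analogue of a Spark action.
final case class LazyPlan(steps: Vector[() => Unit]) {
  def andThen(step: () => Unit): LazyPlan = LazyPlan(steps :+ step)
  def run(): Unit = steps.foreach(step => step())
}

object LazyLoggingDemo {
  val events: ListBuffer[String] = ListBuffer.empty[String]

  // Mirrors the question's loop: the log call runs while the plan is built,
  // but each extraction is deferred until the plan is run.
  def build(n: Int): LazyPlan =
    (1 to n).foldLeft(LazyPlan(Vector.empty)) { (plan, i) =>
      events += s"Log: extracting feature $i"             // executes immediately
      plan.andThen(() => events += s"Extract feature $i") // deferred until run()
    }

  def main(args: Array[String]): Unit = {
    val plan = build(3) // all three log lines are recorded here
    plan.run()          // all three extractions happen here
    events.foreach(println)
  }
}
```

Running `main` prints the three log lines first and the three extractions afterwards, the same order as observed above.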

My transform method looks roughly like this:

override def transform(dataset: DataFrame): DataFrame = {
   // Start from the first feature, then outer-join each remaining one onto it
   var joinedDataFrame = extract(dataset, featuresToExtract.head)

   for (featureToExtract <- featuresToExtract.tail) {
     // LOGGING HERE THAT I WANT CALLED JUST BEFORE THE CORRESPONDING TRANSFORMATION
     joinedDataFrame = joinedDataFrame.join(extract(dataset, featureToExtract), joinOn, "outer")
   }
   joinedDataFrame
}
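One possible way to get the interleaving is to force each feature with an action right after logging it. This is a minimal sketch under assumptions: `extract` here is a hypothetical stand-in for the method in the question, features are joined on a single key column passed as `Seq(joinOn)`, and each feature DataFrame is `persist`-ed before `count()` so the work done by that action is reused by the final join instead of recomputed. It trades Spark's whole-plan optimization for extra actions, and the final join is still lazy.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

object EagerFeatureExtraction {
  // Hypothetical stand-in for the question's `extract`: derives one feature
  // column from "value", keyed by "id".
  def extract(dataset: DataFrame, feature: String): DataFrame =
    dataset.select(col("id"), (col("value") * feature.length).as(feature))

  // Logs, builds and immediately materializes each feature, so every
  // "Extracting" message is followed by the actual work for that feature.
  def extractEagerly(
      dataset: DataFrame,
      features: Seq[String],
      joinOn: String,
      log: String => Unit): DataFrame = {
    require(features.nonEmpty, "at least one feature is required")
    // toVector makes the map strict even if a lazy Seq is passed in.
    val extracted = features.toVector.map { feature =>
      log(s"Extracting feature $feature")
      val df = extract(dataset, feature).persist(StorageLevel.MEMORY_AND_DISK)
      df.count() // action: runs the extraction now and fills the cache
      log(s"Extracted feature $feature")
      df
    }
    // The join itself stays lazy; it reads the cached feature DataFrames.
    extracted.reduce((left, right) => left.join(right, Seq(joinOn), "outer"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("eager-features").getOrCreate()
    import spark.implicits._
    try {
      val dataset = Seq((1, 10.0), (2, 20.0)).toDF("id", "value")
      val result = extractEagerly(dataset, Seq("f1", "feat2"), "id", msg => println(msg))
      result.orderBy("id").show()
    } finally {
      spark.stop()
    }
  }
}
```

The persisted feature DataFrames should be `unpersist()`-ed once the joined result is no longer needed.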

Any idea how to proceed?
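If the goal is rather to log the moment Spark actually starts executing work, without adding extra actions, a `SparkListener` can report job starts. A minimal sketch, assuming the driver labels its actions with `SparkContext.setJobDescription`; `JobLoggingListener` and `JobLoggingDemo` are hypothetical names. Listener events arrive asynchronously on Spark's listener-bus thread, so these messages are not strictly ordered relative to the driver's own log lines, and one action may launch more than one job.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

// Logs when Spark actually starts and finishes each job. Callbacks run
// asynchronously on the listener-bus thread, so `log` must be thread-safe.
class JobLoggingListener(log: String => Unit) extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // setJobDescription stores its value under "spark.job.description".
    val description = Option(jobStart.properties)
      .flatMap(props => Option(props.getProperty("spark.job.description")))
      .getOrElse("<no description>")
    log(s"Job ${jobStart.jobId} started: $description")
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    log(s"Job ${jobEnd.jobId} finished: ${jobEnd.jobResult}")
}

object JobLoggingDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("job-logging").getOrCreate()
    try {
      spark.sparkContext.addSparkListener(new JobLoggingListener(msg => println(msg)))
      // Label the jobs triggered by the next action on this thread.
      spark.sparkContext.setJobDescription("Extracting feature 1")
      spark.range(1000).count()
    } finally {
      spark.stop() // stopping drains pending listener events
    }
  }
}
```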

Thanks

[Original question: http://stackoverflow.com/questions/40157032/how-can-i-log-the-moment-an-action-is-called-on-a-dataframe]



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-log-the-moment-an-action-is-called-on-a-DataFrame-tp27962.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
