Hello, I'm building an ML Pipeline that extracts features from a DataFrame, and I'd like it to behave as follows:

    Log "Extracting feature 1"
    Extract feature 1
    Log "Extracting feature 2"
    Extract feature 2
    ...
    Log "Extracting feature n"
    Extract feature n

The thing is, because transformations are lazy, I end up with the following:

    Log "Extracting feature 1"
    Log "Extracting feature 2"
    Log "Extracting feature n"
    Extract feature 1
    Extract feature 2
    ...
    Extract feature n

My transform method looks a bit like this:

    override def transform(dataset: DataFrame): DataFrame = {
      var joinedDataFrame = extract(dataset, featuresToExtract.head)
      for (featureToExtract <- featuresToExtract.tail) {
        // LOGGING HERE THAT I WANT CALLED JUST BEFORE THE CORRESPONDING TRANSFORMATION
        joinedDataFrame = joinedDataFrame.join(extract(dataset, featureToExtract), joinOn, "outer")
      }
      joinedDataFrame
    }

Any idea how to proceed?

Thanks

[original question: http://stackoverflow.com/questions/40157032/how-can-i-log-the-moment-an-action-is-called-on-a-dataframe]
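
For illustration only, here is a minimal sketch of one possible workaround: log, then force each extraction with an action (`cache()` followed by `count()`) so the work runs right after its log line. This has not been confirmed on this thread. The `extract` body, the `id` and `value` column names, and the `log` callback are hypothetical stand-ins for the real ones.

```scala
import org.apache.spark.sql.DataFrame

object EagerFeatureLogging {
  // Hypothetical stand-in for the real extract method: keeps the join key
  // ("id", an assumed column) plus one derived column named after the feature.
  def extract(dataset: DataFrame, feature: String): DataFrame =
    dataset.selectExpr("id", s"value * 2 AS $feature")

  // Logs first, then triggers the extraction immediately with an action.
  def logAndExtract(dataset: DataFrame, feature: String, log: String => Unit): DataFrame = {
    log(s"Extracting feature $feature")
    val extracted = extract(dataset, feature).cache() // keep the result for the later join
    extracted.count()                                 // action: forces the computation now
    extracted
  }

  // Extracts every feature eagerly, then outer-joins the results.
  // Like the original head/tail version, this fails on an empty feature list.
  def transform(dataset: DataFrame,
                features: Seq[String],
                joinOn: Seq[String],
                log: String => Unit): DataFrame =
    features
      .map(feature => logAndExtract(dataset, feature, log))
      .reduce((left, right) => left.join(right, joinOn, "outer"))
}
```

The trade-off is one extra Spark job per feature, plus the memory used by the cached intermediate results. The joins themselves stay lazy and only run when an action is called on the returned DataFrame.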