I am trying to normalize a dataset (convert the values of all attributes in the vector to the "0-1" range). I created an RDD of (attrib-name, attrib-value) tuples for all the records in the dataset as follows:
val attribMap : RDD[(String, DoubleDimension)] = contactDataset.flatMap( contact =>
  List(
    ("dage",      contact.dage      match { case Some(value) => DoubleDimension(value) ; case None => null }),
    ("dancstry1", contact.dancstry1 match { case Some(value) => DoubleDimension(value) ; case None => null }),
    ("dancstry2", contact.dancstry2 match { case Some(value) => DoubleDimension(value) ; case None => null }),
    ("ddepart",   contact.ddepart   match { case Some(value) => DoubleDimension(value) ; case None => null }),
    ("dhispanic", contact.dhispanic match { case Some(value) => DoubleDimension(value) ; case None => null }),
    ("dhour89",   contact.dhour89   match { case Some(value) => DoubleDimension(value) ; case None => null })
  )
)

Here, contactDataset is of type RDD[Contact]. The fields of the Contact class are of type Option[Long]. DoubleDimension is a simple wrapper over the Double datatype; it extends the Ordered trait and implements the corresponding compare and equals methods.

To obtain the max and min attribute vectors for computing the normalized values:

maxVector = attribMap.reduceByKey( getMax )
minVector = attribMap.reduceByKey( getMin )

The implementations of getMax and getMin are as follows:

def getMax( a : DoubleDimension, b : DoubleDimension ) : DoubleDimension = {
  if (a > b) a else b
}

def getMin( a : DoubleDimension, b : DoubleDimension ) : DoubleDimension = {
  if (a < b) a else b
}

I get a compile error at the calls to getMax and getMin stating:

[ERROR] .../com/ameyamm/input_generator/DatasetReader.scala:117: error: missing arguments for method getMax in class DatasetReader;
[ERROR] follow this method with '_' if you want to treat it as a partially applied function
[ERROR] maxVector = attribMap.reduceByKey( getMax )
[ERROR] .../com/ameyamm/input_generator/DatasetReader.scala:118: error: missing arguments for method getMin in class DatasetReader;
[ERROR] follow this method with '_' if you want to treat it as a partially applied function
[ERROR] minVector = attribMap.reduceByKey( getMin )

I am not sure what I am doing wrong here. My RDD is an RDD of pairs, and as far as I know I can pass any method to reduceByKey as long as the function is of type f : (V, V) => V. I am really stuck here. Please help.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-Custom-function-for-reduceByKey-missing-arguments-for-method-tp23756.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
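[Editor's note] For reference, here is a minimal, Spark-free sketch of the pattern being attempted. It uses plain Scala collections (groupBy + reduce stands in for reduceByKey), a hypothetical Dim class standing in for DoubleDimension, and explicit eta-expansion (`getMax _`), which is what the compiler message suggests when a method is passed where a function value of type (V, V) => V is expected. All names here are stand-ins, not the original classes:

```scala
// Hypothetical stand-in for DoubleDimension: a wrapper over Double
// that extends Ordered, as described in the question.
case class Dim(value: Double) extends Ordered[Dim] {
  def compare(that: Dim): Int = this.value.compare(that.value)
}

object MinMaxSketch {
  // Same (V, V) => V shape as the reducers in the question.
  def getMax(a: Dim, b: Dim): Dim = if (a > b) a else b
  def getMin(a: Dim, b: Dim): Dim = if (a < b) a else b

  // Local stand-in for reduceByKey: group pairs by key, then reduce the values.
  def reduceByKeyLocal(pairs: Seq[(String, Dim)], f: (Dim, Dim) => Dim): Map[String, Dim] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(f) }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("dage", Dim(10)), ("dage", Dim(42)), ("ddepart", Dim(7)))

    // `getMax _` / `getMin _` explicitly convert the methods to function values.
    val maxByKey = reduceByKeyLocal(pairs, getMax _)
    val minByKey = reduceByKeyLocal(pairs, getMin _)

    // 0-1 normalization of a single value: (x - min) / (max - min)
    val x = Dim(26)
    val norm = (x.value - minByKey("dage").value) /
               (maxByKey("dage").value - minByKey("dage").value)
    println(norm) // prints 0.5
  }
}
```

In Scala 2, eta-expansion of a method passed as an argument is usually automatic when the expected type is a function type, but appending `_` (as the error suggests) forces the conversion explicitly.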