I am trying to normalize a dataset, i.e. rescale the values of all
attributes to the [0, 1] range via min-max scaling:
normalized = (value - min) / (max - min). I created an RDD of
(attrib-name, attrib-value) tuples for all the records in the dataset as
follows:

val attribMap: RDD[(String, DoubleDimension)] = contactDataset.flatMap(
  contact => {
    List(
      ("dage",      contact.dage      match { case Some(value) => DoubleDimension(value); case None => null }),
      ("dancstry1", contact.dancstry1 match { case Some(value) => DoubleDimension(value); case None => null }),
      ("dancstry2", contact.dancstry2 match { case Some(value) => DoubleDimension(value); case None => null }),
      ("ddepart",   contact.ddepart   match { case Some(value) => DoubleDimension(value); case None => null }),
      ("dhispanic", contact.dhispanic match { case Some(value) => DoubleDimension(value); case None => null }),
      ("dhour89",   contact.dhour89   match { case Some(value) => DoubleDimension(value); case None => null })
    )
  }
)

Here, contactDataset is of type RDD[Contact], and the fields of the Contact
class are of type Option[Long].
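
For reference, Contact looks roughly like this (trimmed to the fields used
above):

// Trimmed sketch of my Contact class; every field is an Option[Long]
case class Contact(
  dage: Option[Long],
  dancstry1: Option[Long],
  dancstry2: Option[Long],
  ddepart: Option[Long],
  dhispanic: Option[Long],
  dhour89: Option[Long]
)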

DoubleDimension is a simple wrapper over the Double datatype. It extends the
Ordered trait and implements the corresponding compare and equals methods.
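Roughly:

// Sketch of DoubleDimension: a thin Ordered wrapper over Double.
// (hashCode is overridden here only to stay consistent with equals)
case class DoubleDimension(value: Double) extends Ordered[DoubleDimension] {
  override def compare(that: DoubleDimension): Int = value.compare(that.value)
  override def equals(other: Any): Boolean = other match {
    case that: DoubleDimension => value == that.value
    case _                     => false
  }
  override def hashCode: Int = value.hashCode
}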

To obtain the per-attribute max and min vectors for computing the normalized
values:

maxVector = attribMap.reduceByKey( getMax )
minVector = attribMap.reduceByKey( getMin )
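
For context, once these two lines compile, my plan is roughly the following
(untested sketch; it assumes DoubleDimension exposes its underlying Double
as `value`, and it ignores the null entries produced for None above):

// Pair each attribute value with its per-attribute (min, max) bounds,
// then rescale to the [0, 1] range
val bounds: RDD[(String, (DoubleDimension, DoubleDimension))] =
  minVector.join(maxVector)
val normalized: RDD[(String, Double)] =
  attribMap.join(bounds).mapValues { case (v, (min, max)) =>
    (v.value - min.value) / (max.value - min.value)
  }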

The implementations of getMax and getMin are as follows:

def getMax(a: DoubleDimension, b: DoubleDimension): DoubleDimension = {
  if (a > b) a
  else b
}

def getMin(a: DoubleDimension, b: DoubleDimension): DoubleDimension = {
  if (a < b) a
  else b
}

I get compile errors at the calls to getMax and getMin, stating:
[ERROR] .../com/ameyamm/input_generator/DatasetReader.scala:117: error:
missing arguments for method getMax in class DatasetReader;

[ERROR] follow this method with '_' if you want to treat it as a partially
applied function

[ERROR] maxVector = attribMap.reduceByKey( getMax )

[ERROR] .../com/ameyamm/input_generator/DatasetReader.scala:118: error:
missing arguments for method getMin in class DatasetReader;

[ERROR] follow this method with '_' if you want to treat it as a partially
applied function

[ERROR] minVector = attribMap.reduceByKey( getMin )

I am not sure what I am doing wrong here. My RDD is an RDD of pairs, and as
far as I know, I can pass any method to reduceByKey as long as the function
is of type f : (V, V) => V.
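
For reference, my reading of the PairRDDFunctions API is:

// Signature of reduceByKey (Spark 1.x PairRDDFunctions):
//   def reduceByKey(func: (V, V) => V): RDD[(K, V)]
// The workaround the compiler suggests would be explicit eta-expansion:
maxVector = attribMap.reduceByKey( getMax _ )
minVector = attribMap.reduceByKey( getMin _ )

I would still like to understand why the plain getMax / getMin forms are not
converted to functions automatically here.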

I am really stuck here. Please help.


