Hi Stuti,

This is a bug in AFTSurvivalRegression: we did not handle "lossSum == infinity" properly. I have opened https://issues.apache.org/jira/browse/SPARK-13322 to track this issue and will send a PR. Thanks for reporting it.
Yanbo

2016-02-12 15:03 GMT+08:00 Stuti Awasthi <stutiawas...@hcl.com>:

> Hi All,
>
> I wanted to try Survival Analysis on Spark 1.6. I am able to run the AFT
> example provided. Now I have tried to train the model with the Ovarian
> data, which is a standard dataset that comes with the survival library in R.
>
> Default column names: Futime, fustat, age, resid_ds, rx, ecog_ps
>
> Here are the steps I have done:
>
> · Loaded the data from CSV into a DataFrame, labeled as:
>
>     val ovarian_data = sqlContext.read
>       .format("com.databricks.spark.csv")
>       .option("header", "true") // Use first line of all files as header
>       .option("inferSchema", "true") // Automatically infer data types
>       .load("Ovarian.csv").toDF("label", "censor", "age", "resid_ds", "rx", "ecog_ps")
>
> · Used VectorAssembler() to create features from "age", "resid_ds",
>   "rx", "ecog_ps":
>
>     val assembler = new VectorAssembler()
>       .setInputCols(Array("age", "resid_ds", "rx", "ecog_ps"))
>       .setOutputCol("features")
>
> · Then created a new DataFrame with only 3 columns:
>
>     val training = finalDf.select("label", "censor", "features")
>
> · Finally, passed it to AFT:
>
>     val model = aft.fit(training)
>
> I am getting this error:
>
> java.lang.AssertionError: assertion failed: AFTAggregator loss sum is infinity. Error for unknown reason.
>     at scala.Predef$.assert(Predef.scala:179)
>     at org.apache.spark.ml.regression.AFTAggregator.add(AFTSurvivalRegression.scala:480)
>     at org.apache.spark.ml.regression.AFTCostFun$$anonfun$5.apply(AFTSurvivalRegression.scala:522)
>     at org.apache.spark.ml.regression.AFTCostFun$$anonfun$5.apply(AFTSurvivalRegression.scala:521)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> I have tried to print the schema:
>
> root
>  |-- label: double (nullable = true)
>  |-- censor: double (nullable = true)
>  |-- features: vector (nullable = true)
>
> Sample rows of the training data look like this:
>
> [59.0,1.0,[72.3315,2.0,1.0,1.0]]
> [115.0,1.0,[74.4932,2.0,1.0,1.0]]
> [156.0,1.0,[66.4658,2.0,1.0,2.0]]
> [421.0,0.0,[53.3644,2.0,2.0,1.0]]
> [431.0,1.0,[50.3397,2.0,1.0,1.0]]
>
> I am not able to understand the error: if I use the same data and create
> the dense vectors as in the AFT sample example, the code works completely
> fine. But I would like to read the data from a CSV file and then proceed.
>
> Please suggest.
>
> Thanks & Regards
> Stuti Awasthi
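
For readers following the thread, here is a rough sketch of how a loss sum can reach infinity on data like the rows quoted above. It assumes the standard Weibull AFT per-instance negative log-likelihood term, delta * log(sigma) - delta * epsilon + exp(epsilon), with epsilon = (log(t) - (x . beta + intercept)) / sigma. The object name `AftLossSketch`, the helper `instanceLoss`, the trial coefficients, and the divide-by-100 rescaling are all hypothetical and chosen only for illustration; this is not the code in AFTSurvivalRegression.scala, and the coefficients are not values the optimizer is known to have visited.

```scala
object AftLossSketch {
  // Per-instance negative log-likelihood term of a Weibull AFT model
  // (assumed form, see the paragraph above):
  //   delta * log(sigma) - delta * epsilon + exp(epsilon)
  def instanceLoss(
      t: Double,           // survival time (the "label" column)
      delta: Double,       // 1.0 = event observed, 0.0 = censored
      x: Array[Double],    // feature vector
      beta: Array[Double], // regression coefficients
      intercept: Double,
      sigma: Double): Double = {
    val margin = x.zip(beta).map { case (xi, bi) => xi * bi }.sum + intercept
    val epsilon = (math.log(t) - margin) / sigma
    delta * math.log(sigma) - delta * epsilon + math.exp(epsilon)
  }

  def main(args: Array[String]): Unit = {
    // First sample row from the thread: label 59.0, censor 1.0.
    val x = Array(72.3315, 2.0, 1.0, 1.0)
    // Hypothetical trial coefficients: a large negative weight on "age".
    val beta = Array(-10.0, 0.0, 0.0, 0.0)

    // epsilon exceeds the range where exp() fits in a Double, so the
    // exp(epsilon) term overflows and the whole loss becomes infinite.
    val raw = instanceLoss(59.0, 1.0, x, beta, 0.0, 1.0)
    println(s"raw age:      infinite = ${raw.isInfinity}")

    // Same row with "age" rescaled to order 1 (hypothetical rescaling):
    // the identical coefficients now give a finite loss.
    val xScaled = Array(72.3315 / 100.0, 2.0, 1.0, 1.0)
    val scaled = instanceLoss(59.0, 1.0, xScaled, beta, 0.0, 1.0)
    println(s"rescaled age: infinite = ${scaled.isInfinity}")
  }
}
```

The only point of the sketch is that exp(epsilon) overflows a Double once epsilon grows past roughly 709, which large unscaled feature values make easy to reach for some trial coefficients. Whether rescaling the features before calling fit() avoids the assertion on the real Ovarian data is an assumption that would need to be checked against SPARK-13322.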