Hi Stuti,

This is a bug in AFTSurvivalRegression: we did not handle "lossSum ==
infinity" properly.
I have opened https://issues.apache.org/jira/browse/SPARK-13322 to track this
issue and will send a PR.
Thanks for reporting this issue.
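
As a rough illustration of how a loss sum can reach infinity, here is a
minimal sketch. It assumes the per-row loss has the form of the AFT (Weibull)
negative log-likelihood term, delta * log(sigma) + exp(epsilon) - delta * epsilon,
with epsilon = (log(t) - margin) / sigma. The names AftLossSketch and rowLoss and
the coefficient values are hypothetical and chosen only for illustration; this
is not the actual AFTAggregator code, and it does not claim to reproduce the
exact cause of the failure reported below.

```scala
object AftLossSketch {
  // Per-row loss under the assumed AFT (Weibull) form:
  //   epsilon = (log(t) - margin) / sigma
  //   loss    = delta * log(sigma) + exp(epsilon) - delta * epsilon
  // delta is 1.0 for an observed event and 0.0 for a censored row.
  def rowLoss(t: Double, delta: Double, margin: Double, sigma: Double): Double = {
    val epsilon = (math.log(t) - margin) / sigma
    delta * math.log(sigma) + math.exp(epsilon) - delta * epsilon
  }

  // Plain dot product between a feature vector and a coefficient vector.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit = {
    // First sample row from the report: label 59.0, censor 1.0.
    val features = Array(72.3315, 2.0, 1.0, 1.0)

    // Hypothetical coefficient vectors an optimizer might try.
    val small = Array(0.01, 0.0, 0.0, 0.0)
    val large = Array(-10.0, 0.0, 0.0, 0.0)

    // With the small coefficients, exp(epsilon) stays finite.
    println(rowLoss(59.0, 1.0, dot(features, small), 1.0).isInfinity)
    // With the large coefficients, epsilon exceeds ~709.78 and exp overflows a Double.
    println(rowLoss(59.0, 1.0, dot(features, large), 1.0).isInfinity)
  }
}
```

Because math.exp overflows to Infinity once its argument passes roughly 709.78,
a single row with a large epsilon is enough to make the accumulated sum
infinite, which is the condition the assertion in the stack trace checks for.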

Yanbo

2016-02-12 15:03 GMT+08:00 Stuti Awasthi <stutiawas...@hcl.com>:

> Hi All,
>
> I wanted to try survival analysis on Spark 1.6. I was able to run the
> provided AFT example successfully. I then tried to train the model on the
> Ovarian data set, a standard data set that ships with the survival library in R.
>
> Default column names: Futime, fustat, age, resid_ds, rx, ecog_ps
>
>
>
> Here are the steps I have done :
>
> · Loaded the data from CSV into a DataFrame and renamed the columns:
>
> val ovarian_data = sqlContext.read
>       .format("com.databricks.spark.csv")
>       .option("header", "true") // Use first line of all files as header
>       .option("inferSchema", "true") // Automatically infer data types
>       .load("Ovarian.csv")
>       .toDF("label", "censor", "age", "resid_ds", "rx", "ecog_ps")
>
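> · Used VectorAssembler to create the features column from "age",
> "resid_ds", "rx", "ecog_ps":
>
> import org.apache.spark.ml.feature.VectorAssembler
>
> val assembler = new VectorAssembler()
>       .setInputCols(Array("age", "resid_ds", "rx", "ecog_ps"))
>       .setOutputCol("features")
>
> // Assumed step: finalDf (used in the next step) is presumably the assembler output.
> val finalDf = assembler.transform(ovarian_data)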
>
>
>
> · Then I created a new DataFrame with only 3 columns:
>
> val training = finalDf.select("label", "censor", "features")
>
>
>
> · Finally, I passed it to AFT:
>
> val model = aft.fit(training)
>
>
>
> I am getting this error:
>
> java.lang.AssertionError: assertion failed: AFTAggregator loss sum is infinity. Error for unknown reason.
>        at scala.Predef$.assert(Predef.scala:179)
>        at org.apache.spark.ml.regression.AFTAggregator.add(AFTSurvivalRegression.scala:480)
>        at org.apache.spark.ml.regression.AFTCostFun$$anonfun$5.apply(AFTSurvivalRegression.scala:522)
>        at org.apache.spark.ml.regression.AFTCostFun$$anonfun$5.apply(AFTSurvivalRegression.scala:521)
>        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
>        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
>        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
>
>
> I printed the schema:
>
> root
>  |-- label: double (nullable = true)
>  |-- censor: double (nullable = true)
>  |-- features: vector (nullable = true)
>
>
>
> Sample rows from the training data look like this:
>
> [59.0,1.0,[72.3315,2.0,1.0,1.0]]
> [115.0,1.0,[74.4932,2.0,1.0,1.0]]
> [156.0,1.0,[66.4658,2.0,1.0,2.0]]
> [421.0,0.0,[53.3644,2.0,2.0,1.0]]
> [431.0,1.0,[50.3397,2.0,1.0,1.0]]
>
>
>
> I do not understand the error: if I use the same data and create the
> dense vectors by hand, as in the AFT sample example, the code works
> completely fine. But I would like to read the data from a CSV file and then
> proceed.
>
>
>
> Please suggest
>
>
>
> Thanks & Regards
>
> Stuti Awasthi
