It looks like your UDF expects numeric data, but you are passing it a string column (the error shows input[410, StringType]). I suggest casting the column to a numeric type first.
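A minimal sketch of the cast suggested above, assuming the goal is what the quoted message describes: replace ' ' with 0.0 and treat everything else as a number. The names blank_to_zero and clean_columns are hypothetical helpers, not part of your code or of Spark. The first is plain Python so the rule can be checked without a cluster. The second expresses the same rule with built-in column functions (when, trim, cast) instead of a Python UDF, which may also sidestep the PythonUDF evaluation error, though I have not verified that on 1.6.0.

```python
def blank_to_zero(value):
    """Plain-Python version of the rule: blank string -> 0.0, otherwise float.

    None is passed through unchanged (an assumption; the original message
    does not say how nulls should be handled).
    """
    if value is None:
        return None
    if str(value).strip() == "":
        return 0.0
    return float(value)


def clean_columns(df, columns_to_clean):
    """Return df with the listed columns converted to double.

    Blank strings become 0.0 and other values are cast to double.
    Note: unlike blank_to_zero, a cast of a non-numeric string yields
    null in Spark rather than raising an error.
    """
    # Imported here so blank_to_zero can be tested without Spark installed.
    from pyspark.sql import functions as F

    def cleaned(name):
        col = F.col(name)
        return (
            F.when(F.trim(col) == "", F.lit(0.0))
            .otherwise(col.cast("double"))
            .alias(name)
        )

    return df.select(
        [cleaned(c) if c in columns_to_clean else F.col(c) for c in df.columns]
    )
```

With the names from the quoted message, usage would look like dfTotaleNormalize24 = clean_columns(dfTotaleNormalize23, listrapcot). If you prefer to keep a UDF, wrapping blank_to_zero with F.udf(blank_to_zero, DoubleType()) declares a numeric return type.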
On Thu, 13 Apr 2017 at 7:03 pm, issues solution <issues.solut...@gmail.com> wrote:

> Hi,
>
> I am new to Spark and I want to ask what is wrong with checkpoint on
> PySpark 1.6.0.
>
> I don't understand what happens after I try to use it on a DataFrame:
>
> dfTotaleNormalize24 = dfTotaleNormalize23.select([i if i not in
> listrapcot else udf_Grappra(F.col(i)).alias(i) for i in
> dfTotaleNormalize23.columns ])
>
> dfTotaleNormalize24.cache() <- cache in memory
> dfTotaleNormalize24.count <- materialize the DataFrame (the RDD too?)
> dfTotaleNormalize24.rdd.checkpoint() <- (cut the DAG; RDD not saved yet)
> dfTotaleNormalize24.rdd.count() <--- materialize to file
>
> But why do I get the following error?
>
> java.lang.UnsupportedOperationException: Cannot evaluate expression:
> PythonUDF#Grappra(input[410, StringType])
>
> Thanks for explaining all the details and steps to save and checkpoint.
>
> My DataFrame is huge, with more than 5 million rows and 1000 columns,
> and the UDF above is applied to more than 150 columns; it replaces ' '
> with 0.0, that is all.
>
> Regards

--
Best Regards,
Ayan Guha