Hi, thank you all.

Here is my requirement: I have a DataFrame that contains rows retrieved from an Oracle table. I need to iterate over the DataFrame, fetch each record, and call a common function, passing a few parameters. The issue I am facing is that I am not able to call the common function:

    JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new Function<Row, Person>() {
      @Override
      public Person call(Row row) throws Exception {
        Person person = new Person();
        person.setId(row.getDecimal(0).longValue());
        person.setName(row.getString(1));
        return person;
      }
    });

    personRDD.foreach(new VoidFunction<Person>() {
      private static final long serialVersionUID = 1111111111111123456L;

      @Override
      public void call(Person person) throws Exception {
        System.out.println(person.getId());
        // Here I tried to call the common function ************
      }
    });

I am able to print the data in the foreach loop; however, if I try to call the common function, it gives me the error below.

Error message: org.apache.spark.SparkException: Task not serializable

I kindly request you to share some ideas (sample code / a link to refer to) on how to call a common function/interface method, passing the values in each record of the DataFrame.

Regards,
Sunitha

On Tue, Dec 19, 2017 at 1:20 PM, Weichen Xu <weichen...@databricks.com> wrote:

> Hi Sunitha,
>
> In the mapper function you cannot update outer variables such as
> `personLst.add(person)`; that won't work, and it is the reason you got an
> empty list.
>
> You can use `rdd.collect()` to get a local list of `Person` objects
> first; then you can safely iterate over the local list and do any update
> you want.
>
> Thanks.
>
> On Tue, Dec 19, 2017 at 2:16 PM, Sunitha Chennareddy
> <chennareddysuni...@gmail.com> wrote:
>
>> Hi Deepak,
>>
>> I am able to map a Row to the Person class; the issue is that I want to
>> call another method. I tried converting to a list, and it does not work
>> without using collect().
>>
>> Regards,
>> Sunitha
>>
>> On Tuesday, December 19, 2017, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>>> I am not sure about Java, but in Scala it would be something like:
>>>
>>>     df.rdd.map { x => MyClass(x.getString(0), .....) }
>>>
>>> HTH
>>>
>>> --Deepak
>>>
>>> On Dec 19, 2017 09:25, "Sunitha Chennareddy"
>>> <chennareddysuni...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I am new to Spark. I want to convert a DataFrame to List<JavaClass>
>>> without using collect().
>>>
>>> The main requirement is that I need to iterate through the rows of the
>>> DataFrame and call another function, passing a column value from each
>>> row (person.getId()).
>>>
>>> Here is the snippet I have tried; kindly help me resolve the issue.
>>> personLst is returning size 0:
>>>
>>>     List<Person> personLst = new ArrayList<Person>();
>>>     JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new Function<Row, Person>() {
>>>       public Person call(Row row) throws Exception {
>>>         Person person = new Person();
>>>         person.setId(row.getDecimal(0).longValue());
>>>         person.setName(row.getString(1));
>>>
>>>         personLst.add(person);
>>>         // here I tried to call another function, but control never passed
>>>         return person;
>>>       }
>>>     });
>>>     logger.info("personLst size == " + personLst.size());
>>>     logger.info("personRDD count === " + personRDD.count());
>>>
>>>     // output is
>>>     personLst size == 0
>>>     personRDD count === 3
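A note on the error itself: "Task not serializable" usually means the foreach closure captured an object that cannot be serialized, typically the class that owns the common function (an anonymous inner class also holds a hidden reference to its enclosing instance). One way out is to move the common logic into a small class of its own that implements Serializable. A minimal sketch, where PersonProcessor and process are hypothetical names, not anything from Sunitha's code:

    import java.io.Serializable;

    import org.apache.spark.api.java.function.VoidFunction;

    // Hypothetical holder for the common logic; implementing Serializable
    // lets Spark ship the instance to the executors inside the closure.
    class PersonProcessor implements Serializable {
        private static final long serialVersionUID = 1L;

        void process(long id, String name) {
            // ... body of the common function goes here ...
            System.out.println("processing " + id + " / " + name);
        }
    }

    final PersonProcessor processor = new PersonProcessor();

    personRDD.foreach(new VoidFunction<Person>() {
        private static final long serialVersionUID = 1L;

        @Override
        public void call(Person person) throws Exception {
            // runs on the executors; 'processor' is serialized and shipped
            processor.process(person.getId(), person.getName());
        }
    });

An alternative that avoids capturing anything at all is to make the common function a static method on a top-level class and call it directly from call().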
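If the result set is small enough to fit in driver memory, Weichen's collect() suggestion looks roughly like this; commonFunction is a hypothetical stand-in for whatever method has to be called per record:

    import java.util.List;

    // collect() brings the mapped records back to the driver, so the loop
    // below runs locally and nothing extra needs to be serializable.
    // Avoid this for tables too large to hold on the driver.
    List<Person> personLst = personRDD.collect();
    for (Person person : personLst) {
        commonFunction(person.getId(), person.getName()); // hypothetical
    }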
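And if the common function talks to an external system (a database, a web service), foreachPartition is a common idiom: any non-serializable resource is created on the executor, once per partition, instead of being shipped from the driver. A sketch, reusing the hypothetical PersonProcessor from above:

    import java.util.Iterator;

    import org.apache.spark.api.java.function.VoidFunction;

    personRDD.foreachPartition(new VoidFunction<Iterator<Person>>() {
        private static final long serialVersionUID = 1L;

        @Override
        public void call(Iterator<Person> people) throws Exception {
            // created here, on the executor, so it never crosses the wire;
            // per-partition setup such as opening a connection goes here too
            PersonProcessor processor = new PersonProcessor();
            while (people.hasNext()) {
                Person person = people.next();
                processor.process(person.getId(), person.getName());
            }
        }
    });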