Hi,

Thank you all.

Here is my requirement: I have a DataFrame which contains rows retrieved
from an Oracle table.
I need to iterate over the DataFrame, fetch each record, and call a common
function, passing a few parameters.

The issue I am facing is that I am not able to call the common function.

JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new
Function<Row, Person>() {
  @Override
  public Person call(Row row) throws Exception {
    Person person = new Person();
    person.setId(row.getDecimal(0).longValue());
    person.setName(row.getString(1));

    personLst.add(person);
    return person;
  }
});

personRDD.foreach(new VoidFunction<Person>() {
  private static final long serialVersionUID = 1111111111111123456L;

  @Override
  public void call(Person person) throws Exception {
    System.out.println(person.getId());
    // Here I tried to call the common function ************
  }
});

I am able to print the data in the foreach loop; however, if I try to call
the common function it gives me the error below.
Error message: org.apache.spark.SparkException: Task not serializable

I kindly request you to share some ideas (sample code / links to refer to)
on how to call a common function/interface method, passing values from each
record of the dataframe.
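
For what it's worth, `Task not serializable` typically means the anonymous
`VoidFunction` captured a reference to something non-serializable from the
enclosing scope, most often the outer `this`, which is pulled in automatically
when the closure calls an instance method of the surrounding class. Making the
common function `static`, or moving it into a small standalone class that
implements `Serializable`, avoids that capture. A minimal non-Spark sketch of
the idea (the `CommonFunctions` class and `process` method are hypothetical
names; `isSerializable` mimics the check Spark effectively performs when
shipping a task to executors):

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableHelperDemo {

    // A standalone serializable helper: it holds no hidden reference to any
    // outer object, so Spark could ship it to executors without error.
    static class CommonFunctions implements Serializable {
        private static final long serialVersionUID = 1L;

        public String process(long id) {
            return "processed " + id;
        }
    }

    // Returns true if the object survives plain Java serialization -- the
    // same requirement Spark places on every task closure.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CommonFunctions fn = new CommonFunctions();
        System.out.println(isSerializable(fn));  // true
        System.out.println(fn.process(42L));     // processed 42
    }
}
```

An anonymous inner class defined inside a non-serializable enclosing class
would fail the same check, because it silently keeps a `this$0` reference to
its outer instance.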

Regards,
Sunitha


On Tue, Dec 19, 2017 at 1:20 PM, Weichen Xu <weichen...@databricks.com>
wrote:

> Hi Sunitha,
>
> In the mapper function, you cannot update outer variables such as
> `personLst.add(person)`;
> this won't work, which is why you got an empty list.
>
> You can use `rdd.collect()` to get a local list of `Person` objects
> first, then you can safely iterate on the local list and do any update you
> want.
>
> Thanks.
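
To illustrate the collect-then-iterate pattern described above, here is a
minimal driver-side sketch. The `commonFunction` helper is a hypothetical
stand-in for the method to be called per record; in real code `personLst`
would come from `personRDD.collect()` rather than being built by hand:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectThenIterate {

    static class Person implements java.io.Serializable {
        private final long id;
        Person(long id) { this.id = id; }
        long getId() { return id; }
    }

    // Hypothetical stand-in for the common function to call per record.
    static String commonFunction(long id) {
        return "handled " + id;
    }

    public static void main(String[] args) {
        // In Spark this would be: List<Person> personLst = personRDD.collect();
        List<Person> personLst =
            Arrays.asList(new Person(1), new Person(2), new Person(3));

        // Once the data is local on the driver, calling any method is safe:
        List<String> results = personLst.stream()
                .map(p -> commonFunction(p.getId()))
                .collect(Collectors.toList());

        results.forEach(System.out::println);
    }
}
```

Note that `collect()` brings the whole dataset to the driver, so this is only
appropriate when the result comfortably fits in driver memory.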
>
> On Tue, Dec 19, 2017 at 2:16 PM, Sunitha Chennareddy <
> chennareddysuni...@gmail.com> wrote:
>
>> Hi Deepak,
>>
>> I am able to map a row to the Person class; the issue is that I want to
>> call another method.
>> I tried converting to a list, and it's not working without using collect.
>>
>> Regards
>> Sunitha
>> On Tuesday, December 19, 2017, Deepak Sharma <deepakmc...@gmail.com>
>> wrote:
>>
>>> I am not sure about Java, but in Scala it would be something like
>>> df.rdd.map{ x => MyClass(x.getString(0),.....)}
>>>
>>> HTH
>>>
>>> --Deepak
>>>
>>> On Dec 19, 2017 09:25, "Sunitha Chennareddy"
>>> <chennareddysuni...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I am new to Spark. I want to convert a DataFrame to List<JavaClass>
>>> without using collect().
>>>
>>> The main requirement is that I need to iterate through the rows of the
>>> dataframe and call another function, passing a column value of each row
>>> (person.getId()).
>>>
>>> Here is the snippet I have tried. Kindly help me resolve the issue;
>>> personLst size is returning 0:
>>>
>>> List<Person> personLst = new ArrayList<Person>();
>>> JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new
>>> Function<Row, Person>() {
>>>   public Person call(Row row) throws Exception {
>>>     Person person = new Person();
>>>     person.setId(row.getDecimal(0).longValue());
>>>     person.setName(row.getString(1));
>>>
>>>     personLst.add(person);
>>>     // here I tried to call another function but control never passed
>>>     return person;
>>>   }
>>> });
>>> logger.info("personLst size =="+personLst.size());
>>> logger.info("personRDD count ==="+personRDD.count());
>>>
>>> //output is
>>> personLst size == 0
>>> personRDD count === 3
>>>
>>>
>>>
>
