Hi, Mark!

What error message do you get?

The simplest way to convert an RDD to a DataFrame is to import the implicits
and call toDF:

import spark.implicits._
val df = rdd.toDF
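
Two caveats, going by the snippet you quoted. First, your `df` is `Unit`
because `.collect().foreach(println)` prints the rows and returns nothing, so
there is no RDD left to convert. Second, `toDF` on an RDD of JSON strings
would only give you a single string column. Since your rows are JSON, letting
Spark infer the schema with `spark.read.json` is probably easier. Below is a
minimal sketch under some assumptions: the `JsonRddToDf` object and
`toDataFrame` helper are names I made up, the sample rows stand in for your
Couchbase query, and I am assuming `_.value.toString` yields the JSON text of
each row. `read.json` on a `Dataset[String]` needs Spark 2.2+; on Spark 2.0 or
2.1 you can pass the RDD directly with `spark.read.json(jsonRdd)`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonRddToDf {
  // Hypothetical helper: parse an RDD of JSON strings into a DataFrame,
  // letting Spark infer the schema. Fields missing from a row become null.
  def toDataFrame(spark: SparkSession, jsonRdd: RDD[String]): DataFrame = {
    import spark.implicits._
    spark.read.json(jsonRdd.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-rdd-to-df")
      .getOrCreate()

    // Stand-in for (assumption): sc.couchbaseQuery(test).map(_.value.toString)
    // Note: no collect()/foreach here, so the RDD is kept.
    val jsonRdd = spark.sparkContext.parallelize(Seq(
      """{"accountStatus":"AccountOpen","custId":"140034"}""",
      """{"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}"""
    ))

    val df = toDataFrame(spark, jsonRdd)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

If you still want an explicit schema, `spark.read.schema(schema).json(...)`
accepts the `StructType` you already built.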

:-)

On Tue, Nov 29, 2016 at 1:26 AM, Mark Mikolajczak <m...@flayranalytics.co.uk>
wrote:

>
>
> Hi All,
>
> Hoping you can help:
>
>
> I have created an RDD from a NOSQL database and I want to convert the RDD
> to a data frame. I have tried many options but all result in errors.
>
>     val df = sc.couchbaseQuery(test).map(_.value).collect().foreach(println)
>
>
> {"accountStatus":"AccountOpen","custId":"140034"}
> {"accountStatus":"AccountOpen","custId":"140385"}
> {"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}
> {"accountStatus":"AccountClosed","subId":"11364","custId":"140925","subStatus":"Paused"}
> {"accountStatus":"AccountOpen","subId":"10413","custId":"138842","subStatus":"Active"}
> {"accountStatus":"AccountOpen","subId":"10414","custId":"138842","subStatus":"Active"}
> {"accountStatus":"AccountClosed","subId":"11314","custId":"140720","subStatus":"Paused"}
> {"accountStatus":"AccountOpen","custId":"139166"}
> {"accountStatus":"AccountClosed","subId":"10735","custId":"139558","subStatus":"Paused"}
> {"accountStatus":"AccountOpen","custId":"139575"}
> df: Unit = ()
>
> I have tried adding .toDF() to the end of my code and also creating a
> schema and using createDataFrame but receive errors. What's the best
> approach to converting the RDD to a DataFrame?
>
> import org.apache.spark.sql.types._
>
> // The schema is encoded in a string
> val schemaString = "accountStatus subId custId subStatus"
>
> // Generate the schema based on the string of schema
> val fields = schemaString.split(" ")
>   .map(fieldName => StructField(fieldName, StringType, nullable = true))
> val schema = StructType(fields)
>
> //
>
> val peopleDF = spark.createDataFrame(df,schema)
>
>
> --
Taejun Kim

Data Mining Lab.
School of Electrical and Computer Engineering
University of Seoul
