Hi all,
    I have a txt file, and I want to process it as a DataFrame:

    The data looks like this:
       name1,30
       name2,18

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val schemaString = "name age year"
    val xMap = new scala.collection.mutable.HashMap[String, DataType]()
    xMap.put("name", StringType)
    xMap.put("age", IntegerType)
    xMap.put("year", IntegerType)
    
    val fields = schemaString.split(" ").map(fieldName =>
      StructField(fieldName, xMap(fieldName), nullable = true))
    val schema = StructType(fields)
    
    val peopleRDD = spark.sparkContext.textFile("/sourcedata/test/test*")
    //spark.read.schema(schema).text("/sourcedata/test/test*")
    
    val rowRDD = peopleRDD.map(_.split(",")).map(attributes =>
      Row(attributes(0), attributes(1)))

    // Apply the schema to the RDD
    val peopleDF = spark.createDataFrame(rowRDD, schema)  
 
    But when I write it to a table or call show() on it, I get an error.
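    For example, either of these triggers it (saveAsTable and the table name are just how I would write it out, adjust as needed):

    peopleDF.show()
    // peopleDF.write.saveAsTable("people_test")

    The error is: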


   Caused by: java.lang.RuntimeException: Error while encoding: 
java.lang.RuntimeException: java.lang.String is not a valid external type for 
schema of int
if (assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row 
object).isNullAt) null else staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, 
validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true], top level row object), 0, name), StringType), 
true) AS name#1
+- if (assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row 
object).isNullAt) null else staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, 
validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true], top level row object), 0, name), StringType), 
true)

   If I change my code like this, it works:

   val rowRDD = peopleRDD.map(_.split(",")).map(attributes =>
     Row(attributes(0), attributes(1).toInt))

   but converting each column by hand does not seem like a good idea. Is there a better way to apply the schema so the values get the right types?
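   Would the built-in CSV reader be the right way to do this instead? A minimal sketch of what I mean, assuming Spark 2.x and that the files are plain comma-separated text (the path and schema are the same ones from above):

   val peopleDF = spark.read
     .schema(schema)              // name: string, age: int, year: int
     .option("header", "false")   // the files have no header line
     .csv("/sourcedata/test/test*")

   As far as I understand, the csv reader would cast each column to the type declared in the schema, so the manual toInt would not be needed.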

2017-01-12


lk_spark 
