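Hi,

ArrayType corresponds to a Scala Array, which in Java is a plain Java array, so the column value should be a String[]. However, RowFactory.create takes an Object... varargs array with one entry per column, so a bare String[] would be treated as the list of columns. Wrap it so that the whole array becomes the content of a single column: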

    public Row call(String line) {
        return RowFactory.create(new String[][]{line.split(" ")});
    }

More details in this Stack Overflow question:
<http://stackoverflow.com/questions/43411492/createdataframe-throws-exception-when-pass-javardd-that-contains-arraytype-col/43585039#43585039>

Hope this works for you,
Cheers

2017-04-23 18:13 GMT+02:00 Chen, Mingrui <mingr...@mail.smu.edu>:

> Hello everyone!
>
> I am a new Spark learner and trying to do a task seems very simple. I want
> to read a text file, save the content to JavaRDD and convert it to
> Dataframe, so I can use it for Word2Vec Model in the future. The code looks
> pretty simple but I cannot make it work:
>
> SparkSession spark = SparkSession.builder().appName("Word2Vec").getOrCreate();
> JavaRDD<String> lines = spark.sparkContext().textFile("input.txt", 10).toJavaRDD();
> JavaRDD<Row> rows = lines.map(new Function<String, Row>(){
>     public Row call(String line){
>         return RowFactory.create(Arrays.asList(line.split(" ")));
>     }
> });
> StructType schema = new StructType(new StructField[] {
>     new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
> });
> Dataset<Row> input = spark.createDataFrame(rows, schema);
> input.show(3);
>
> It throws an exception at input.show(3):
>
> Caused by: java.lang.ClassCastException: cannot assign instance of
> scala.collection.immutable.List$SerializationProxy to field
> org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type
> scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>
> Seems it has problem converting the JavaRDD<Row> to Dataframe. However I
> cannot figure out what mistake I make here and the exception message is
> hard to understand. Anyone can help? Thanks!
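
P.S. In case the String[][] wrapping looks odd, here is a small sketch of the varargs behaviour behind it. It does not use Spark at all: the create method below is a hypothetical stand-in that only mirrors the Object... signature of RowFactory.create and returns what it received, so the class name VarargsDemo and the sample line are my own assumptions for illustration.

```java
import java.util.Arrays;

public class VarargsDemo {
    // Hypothetical stand-in with the same parameter shape as
    // RowFactory.create(Object... values); it just hands back the
    // values it received, one array entry per "column".
    static Object[] create(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        String line = "the quick brown fox";

        // A bare String[] is accepted as the varargs array itself,
        // so every word becomes its own column (javac may warn here).
        Object[] spread = create(line.split(" "));
        System.out.println(spread.length); // prints 4

        // Wrapping in a String[][] gives a single column whose value is the String[].
        Object[] wrapped = create(new String[][]{line.split(" ")});
        System.out.println(wrapped.length); // prints 1
        System.out.println(wrapped[0] instanceof String[]); // prints true

        // Casting to Object has the same effect: one column holding the array.
        Object[] cast = create((Object) line.split(" "));
        System.out.println(cast.length); // prints 1

        // A List is never mistaken for the varargs array: one column holding the List.
        Object[] list = create(Arrays.asList(line.split(" ")));
        System.out.println(list.length); // prints 1
    }
}
```

So the wrapped and the cast forms should behave the same; which one to use is a matter of taste.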

-- 
Radhwane Chebaane
Distributed systems engineer, Mindlytix
Mail: radhw...@mindlytix.com
Mobile: +33 695 588 906
Skype: rad.cheb
LinkedIn <https://fr.linkedin.com/in/radhwane-chebaane-483b3a7b>