Hi, all. I find that it's really confuse. I can use Vectors.parse to create a DataFrame contains Vector type.
scala> val dataVec = Seq((0, Vectors.parse("[1,3,5]")), (1, Vectors.parse("[2,4,6]"))).toDF dataVec: org.apache.spark.sql.DataFrame = [_1: int, _2: vector] But using map to convert String to Vector throws an error: scala> val dataStr = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF dataStr: org.apache.spark.sql.DataFrame = [_1: int, _2: string] scala> dataStr.map(row => Vectors.parse(row.getString(1))) <console>:30: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. dataStr.map(row => Vectors.parse(row.getString(1))) Dose anyone can help me, thanks very much! On Tue, Sep 6, 2016 at 9:58 PM, Peter Figliozzi <pete.figlio...@gmail.com> wrote: > Hi Yan, I think you'll have to map the features column to a new numerical > features column. > > Here's one way to do the individual transform: > > scala> val x = "[1, 2, 3, 4, 5]" > x: String = [1, 2, 3, 4, 5] > > scala> val y:Array[Int] = x slice(1, x.length - 1) replace(",", "") > split(" ") map(_.toInt) > y: Array[Int] = Array(1, 2, 3, 4, 5) > > If you don't know about the Scala command line, just type "scala" in a > terminal window. It's a good place to try things out. > > You can make a function out of this transformation and apply it to your > features column to make a new column. Then add this with > Dataset.withColumn. > > See here > <http://stackoverflow.com/questions/35227568/applying-function-to-spark-dataframe-column> > on how to apply a function to a Column to make a new column. > > On Tue, Sep 6, 2016 at 1:56 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote: > >> Hi, >> I have a csv file like: >> uid mid features label >> 123 5231 [0, 1, 3, ...] True >> >> Both "features" and "label" columns are used for GBTClassifier. >> >> However, when I read the file: >> Dataset<Row> samples = sparkSession.read().csv(file); >> The type of samples.select("features") is String. >> >> My question is: >> How to map samples.select("features") to Vector or any appropriate type, >> so I can use it to train like: >> GBTClassifier gbdt = new GBTClassifier() >> .setLabelCol("label") >> .setFeaturesCol("features") >> .setMaxIter(2) >> .setMaxDepth(7); >> >> Thanks. >> > >