CatalystTypeConverters.scala has all kinds of utility methods to convert from Scala types to Row and vice versa.
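If you end up hand-rolling the mapping from runtime classes to DataTypes that Fabian asked about below, here is a minimal Scala sketch using only the public org.apache.spark.sql.types API. The helper names (dataTypeOf, schemaFor) are my own, and the match only covers a handful of common types:

import org.apache.spark.sql.types._

// Sketch: derive a Spark SQL DataType from a value's runtime type.
// Only a few common types are covered; extend the match as needed.
def dataTypeOf(value: Any): DataType = value match {
  case _: Int     => IntegerType
  case _: Long    => LongType
  case _: Double  => DoubleType
  case _: Boolean => BooleanType
  case _: String  => StringType
  case other => throw new IllegalArgumentException(
    s"No DataType mapping for ${other.getClass}")
}

// Build a schema from one representative Map element, using a fixed
// (sorted) column order so Rows can be built consistently later.
def schemaFor(sample: Map[String, Any]): StructType =
  StructType(sample.toSeq.sortBy(_._1).map {
    case (name, value) => StructField(name, dataTypeOf(value), nullable = true)
  })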
On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan <rishabh...@gmail.com> wrote:

> I had the same issue. I resolved it in Java, but I am pretty sure it would
> work with Scala too. It's kind of a gross hack. Say I had a table in MySQL
> with 1000 columns: what I did was issue a JDBC query to extract the schema
> of the table. I stored that schema and wrote a map function to create
> StructFields using StructType and RowFactory. Then I took that table loaded
> as a DataFrame, even though it had a schema, and converted that DataFrame
> into an RDD, which is when it lost the schema. I performed some operations
> on that RDD and then converted it back using the StructFields.
> If your source is a structured type, it is better to load it directly as a
> DataFrame, so you can preserve the schema. In your case, however, you
> should do something like this:
>
> List<StructField> fields = new ArrayList<StructField>();
> for (String key : map.keySet())
>     fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
>
> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>
> sqlContext.createDataFrame(rdd, schemaOfDataFrame);
>
> This is how I would do it in Java; I am not sure about the Scala syntax.
> Please tell me if that helped.
>
> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein <fabian.boehnl...@gmail.com>
> wrote:
>
> Hi all,
>
> is there a way to create a Spark SQL Row schema based on Scala data types
> without creating a manual mapping?
>
> That's the only example I can find which doesn't require
> spark.sql.types.DataType already as input, but it requires defining them
> as Strings:
>
> val struct = (new StructType)
>   .add("a", "int")
>   .add("b", "long")
>   .add("c", "string")
>
> Specifically, I have an RDD where each element is a Map of 100s of
> variables with different data types, which I want to transform to a
> DataFrame where the keys should end up as the column names:
>
> Map("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> ...)
>
> Is there a different possibility than building a mapping from the values'
> .getClass to the Spark SQL DataTypes?
>
> Thanks,
> Fabian
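To tie the thread together, a rough end-to-end Scala sketch of the approach discussed above: derive the schema from one sample Map, turn every Map into a Row in a fixed column order, and build the DataFrame. It reuses the dataTypeOf helper sketched earlier and assumes a Spark 1.x shell session providing sc and sqlContext; the sample data is invented for illustration:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Invented sample data standing in for the "100s of variables" case.
val rdd: RDD[Map[String, Any]] = sc.parallelize(Seq(
  Map[String, Any]("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> 11.1),
  Map[String, Any]("Amean" -> 18.7, "Asize" -> 9,  "Bmean" -> 13.4)
))

// Fix a column order once, then derive the schema from one sample element.
val sample  = rdd.first()
val columns = sample.keys.toSeq.sorted
val schema  = StructType(columns.map(c =>
  StructField(c, dataTypeOf(sample(c)), nullable = true)))

// Convert each Map to a Row with values in the fixed column order.
val rows = rdd.map(m => Row.fromSeq(columns.map(m(_))))

val df = sqlContext.createDataFrame(rows, schema)
df.printSchema()

One caveat: the values have to line up with the derived schema exactly (an IntegerType column really has to hold Ints), otherwise the conversion fails at runtime rather than at analysis time.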