Are these mainly in csv format?

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 17 June 2016 at 20:38, Everett Anderson <ever...@nuna.com.invalid> wrote:

> Hi,
>
> I have a system with files in a variety of non-standard input formats,
> though they're generally flat text files. I'd like to dynamically create
> DataFrames of string columns.
>
> What's the best way to go from an RDD<String> to a DataFrame of StringType
> columns?
>
> My current plan is:
>
> - Call map() on the RDD<String> with a function to split the String
>   into columns and call RowFactory.create() with the resulting array,
>   creating an RDD<Row>
> - Construct a StructType schema using column names and StringType
> - Call SQLContext.createDataFrame(RDD, schema) to create the result
>
> Does that make sense?
>
> I looked through the spark-csv package a little and noticed that it's
> using baseRelationToDataFrame(), but BaseRelation looks like it might be a
> restricted developer API. Anyone know if it's recommended for use?
>
> Thanks!
>
> - Everett
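The three-step plan quoted above can be sketched in Java roughly as follows. This is a minimal illustration, not a tested implementation: the `sqlContext`, the input `JavaRDD<String> lines`, the column names, and the pipe delimiter are all assumed for the example; substitute whatever splitting logic your flat-file format actually needs.

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Step 1: split each line into columns and wrap in a Row.
// The pipe delimiter is illustrative; the -1 limit keeps trailing
// empty fields so every row has the same number of columns.
JavaRDD<Row> rows = lines.map(
    line -> RowFactory.create((Object[]) line.split("\\|", -1)));

// Step 2: build a schema of nullable StringType columns
// (column names here are placeholders).
StructType schema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("col1", DataTypes.StringType, true),
    DataTypes.createStructField("col2", DataTypes.StringType, true)));

// Step 3: combine the RDD<Row> and the schema into a DataFrame.
org.apache.spark.sql.DataFrame df = sqlContext.createDataFrame(rows, schema);
```

One detail worth noting: `RowFactory.create()` is varargs over `Object...`, so the `String[]` from `split()` must be cast to `Object[]`; otherwise the whole array is treated as a single column value.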