You can define your own schema (a StructType in Spark) and use the following method to create a DataFrame with that schema:
sqlContext.createDataFrame(yourRDD, structType);

I wrote a post on how to do it, and you can get the sample code there:

Light-Weight Self-Service Data Query through Spark SQL:
https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang

Take a look, and feel free to let me know if you have any questions.

Best,
Bo

On Sat, Aug 8, 2015 at 1:42 PM, unk1102 <umesh.ka...@gmail.com> wrote:

> Hi, how do we create a DataFrame from a binary file stored in HDFS? I was
> thinking of using
>
> JavaPairRDD<String,PortableDataStream> pairRdd =
>     javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
> JavaRDD<PortableDataStream> javardd = pairRdd.values();
>
> I can see that PortableDataStream has a method called toArray which can
> convert it into a byte array. If I have a JavaRDD<byte[]>, can I call
> the following and get a DataFrame?
>
> DataFrame binDataFrame = sqlContext.createDataFrame(javaBinRdd, Byte.class);
>
> Please guide me, I am new to Spark. I have my own custom format, which is
> a binary format, and I was thinking that if I can convert it into a
> DataFrame using binary operations, then I don't need to create my own
> custom Hadoop format. Am I on the right track? Will reading binary data
> into a DataFrame scale?
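
For the binary-file case in the quoted question, the bytes returned by toArray have to be decoded into typed fields before a StructType can describe them. Below is a minimal sketch of that decoding step only, using plain Java with no Spark dependency. The record layout (a 4-byte int id followed by an 8-byte double value, big-endian) and the names BinaryRecordDecoder, decode and sample are assumptions made up for illustration; the real custom format will differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record layout, assumed for illustration only:
// a 4-byte big-endian int id followed by an 8-byte big-endian double value.
final class BinaryRecordDecoder {
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES; // 12 bytes

    // Decodes the full contents of one file into a list of {id, value} rows.
    static List<Object[]> decode(byte[] fileBytes) {
        if (fileBytes.length % RECORD_SIZE != 0) {
            throw new IllegalArgumentException(
                "truncated record: " + fileBytes.length + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.BIG_ENDIAN);
        List<Object[]> rows = new ArrayList<>();
        while (buf.remaining() >= RECORD_SIZE) {
            int id = buf.getInt();          // first field of the record
            double value = buf.getDouble(); // second field of the record
            rows.add(new Object[] {id, value});
        }
        return rows;
    }

    // Builds two sample records in the assumed layout.
    static byte[] sample() {
        return ByteBuffer.allocate(2 * RECORD_SIZE)
                .putInt(1).putDouble(2.5)
                .putInt(2).putDouble(-4.0)
                .array();
    }

    public static void main(String[] args) {
        for (Object[] row : decode(sample())) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In a Spark job, each decoded Object[] would presumably be wrapped with RowFactory.create(...) inside a flatMap over the PortableDataStream values, and the matching schema built with DataTypes.createStructType(...) from one StructField per column, so that createDataFrame(rowRDD, structType) receives rows whose field order and types agree with the schema. Note that binaryFiles yields one entry per file, so the per-file decoding runs on a single executor; many moderately sized files should parallelize better than one very large file.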