Well, my post uses a raw-text JSON file to show how to create a DataFrame with a custom data schema. The key idea is that defining your own schema gives you the flexibility to handle data in any format, not only JSON. Sorry if I did not make that clear.
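For a binary format, the decoding step could be sketched roughly as below. This is only a minimal sketch under an assumed record layout (int id, double value, length-prefixed UTF-8 name). The layout, the class name BinaryRecordDecoder and the methods decode/encode are hypothetical, made up for illustration; they are not taken from your format or from my post. It uses plain Java only, so it runs without Spark or HDFS.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record layout (an assumption, not from the thread):
//   int32 id | float64 value | int32 nameLength | nameLength bytes of UTF-8 name
class BinaryRecordDecoder {

    // Decodes every record in one file's bytes into an Object[] of column
    // values, in the order the schema fields would be declared.
    static List<Object[]> decode(byte[] fileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.BIG_ENDIAN);
        List<Object[]> rows = new ArrayList<>();
        while (buf.hasRemaining()) {
            int id = buf.getInt();
            double value = buf.getDouble();
            int nameLength = buf.getInt();
            byte[] nameBytes = new byte[nameLength];
            buf.get(nameBytes);
            String name = new String(nameBytes, StandardCharsets.UTF_8);
            rows.add(new Object[] { id, value, name });
        }
        return rows;
    }

    // Builds sample bytes in the same assumed layout, only so that the
    // sketch can be run without a real binary file.
    static byte[] encode(int id, double value, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + nameBytes.length);
        buf.putInt(id).putDouble(value).putInt(nameBytes.length).put(nameBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        // Concatenate two sample records into one "file".
        byte[] first = encode(1, 2.5, "alpha");
        byte[] second = encode(2, 7.0, "beta");
        byte[] file = new byte[first.length + second.length];
        System.arraycopy(first, 0, file, 0, first.length);
        System.arraycopy(second, 0, file, first.length, second.length);

        for (Object[] row : decode(file)) {
            System.out.println(row[0] + ", " + row[1] + ", " + row[2]);
        }
    }
}
```

In Spark, the idea would be to run something like decode(stream.toArray()) on each PortableDataStream returned by binaryFiles, wrap each Object[] with RowFactory.create(...), and pass the resulting JavaRDD<Row> together with a matching StructType to sqlContext.createDataFrame(rowRDD, structType). That part is not exercised by the sketch above, so treat it as a direction rather than tested code.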
Anyway, let us know once you figure out your problem.

On Sun, Aug 9, 2015 at 11:10 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi Bo I know how to create a DataFrame my question is how to create a
> DataFrame for binary files and in your blog it is raw text json files
> please read my question properly thanks.
>
> On Sun, Aug 9, 2015 at 11:21 PM, bo yang <bobyan...@gmail.com> wrote:
>
>> You can create your own data schema (StructType in spark), and use
>> following method to create data frame with your own data schema:
>>
>> sqlContext.createDataFrame(yourRDD, structType);
>>
>> I wrote a post on how to do it. You can also get the sample code there:
>>
>> Light-Weight Self-Service Data Query through Spark SQL:
>>
>> https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang
>>
>> Take a look and feel free to let me know for any question.
>>
>> Best,
>> Bo
>>
>> On Sat, Aug 8, 2015 at 1:42 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>
>>> Hi how do we create DataFrame from a binary file stored in HDFS? I was
>>> thinking to use
>>>
>>> JavaPairRDD<String,PortableDataStream> pairRdd =
>>>     javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
>>> JavaRDD<PortableDataStream> javardd = pairRdd.values();
>>>
>>> I can see that PortableDataStream has method called toArray which can
>>> convert into byte array I was thinking if I have JavaRDD<byte[]> can I
>>> call the following and get DataFrame
>>>
>>> DataFrame binDataFrame =
>>>     sqlContext.createDataFrame(javaBinRdd,Byte.class);
>>>
>>> Please guide I am new to Spark. I have my own custom format which is
>>> binary format and I was thinking if I can convert my custom format into
>>> DataFrame using binary operations then I dont need to create my own
>>> custom Hadoop format am I on right track? Will reading binary data into
>>> DataFrame scale?
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-DataFrame-from-a-binary-file-tp24179.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.