Umesh: Please take a look at the classes under: sql/core/src/main/scala/org/apache/spark/sql/parquet
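Before the quoted thread: the approach suggested further down (an RDD of rows plus a user-defined StructType) can hold raw binary data, because StructType is not limited to primitive types. Below is a minimal sketch, not code from the thread: the class name and column names are made up, it assumes the Spark 1.x Java API used in these messages, and a local temp file stands in for the thread's HDFS path.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.input.PortableDataStream;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class BinaryFileToDataFrame {
  public static void main(String[] args) throws Exception {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setAppName("binary-df").setMaster("local[*]"));
    SQLContext sqlContext = new SQLContext(jsc);

    // Stand-in for the thread's "/hdfs/path/to/binfile"; binaryFiles
    // accepts local paths as well as HDFS paths.
    Path tmp = Files.createTempFile("binfile", ".bin");
    Files.write(tmp, new byte[] {1, 2, 3, 4});

    // One record per file: (path, lazy stream over the file's bytes).
    JavaPairRDD<String, PortableDataStream> files =
        jsc.binaryFiles(tmp.toString());

    // Materialize each file into a Row of (path, byte[]).
    JavaRDD<Row> rows = files.map(
        t -> RowFactory.create(t._1(), t._2().toArray()));

    // BinaryType maps to byte[], so a StructType can carry raw bytes.
    StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("path", DataTypes.StringType, false),
        DataTypes.createStructField("content", DataTypes.BinaryType, false)});

    DataFrame binDataFrame = sqlContext.createDataFrame(rows, schema);
    byte[] content = (byte[]) binDataFrame.collectAsList().get(0).get(1);
    System.out.println(content.length); // 4

    jsc.stop();
  }
}
```

Once the bytes are in a `content` column, custom parsing of the binary format can be done in further map steps or UDFs; whether that scales depends on file sizes, since each file is materialized whole.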
FYI

On Mon, Aug 10, 2015 at 10:35 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi Bo, thanks much. Let me explain; please see the following code:
>
>     JavaPairRDD<String, PortableDataStream> pairRdd =
>         javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
>     JavaRDD<PortableDataStream> javardd = pairRdd.values();
>
>     DataFrame binDataFrame = sqlContext.createDataFrame(javaBinRdd,
>         PortableDataStream.class);
>     binDataFrame.show(); // shows just one row with the above file path,
>                          // /hdfs/path/to/binfile
>
> I want the binary data from the above file in a DataFrame so that I can
> do analytics on it directly. My data is binary, so I can't use a
> StructType with primitive data types, right, since everything is
> binary/bytes? My custom binary data format is the same kind of thing as
> Parquet, but I did not find any good example of where/how Parquet is read
> into a DataFrame. Please guide.
>
> On Sun, Aug 9, 2015 at 11:52 PM, bo yang <bobyan...@gmail.com> wrote:
>
>> Well, my post uses a raw-text JSON file to show how to create a data
>> frame with a custom data schema. The key idea is to show the flexibility
>> to deal with any format of data by using your own schema. Sorry if I did
>> not make myself fully understood.
>>
>> Anyway, let us know once you figure out your problem.
>>
>> On Sun, Aug 9, 2015 at 11:10 AM, Umesh Kacha <umesh.ka...@gmail.com>
>> wrote:
>>
>>> Hi Bo, I know how to create a DataFrame. My question is how to create
>>> a DataFrame for binary files; in your blog it is raw-text JSON files.
>>> Please read my question properly. Thanks.
>>>
>>> On Sun, Aug 9, 2015 at 11:21 PM, bo yang <bobyan...@gmail.com> wrote:
>>>
>>>> You can create your own data schema (StructType in Spark) and use the
>>>> following method to create a data frame with your own schema:
>>>>
>>>>     sqlContext.createDataFrame(yourRDD, structType);
>>>>
>>>> I wrote a post on how to do it.
>>>> You can also get the sample code there:
>>>>
>>>> Light-Weight Self-Service Data Query through Spark SQL:
>>>> https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang
>>>>
>>>> Take a look, and feel free to let me know if you have any questions.
>>>>
>>>> Best,
>>>> Bo
>>>>
>>>> On Sat, Aug 8, 2015 at 1:42 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>>
>>>>> Hi, how do we create a DataFrame from a binary file stored in HDFS?
>>>>> I was thinking of using:
>>>>>
>>>>>     JavaPairRDD<String, PortableDataStream> pairRdd =
>>>>>         javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
>>>>>     JavaRDD<PortableDataStream> javardd = pairRdd.values();
>>>>>
>>>>> I can see that PortableDataStream has a method called toArray which
>>>>> can convert it into a byte array. I was thinking, if I have a
>>>>> JavaRDD<byte[]>, can I call the following and get a DataFrame?
>>>>>
>>>>>     DataFrame binDataFrame =
>>>>>         sqlContext.createDataFrame(javaBinRdd, Byte.class);
>>>>>
>>>>> Please guide; I am new to Spark. I have my own custom binary format,
>>>>> and I was thinking that if I can convert my custom format into a
>>>>> DataFrame using binary operations, then I don't need to create my
>>>>> own custom Hadoop format. Am I on the right track? Will reading
>>>>> binary data into a DataFrame scale?
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-DataFrame-from-a-binary-file-tp24179.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
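On the `PortableDataStream.toArray` step asked about in the original question: it drains the file's stream into a `byte[]`, materializing the whole file in memory. A plain-Java sketch of that equivalent logic (the helper name `toArray` here is made up for illustration, not Spark's implementation) may clarify what each record ends up holding:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ToArrayDemo {
  // Drain an InputStream fully into a byte[], roughly what
  // PortableDataStream.toArray() yields for one file.
  static byte[] toArray(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {0x01, 0x02, 0x03, (byte) 0xFF};
    byte[] copy = toArray(new ByteArrayInputStream(data));
    System.out.println(copy.length); // 4
  }
}
```

Because each file becomes one in-memory array, this pattern scales with the number of files, not with individual file size: very large single files are better handled by a record-oriented input format.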