Re: How to create DataFrame from a binary file?

bo yang Sun, 09 Aug 2015 11:24:06 -0700

Well, my post uses raw text JSON files to show how to create a DataFrame
with a custom data schema. The key idea is that defining your own schema
gives you the flexibility to deal with any data format. Sorry if I did not
make that clear.
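
For a binary format, the part that changes is how the bytes become rows
before the schema is applied. Below is a minimal sketch of that decoding
step in plain Java. The record layout (int id, double value,
length-prefixed UTF-8 name) and the names BinaryRecordDecoder, decode and
encode are assumptions made up for illustration; they are not from the
post or from the poster's actual format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record layout, invented for illustration only:
//   4-byte int id | 8-byte double value | 2-byte name length | UTF-8 name bytes
// All multi-byte fields are assumed to be big-endian.
class BinaryRecordDecoder {

    /** Decodes every record found in one file's bytes into a row of column values. */
    static List<Object[]> decode(byte[] fileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.BIG_ENDIAN);
        List<Object[]> rows = new ArrayList<>();
        while (buf.hasRemaining()) {
            int id = buf.getInt();
            double value = buf.getDouble();
            int nameLen = buf.getShort() & 0xFFFF; // treat the length as unsigned 16-bit
            byte[] nameBytes = new byte[nameLen];
            buf.get(nameBytes);
            rows.add(new Object[] { id, value, new String(nameBytes, StandardCharsets.UTF_8) });
        }
        return rows;
    }

    /** Encodes one record in the same layout; only used to build sample input. */
    static byte[] encode(int id, double value, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 2 + nameBytes.length)
                                   .order(ByteOrder.BIG_ENDIAN);
        buf.putInt(id).putDouble(value).putShort((short) nameBytes.length).put(nameBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] sample = encode(7, 2.5, "abc");
        for (Object[] row : decode(sample)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In Spark, one way to use this would be to call decode on the byte array of
each PortableDataStream, wrap each Object[] with RowFactory.create(...),
and pass the resulting JavaRDD<Row> together with a StructType of
IntegerType, DoubleType and StringType fields to
sqlContext.createDataFrame(yourRDD, structType). That wiring is not tested
here, so treat it as a starting point.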


Anyway, let us know once you figure out your problem.




On Sun, Aug 9, 2015 at 11:10 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi Bo, I know how to create a DataFrame. My question is how to create a
> DataFrame from binary files, and your blog covers raw text JSON files.
> Please read my question properly. Thanks.
>
> On Sun, Aug 9, 2015 at 11:21 PM, bo yang <bobyan...@gmail.com> wrote:
>
>> You can create your own data schema (StructType in Spark) and use the
>> following method to create a DataFrame with that schema:
>>
>> sqlContext.createDataFrame(yourRDD, structType);
>>
>> I wrote a post on how to do it. You can also get the sample code there:
>>
>> Light-Weight Self-Service Data Query through Spark SQL:
>>
>> https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang
>>
>> Take a look and feel free to let me know if you have any questions.
>>
>> Best,
>> Bo
>>
>>
>>
>> On Sat, Aug 8, 2015 at 1:42 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>
>>> Hi, how do we create a DataFrame from a binary file stored in HDFS? I
>>> was thinking of using
>>>
>>> JavaPairRDD<String,PortableDataStream> pairRdd =
>>> javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
>>> JavaRDD<PortableDataStream> javardd = pairRdd.values();
>>>
>>> I can see that PortableDataStream has a method called toArray which
>>> converts it into a byte array. If I have a JavaRDD<byte[]>, can I call
>>> the following and get a DataFrame?
>>>
>>> DataFrame binDataFrame =
>>> sqlContext.createDataFrame(javaBinRdd,Byte.class);
>>>
>>> Please guide me, I am new to Spark. I have my own custom binary format,
>>> and I was thinking that if I can convert it into a DataFrame using
>>> binary operations, then I don't need to create my own custom Hadoop
>>> format. Am I on the right track? Will reading binary data into a
>>> DataFrame scale?
>>>
>>
>
