Spark Committers: Please advise the way forward for this issue.
Thanks for your support.
Regards,
Venkat
From: Venkat, Ankam
Sent: Thursday, January 22, 2015 9:34 AM
To: 'Frank Austin Nothaft'; 'user@spark.apache.org'
Cc: 'Nick Allen'
Subject: RE: How to 'Pip
"user@spark.apache.org"
Subject: How to 'Pipe' Binary Data in Apache Spark
I have an RDD containing binary data. I would like to use 'RDD.pipe' to pipe
that binary data to an external pro
d as new enhancement Jira request?
Nick: What's your take on this?
Regards,
Venkat Ankam
From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
Sent: Wednesday, January 21, 2015 12:30 PM
To: Venkat, Ankam
Cc: Nick Allen; user@spark.apache.org
Subj
Regards,
> Venkat Ankam
>
>
> From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
> Sent: Wednesday, January 21, 2015 12:30 PM
> To: Venkat, Ankam
> Cc: Nick Allen; user@spark.apache.org
> Subject: Re: How to 'Pipe' Binary Data in Apache Spark
>
>
: What's your take on this?
Regards,
Venkat Ankam
From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
Sent: Wednesday, January 21, 2015 12:30 PM
To: Venkat, Ankam
Cc: Nick Allen; user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
Hi Venkat/Nick,
The
, '-t'
> >>> 'wav', '-', '-n', 'stats'])).collect() <-- Does not work. Tried different
> >>> options.
> AttributeError: 'function' object has no attribute 'read'
>
> Any suggestions?
>
> Regar
'/usr/local/bin/sox', '-t'
>>> 'wav', '-', '-n', 'stats'])).collect() <-- Does not work. Tried different
>>> options.
AttributeError: 'function' object has no attribute 'read'
Any suggestions?
Regards,
V
I just wanted to reiterate the solution for the benefit of the community.
The problem is not from my use of 'pipe', but that 'textFile' cannot be
used to read in binary data. (Doh) There are a couple options to move
forward.
1. Implement a custom 'InputFormat' that understands the binary input da
Per your last comment, it appears I need something like this:
https://github.com/RIPE-NCC/hadoop-pcap
Thanks a ton. That gets me oriented in the right direction.
On Fri, Jan 16, 2015 at 10:20 AM, Sean Owen wrote:
> Well it looks like you're reading some kind of binary file as text.
> That isn
Well it looks like you're reading some kind of binary file as text.
That isn't going to work, in Spark or elsewhere, as binary data is not
even necessarily the valid encoding of a string. There are no line
breaks to delimit lines and thus elements of the RDD.
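To make the point above concrete, here is a minimal sketch in plain Python (no Spark involved). The sample bytes are hypothetical, chosen only because they are not valid UTF-8 and happen to contain a newline byte. It illustrates the general problem with treating binary data as text, not what any particular Spark reader does internally.

```python
# Hypothetical sample: a few bytes that are not valid UTF-8 and that
# happen to contain the newline byte 0x0A in the middle.
raw = bytes([0x52, 0x49, 0x46, 0x46, 0xFF, 0x0A, 0x80, 0x00, 0x57])

# A strict decode fails outright because 0xFF is not valid UTF-8.
try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# A lenient decode succeeds, but invalid bytes are swapped for U+FFFD,
# so encoding the text again does not give back the original bytes.
as_text = raw.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

# Splitting on the newline byte cuts the data at an arbitrary point,
# which is what line-oriented reading would do to a binary record.
pieces = raw.split(b"\n")

print(strict_decode_ok, round_tripped == raw, len(pieces))
```

Either failure alone is enough to corrupt the input before an external program ever sees it.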
Your input has some record structure (
I have an RDD containing binary data. I would like to use 'RDD.pipe' to
pipe that binary data to an external program that will translate it to
string/text data. Unfortunately, it seems that Spark is mangling the binary
data before it gets passed to the external program.
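For comparison, a minimal sketch of handing raw bytes to an external program without any text decoding in between, using only Python's standard `subprocess` module. The helper name `pipe_bytes` and the echo command are hypothetical stand-ins for illustration; the echo command is just a child Python process that copies stdin to stdout, not the real external program from this thread.

```python
import subprocess
import sys


def pipe_bytes(data: bytes, cmd: list) -> bytes:
    """Send raw bytes to cmd's stdin and return its raw stdout bytes."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


# Hypothetical stand-in for the external program: a child Python process
# that copies its stdin to its stdout byte for byte.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

# Every possible byte value, including ones that are invalid as UTF-8.
payload = bytes(range(256))
echoed = pipe_bytes(payload, ECHO_CMD)
print(echoed == payload)
```

Assuming the records reach the worker as bytes in the first place, a helper along these lines could in principle be applied per record; whether that suits a given job depends on how the data is read, which is the question this thread goes on to discuss.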
This code is representative