Yes, that's what I need. Thanks.
P.
On 02/05/2017 12:17 PM, Koert Kuipers wrote:
Since there is no key to group by and assemble records, I would suggest
writing this in RDD land and then converting to a data frame. You can use
sc.wholeTextFiles to process the text files and create a state machine.
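The suggested approach can be sketched roughly as follows. The parser below is a hypothetical illustration (the function name, the record delimiter, and the field handling are assumptions, not from this thread): a small state machine that starts a new record at each "WARC/1.0" marker and emits one dict per record. The Spark calls are shown only in comments, since they need a live SparkContext.

```python
# Minimal sketch of the state-machine parser: walk the lines of one
# whole file, open a new record at every "WARC/" version marker, and
# collect "Key: value" header lines into a dict per record.

def parse_warc_records(text):
    """Split raw WARC text into one dict per record (illustrative)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("WARC/"):
            # A version line like "WARC/1.0" begins a new record.
            if current:
                records.append(current)
            current = {}
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    if current:
        records.append(current)
    return records

# In PySpark this parser would be applied per file via wholeTextFiles,
# then converted to a DataFrame (untested sketch, assumes sc/spark exist):
#
#   rdd = sc.wholeTextFiles("path/to/warc/*").flatMap(
#       lambda name_text: parse_warc_records(name_text[1]))
#   df = spark.createDataFrame(rdd)
```

Each file becomes a list of record dicts, so flatMap yields one RDD element per record, which is exactly the row-per-record shape needed before going columnar.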
On Feb 4, 2017 16:25, "Paul Tremblay" wrote:
I am using pyspark 2.1 and am wondering how to convert a flat file, with
one record per row, into a columnar format.
Here is an example of the data:
u'WARC/1.0',
u'WARC-Type: warcinfo',
u'WARC-Date: 2016-12-08T13:00:23Z',
u'WARC-Record-ID: ',
u'Content-Length: 344',
u'Content-Type: applicati