Philip, if there are easily detectable line groups, you might define your own InputFormat. Alternatively, you can use mapPartitions() to get access to an entire data partition instead of one row at a time; you'd still have to handle records that straddle partition boundaries. A third approach is indeed to pre-process the data with an appropriate mapper/reducer.
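As a sketch of the mapPartitions() approach, here is plain Python that groups an iterator of lines into multi-line records; the grouping function is the kind of thing you would pass to RDD.mapPartitions. The record-start test used below (lines beginning with "REC") and the function names are illustrative assumptions, not part of any Spark API, and the partition-boundary stitching is deliberately left to the caller, as noted above.

```python
def group_records(lines, is_record_start):
    """Group an iterator of lines into multi-line records.

    Yields each record as a list of lines. A record begins at any line
    for which is_record_start(line) is True. Note: the first group
    yielded may be a partial record whose start lies in the previous
    partition, and the last group may continue into the next partition;
    the caller must stitch these boundary fragments together.
    """
    current = []
    for line in lines:
        if is_record_start(line) and current:
            yield current          # previous record is complete
            current = [line]       # start a new record
        else:
            current.append(line)
    if current:
        yield current              # possibly partial trailing record

# In Spark this would be applied per partition, e.g. (hypothetical names):
#   rdd.mapPartitions(lambda it: group_records(it, starts_new_record))
```

Note also that if your records are separated by a fixed delimiter string, Hadoop's TextInputFormat respects the `textinputformat.record.delimiter` configuration setting, which may let you read multi-line records through newAPIHadoopFile without writing a custom InputFormat.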
Sent while mobile. Pls excuse typos etc.

> I have a file that consists of multi-line records. Is it possible to read in multi-line records with a method such as SparkContext.newAPIHadoopFile? Or do I need to pre-process the data so that all the data for one element is on a single line?
>
> Thanks,
> Philip