More specifically, seeking to a known location in the uncompressed data.  So 
not just seeking to “the nearest record boundary”, but seeking to “position 
100000000 in the uncompressed data”.  I can see that if the writer kept track 
of this information on the side it would be available; my question is more 
about the standard formats (e.g. LZO compression in SequenceFile) supporting 
this without additional work.
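
To make the "kept track of this information on the side" idea concrete, here is a minimal sketch, not a description of how SequenceFile or the LZO codec work. It assumes the writer compresses fixed-size blocks independently and records, for each block, where it starts in the uncompressed and in the compressed data. The names `write_blocks`, `read_at` and `BLOCK_SIZE` are hypothetical, and zlib stands in for LZO only because it ships with Python.

```python
import bisect
import zlib

BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block; an arbitrary choice


def write_blocks(data, block_size=BLOCK_SIZE):
    """Compress `data` block by block and build a side index.

    Returns (compressed_bytes, index), where index holds one tuple per block:
    (uncompressed_offset, compressed_offset, compressed_length).
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        index.append((start, len(out), len(block)))
        out += block
    return bytes(out), index


def read_at(compressed, index, position, length):
    """Return up to `length` uncompressed bytes starting at `position`."""
    if position < 0:
        raise ValueError("position must be non-negative")
    if not index:
        return b""
    # Find the last block whose uncompressed start is <= position.
    starts = [entry[0] for entry in index]
    i = bisect.bisect_right(starts, position) - 1
    skip = position - index[i][0]  # bytes to discard inside the first block
    result = bytearray()
    while i < len(index) and len(result) < length:
        _, c_off, c_len = index[i]
        block = zlib.decompress(compressed[c_off:c_off + c_len])
        result += block[skip:skip + (length - len(result))]
        skip = 0
        i += 1
    return bytes(result)
```

With such an index, a read at an absolute uncompressed position only decompresses the blocks that overlap the requested range, for example `read_at(compressed, index, 100000000, 4096)`. The cost is that the index has to be written and stored by the producer, which is exactly the "additional work" in question.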
John

From: Rahul Bhattacharjee [mailto:rahul.rec....@gmail.com]
Sent: Friday, May 24, 2013 1:00 AM
To: user@hadoop.apache.org
Subject: Re: splittable vs seekable compressed formats

Yeah, I think John meant seeking to record boundaries.
Thanks,
Rahul

On Fri, May 24, 2013 at 12:22 PM, Harsh J 
<ha...@cloudera.com> wrote:
I think SequenceFiles should be seekable, provided you know/manage their
sync points during writes. With LZO this may be non-trivial.
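
A rough sketch of what sync points buy you, using a toy record format rather than the real SequenceFile layout: a fixed marker is written every few records, and a reader dropped at an arbitrary byte offset scans forward to the next marker and resumes from there. The names `SYNC`, `SYNC_INTERVAL`, `write_records` and `records_from` are hypothetical, and the marker value is made up for illustration.

```python
import struct

# 16-byte marker; a made-up value. A real format would pick one per file.
SYNC = bytes.fromhex("a7f3c19e5b2d4e86907152cd3ab8e4f1")
SYNC_INTERVAL = 4  # records between markers; an arbitrary choice


def write_records(records, sync_interval=SYNC_INTERVAL):
    """Serialize records as length-prefixed blobs with periodic sync markers."""
    out = bytearray()
    for n, rec in enumerate(records):
        if n % sync_interval == 0:
            out += SYNC
        out += struct.pack(">I", len(rec)) + rec
    return bytes(out)


def records_from(buf, position):
    """Yield records starting at the first sync marker at or after `position`.

    This toy assumes the marker never occurs by chance inside record data.
    """
    pos = buf.find(SYNC, position)
    if pos < 0:
        return  # no sync marker after this position
    while pos < len(buf):
        if buf.startswith(SYNC, pos):
            pos += len(SYNC)  # skip the marker itself
            continue
        (size,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        yield buf[pos:pos + size]
        pos += size
```

Note that this only lands on the next record boundary after a byte offset in the file. It does not by itself map a position in the uncompressed data to a place in the file, which is the distinction the original question is about.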

On Thu, May 23, 2013 at 11:01 PM, John Lilley 
<john.lil...@redpoint.net> wrote:
> I’ve read about splittable compressed formats in Hadoop.  Are any of these
> formats also “seekable” (in other words, able to seek to an absolute
> location in the uncompressed data)?
>
> John
>
>


--
Harsh J
