More specifically, seeking to a known location in the uncompressed data. So not just seeking to “the nearest record boundary”, but seeking to “position 100000000 in the uncompressed data”. I can see that if the writer kept track of this information on the side it would be available; my question is more about the standard formats (e.g. LZO compression in SequenceFile) supporting this without additional work.

John
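
To make the "kept track of this information on the side" idea concrete, here is a minimal sketch of a writer-maintained index. It is only an illustration under stated assumptions: it uses Python's zlib rather than LZO, it is not the SequenceFile format or any Hadoop API, and the names write_blocks, read_at and BLOCK_SIZE are hypothetical. Each block is compressed independently, and the index records where each block starts in both the uncompressed and the compressed stream.

```python
import bisect
import zlib

# Uncompressed bytes per block. Arbitrary value chosen for this sketch.
BLOCK_SIZE = 64 * 1024


def write_blocks(data, block_size=BLOCK_SIZE):
    """Compress data in independent blocks and build a side index.

    Returns (compressed_bytes, index), where index is a list of
    (uncompressed_offset, compressed_offset, compressed_length) tuples.
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        index.append((start, len(out), len(block)))
        out += block
    return bytes(out), index


def read_at(compressed, index, position, length):
    """Return up to `length` uncompressed bytes starting at `position`."""
    if position < 0:
        raise ValueError("position must be non-negative")
    if not index:
        return b""
    # Find the last block whose uncompressed start is <= position.
    starts = [entry[0] for entry in index]
    i = bisect.bisect_right(starts, position) - 1
    result = bytearray()
    while i < len(index) and len(result) < length:
        u_off, c_off, c_len = index[i]
        block = zlib.decompress(compressed[c_off:c_off + c_len])
        # Only the first block needs a non-zero skip into its contents.
        skip = max(0, position - u_off)
        result += block[skip:skip + length - len(result)]
        i += 1
    return bytes(result)


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    compressed, index = write_blocks(data)
    # Seek to an absolute uncompressed position and read 50 bytes.
    assert read_at(compressed, index, 100000, 50) == data[100000:100050]
```

The cost of a seek in this scheme is decompressing at most one block before the requested position, so the block size trades seek latency against compression ratio and index size.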
From: Rahul Bhattacharjee [mailto:rahul.rec....@gmail.com]
Sent: Friday, May 24, 2013 1:00 AM
To: user@hadoop.apache.org
Subject: Re: splittable vs seekable compressed formats

Yeah, I think John meant seeking to record boundaries.

Thanks,
Rahul

On Fri, May 24, 2013 at 12:22 PM, Harsh J <ha...@cloudera.com> wrote:
SequenceFiles should be seekable provided you know/manage their sync points during writes I think. With LZO this may be non-trivial.

On Thu, May 23, 2013 at 11:01 PM, John Lilley <john.lil...@redpoint.net> wrote:
> I’ve read about splittable compressed formats in Hadoop. Are any of these
> formats also “seekable” (in other words, be able to seek to an absolute
> location in the uncompressed data).
>
> John

--
Harsh J