Re: getWrappedSplit() is incorrectly returning the first split

Aniket Mokashi Mon, 09 Jan 2012 22:55:40 -0800

The change was added as part of PIG-1518. Its release notes say:

"This change will not cause any backward compatibility issue except if a
loader implementation makes use of the PigSplit object passed through the
prepareToRead method where a rebuild of the loader might be necessary as
PigSplit's definition has been modified. However, currently we know of no
external use of the object.

This change also requires the loader to be stateless across the invocations
to the prepareToRead method. That is, the method should reset any internal
states that are not affected by the RecordReader argument.
Otherwise, this feature should be disabled."
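The statelessness requirement in these release notes can be illustrated with a minimal, hypothetical sketch. `StatelessLoaderSketch` is not a real Pig class, and the split is reduced to a file path instead of the real `(RecordReader, PigSplit)` arguments. The only point shown is that every per-split field is reassigned inside `prepareToRead`, so nothing carries over when one loader instance is handed several splits in a row.

```java
// Hypothetical stand-in for a Pig loader. The real prepareToRead() takes
// (RecordReader, PigSplit); here the split is reduced to its file path.
class StatelessLoaderSketch {
    private String currentPath; // file the current records come from
    private long recordNumber;  // position within the current split

    // Reset *all* per-split state: with combined splits, the same loader
    // instance may be prepared several times in a row.
    void prepareToRead(String path) {
        this.currentPath = path;
        this.recordNumber = 0;
    }

    // Returns a fake record tagged with its source file and position.
    String getNext() {
        recordNumber++;
        return currentPath + ":" + recordNumber;
    }

    public static void main(String[] args) {
        StatelessLoaderSketch loader = new StatelessLoaderSketch();
        loader.prepareToRead("150-r-00005");
        loader.getNext();
        System.out.println(loader.getNext()); // 150-r-00005:2
        loader.prepareToRead("150-r-00006");
        System.out.println(loader.getNext()); // 150-r-00006:1
    }
}
```

If `recordNumber` were not reset, the first record of the second file would be numbered 3, which is exactly the kind of leaked state the release notes warn about.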

It looks like returning the 0th split was done deliberately. Comments?

Thanks,
Aniket

On Mon, Jan 9, 2012 at 9:10 PM, Alex Rovner <alexrov...@gmail.com> wrote:

> I have already created the patch and tested it with some of my jobs. I ran
> into unit test failures as well, though. I can attach the patch to
> Jira tomorrow anyway, to be applied once things are straightened out.
>
> Alex R
>
> On Mon, Jan 9, 2012 at 8:07 PM, Jonathan Coveney <jcove...@gmail.com>
> wrote:
>
> > If it is affecting production jobs, I see no reason why we can't put the
> > fix into 0.9.2, though I sense that a vote will be coming soon for a 0.9.2
> > release, so a fix would have to come soon. The issues running the tests
> > brought up in Bill's thread will have to be fixed before we can, though. I
> > have a patch that's completely stalled because I can't develop any new
> > tests, and so on.
> >
> > 2012/1/9 Prashant Kommireddi <prash1...@gmail.com>
> >
> > > Is this critical enough to make it back into 0.9.1?
> > >
> > > -Prashant
> > >
> > > On Mon, Jan 9, 2012 at 4:44 PM, Aniket Mokashi <aniket...@gmail.com>
> > > wrote:
> > >
> > > > Thanks so much for finding this out.
> > > >
> > > > I was using
> > > >
> > > >     @Override
> > > >     public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > > >             throws IOException {
> > > >         this.in = reader;
> > > >         partValues =
> > > >             ((DataovenSplit) split.getWrappedSplit()).getPartitionInfo().getPartitionValues();
> > > >         // ...
> > > >     }
> > > >
> > > > in my loader that behaves like HCatalog for delimited text in Hive. That
> > > > returns the same partValues for every split. I hacked around it with
> > > > something else, but I think I must have hit this case. I will confirm.
> > > > Thanks again for reporting this.
> > > >
> > > > Thanks,
> > > >
> > > > Aniket
> > > >
> > > > On Mon, Jan 9, 2012 at 11:06 AM, Daniel Dai <da...@hortonworks.com>
> > > > wrote:
> > > >
> > > > > Yes, please. Thanks!
> > > > >
> > > > > On Mon, Jan 9, 2012 at 10:48 AM, Alex Rovner <alexrov...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Jira opened.
> > > > > >
> > > > > > I can attempt to submit a patch, as this seems like a fairly
> > > > > > straightforward fix.
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/PIG-2462
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Alex R
> > > > > >
> > > > > > On Sat, Jan 7, 2012 at 6:14 PM, Daniel Dai <da...@hortonworks.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Sounds like a bug. I guess no one ever relied on specific split info
> > > > > > > before. Please open a Jira.
> > > > > > >
> > > > > > > Daniel
> > > > > > >
> > > > > > > On Fri, Jan 6, 2012 at 10:21 PM, Alex Rovner <alexrov...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Additionally, it looks like PigRecordReader is not incrementing the
> > > > > > > > index in the PigSplit when dealing with CombinedInputFormat, so the
> > > > > > > > index will be incorrect in either case.
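What a combined reader is expected to do here can be sketched under simplified assumptions: `CombinedReaderSketch`, `Split` and `Loader` are hypothetical stand-ins, not Pig's PigRecordReader or PigSplit. Before each wrapped split is read, the reader updates the split's index and only then calls `prepareToRead`, so any index-based lookup in the loader sees the split actually being read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a combined reader that walks the wrapped splits in
// order and keeps the split's index in sync before calling the loader.
class CombinedReaderSketch {

    // Minimal stand-in for a split that wraps several files.
    static class Split {
        final List<String> paths;
        int index = 0;

        Split(List<String> paths) {
            this.paths = paths;
        }

        String current() {
            return paths.get(index);
        }
    }

    // Minimal stand-in for a loader's prepareToRead() hook.
    interface Loader {
        void prepareToRead(Split split);
    }

    static void readAll(Split split, Loader loader) {
        for (int i = 0; i < split.paths.size(); i++) {
            split.index = i;             // advance the index first...
            loader.prepareToRead(split); // ...so the loader sees the right split
            // records for split.paths.get(i) would be read here
        }
    }

    public static void main(String[] args) {
        Split split = new Split(Arrays.asList("150-r-00005", "150-r-00006"));
        List<String> seen = new ArrayList<>();
        readAll(split, s -> seen.add(s.current()));
        System.out.println(seen); // [150-r-00005, 150-r-00006]
    }
}
```

If the `split.index = i;` line is removed, the loader sees `150-r-00005` both times, matching the symptom described in this thread.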
> > > > > > > >
> > > > > > > > On Fri, Jan 6, 2012 at 4:50 PM, Alex Rovner <alexrov...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Ran into this today, using trunk (0.11).
> > > > > > > > >
> > > > > > > > > If you are using a custom loader and are trying to get input split
> > > > > > > > > information in prepareToRead(), getWrappedSplit() is providing the
> > > > > > > > > first split instead of the current one.
> > > > > > > > >
> > > > > > > > > Checking the code confirms the suspicion:
> > > > > > > > >
> > > > > > > > > PigSplit.java:
> > > > > > > > >
> > > > > > > > >     public InputSplit getWrappedSplit() {
> > > > > > > > >         return wrappedSplits[0];
> > > > > > > > >     }
> > > > > > > > >
> > > > > > > > > Should be:
> > > > > > > > >     public InputSplit getWrappedSplit() {
> > > > > > > > >         return wrappedSplits[splitIndex];
> > > > > > > > >     }
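A self-contained sketch of the difference between the reported and proposed `getWrappedSplit()`, assuming a hypothetical `CombinedSplitSketch` class that stores file paths instead of Hadoop `InputSplit` objects (its `setSplitIndex` method is likewise an assumption, not PigSplit's API). The proposed change only helps if something keeps `splitIndex` up to date as the reader advances.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for PigSplit: it wraps several underlying splits
// (file paths here) and remembers which one is currently being read.
class CombinedSplitSketch {
    private final List<String> wrappedSplits;
    private int splitIndex = 0;

    CombinedSplitSketch(List<String> wrappedSplits) {
        this.wrappedSplits = wrappedSplits;
    }

    // Behaviour reported in the thread: always the first wrapped split.
    String getWrappedSplitAsReported() {
        return wrappedSplits.get(0);
    }

    // Proposed behaviour: the wrapped split at the current index.
    String getWrappedSplit() {
        return wrappedSplits.get(splitIndex);
    }

    int getSplitIndex() {
        return splitIndex;
    }

    // Hypothetical setter a reader would call when it moves to the next split.
    void setSplitIndex(int idx) {
        if (idx < 0 || idx >= wrappedSplits.size()) {
            throw new IndexOutOfBoundsException("split index " + idx);
        }
        splitIndex = idx;
    }

    public static void main(String[] args) {
        CombinedSplitSketch split =
            new CombinedSplitSketch(Arrays.asList("150-r-00005", "150-r-00006"));
        split.setSplitIndex(1); // the reader has moved on to the second file
        System.out.println(split.getWrappedSplitAsReported()); // 150-r-00005
        System.out.println(split.getWrappedSplit());           // 150-r-00006
    }
}
```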
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The side effect is that if you are trying to retrieve the current split
> > > > > > > > > when Pig is using CombinedInputFormat, it always (incorrectly) returns
> > > > > > > > > the first file in the list instead of the one it's currently reading.
> > > > > > > > > I have also confirmed it by outputting a log statement in
> > > > > > > > > prepareToRead():
> > > > > > > > >
> > > > > > > > >     @Override
> > > > > > > > >     public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > > > > > > > >             throws IOException {
> > > > > > > > >         String path =
> > > > > > > > >             ((FileSplit) split.getWrappedSplit(split.getSplitIndex())).getPath().toString();
> > > > > > > > >         partitions = getPartitions(table, path);
> > > > > > > > >         log.info("Preparing to read: " + path);
> > > > > > > > >         this.reader = reader;
> > > > > > > > >     }
> > > > > > > > >
> > > > > > > > > 2012-01-06 16:27:24,165 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005:0+6187085
> > > > > > > > > 2012-01-06 16:27:24,180 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
> > > > > > > > > 2012-01-06 16:27:24,183 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2dd49ec41018ba4141b20edf28dbb43c0c07f373]
> > > > > > > > > 2012-01-06 16:27:24,189 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > > > > > 2012-01-06 16:27:28,053 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00006:0+6181475
> > > > > > > > > 2012-01-06 16:27:28,056 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Notice how Pig correctly reports the split, but my "info"
> > > > > > > > > statement always reports the first input split instead of the
> > > > > > > > > current one.
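For reference, here is a hypothetical helper in the spirit of the `getPartitions(table, path)` call in the loader above. The real HiveLoader implementation is not shown in the thread, and this sketch does not unescape Hive's encoded partition values. It pulls Hive-style `key=value` directory segments out of a path. If combined files come from different partition directories, a path taken from the wrong wrapped split yields the wrong values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not HiveLoader's actual getPartitions(): collects
// Hive-style "key=value" directory segments from a file path, in order.
class HivePartitionSketch {

    static Map<String, String> partitionValues(String path) {
        Map<String, String> parts = new LinkedHashMap<>();
        String[] segments = path.split("/");
        // Stop before the last segment, which is the file name. Scheme and
        // host segments ("hdfs:", "tuamotu:9000") contain no '=' and are skipped.
        for (int i = 0; i < segments.length - 1; i++) {
            String seg = segments[i];
            int eq = seg.indexOf('=');
            if (eq > 0) {
                parts.put(seg.substring(0, eq), seg.substring(eq + 1));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String path = "hdfs://tuamotu:9000/user/hive/warehouse/"
            + "cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00006";
        System.out.println(partitionValues(path)); // {client_tid=3, cag_tid=150}
    }
}
```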
> > > > > > > > >
> > > > > > > > > Bug? Jira? Patch?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Alex R
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > "...:::Aniket:::... Quetzalco@tl"
> > > >
> > >
> >
>



-- 
"...:::Aniket:::... Quetzalco@tl"
