We discovered this in the code for translating Thrift to Pig tuples,
so TOBAG is not uniquely at fault. Although if both TOBAG and our code
make the same mistake, that might indicate that we are hitting the same
change in behavior.
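
For anyone trying to picture the rule we seem to be tripping over, here is
a rough, self-contained sketch of it. It does not use Pig's real Schema
classes: Field, Type, isValidBag and wrapInner are all hypothetical
stand-ins, and the "exactly one inner field" condition is an assumption
based only on the error message quoted below.

```java
import java.util.List;

public class BagSchemaSketch {
    // Hypothetical type tags; Pig's real DataType uses byte constants.
    enum Type { LONG, CHARARRAY, TUPLE, BAG }

    /** Hypothetical stand-in for a schema field; NOT Pig's FieldSchema. */
    static final class Field {
        final String name;
        final Type type;
        final List<Field> children;

        Field(String name, Type type, Field... children) {
            this.name = name;
            this.type = type;
            this.children = List.of(children);
        }
    }

    /** Assumed rule: a bag holds exactly one inner field, and it is a tuple. */
    static boolean isValidBag(Field bag) {
        return bag.type == Type.BAG
            && bag.children.size() == 1
            && bag.children.get(0).type == Type.TUPLE;
    }

    /** Wraps a bag's bare inner fields in a tuple named "t" (the extra layer). */
    static Field wrapInner(Field bag) {
        if (bag.type != Type.BAG) {
            throw new IllegalArgumentException("not a bag: " + bag.name);
        }
        if (isValidBag(bag)) {
            return bag; // already wrapped, nothing to do
        }
        Field tuple = new Field("t", Type.TUPLE, bag.children.toArray(new Field[0]));
        return new Field(bag.name, Type.BAG, tuple);
    }

    public static void main(String[] args) {
        // Bag whose inner field is a bare long, like the schema logged on trunk.
        Field bare = new Field("item_ids", Type.BAG,
                new Field("item_ids_tuple", Type.LONG));
        System.out.println(isValidBag(bare));            // false
        System.out.println(isValidBag(wrapInner(bare))); // true
    }
}
```

The wrapped form corresponds to the "extra layer" workaround described
further down the thread; whether that shape is accepted by every Pig
version is exactly the open question here.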

On Sat, Jun 18, 2011 at 9:32 AM, Kim Vogt <k...@simplegeo.com> wrote:
> This jira appears to be relevant
> https://issues.apache.org/jira/browse/PIG-449
>
> -Kim
>
> On Sat, Jun 18, 2011 at 9:25 AM, Kim Vogt <k...@simplegeo.com> wrote:
>
>> To clarify, I get no errors when I run with the old TOBAG, the one in the
>> gist without the outputSchema.
>>
>> -Kim
>>
>>
>> On Sat, Jun 18, 2011 at 9:23 AM, Kim Vogt <k...@simplegeo.com> wrote:
>>
>>> I think it has something to do with the TOBAG udf (
>>> https://gist.github.com/1033242). When I run:
>>>
>>> a = load 'x.txt' as (a:long, b:long);
>>> -- dump a;
>>>
>>> x = foreach a generate a, TOBAG(a, b) as abag;
>>> y = foreach x generate TOTUPLE(a, abag) as atuple;
>>> z = foreach y generate atuple.abag;
>>> describe z;
>>> dump z;
>>>
>>> I get no errors.
>>>
>>> grunt> describe z
>>>
>>> z: {abag: {null}}
>>>
>>> grunt> dump z
>>>
>>> 2011-06-18 09:18:34,106 [main] INFO
>>>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
>>> paths to process : 1
>>>
>>> ({})
>>>
>>> ({})
>>>
>>> ({})
>>>
>>> Hope that helps!
>>>
>>> -Kim
>>>
>>> On Sat, Jun 18, 2011 at 8:05 AM, Gianmarco <gianmarco....@gmail.com> wrote:
>>>
>>>> Hi Dmitriy,
>>>>
>>>> Unfortunately I don't have a solution but I can report something related.
>>>> I don't know exactly why there was a change between 0.8 and 0.9, but I
>>>> also
>>>> noticed that now complex schemas are handled differently.
>>>> For example:
>>>>
>>>> In 0.8 I would use:
>>>> raw = LOAD '$input' AS (username:chararray,
>>>> topics:bag{t(topic:chararray)},
>>>> links:bag{t(link:chararray)});
>>>>
>>>> But in 0.9 this doesn't work and I need to strip that extra 't' from the
>>>> bags.
>>>> raw = LOAD '$input' AS (username:chararray,
>>>> topics:bag{(topic:chararray)},
>>>> links:bag{(link:chararray)});
>>>>
>>>> I assume this is the same problem and the change actually was introduced
>>>> in
>>>> 0.8.1
>>>>
>>>> Cheers,
>>>> --
>>>> Gianmarco De Francisci Morales
>>>>
>>>>
>>>> On Sat, Jun 18, 2011 at 16:42, Dmitriy Ryaboy <dvrya...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi folks,
>>>> > We've migrated to pig 0.8.1 and everything went pretty smoothly except
>>>> > for one oddity involving how we generate schemas for complex Thrift
>>>> > structures; namely, it seems like we get into trouble now when our
>>>> > tuple contains lists.
>>>> >
>>>> > The gory details are in
>>>> > https://github.com/kevinweil/elephant-bird/issues/60 but here's a
>>>> > summary. Any help, or pointers to relevant Jiras, would be much
>>>> > appreciated.
>>>> >
>>>> > In 0.8.1, reading that kind of structure seems to be broken altogether;
>>>> > this fails:
>>>> >
>>>> > a = load 'x.txt' as (a:long, b:long);
>>>> > -- dump a;
>>>> >
>>>> > x = foreach a generate a, TOBAG(a, b) as abag;
>>>> > y = foreach x generate TOTUPLE(a, abag) as atuple;
>>>> > z = foreach y generate atuple.abag;
>>>> > describe z;
>>>> > dump z;
>>>> >
>>>> > In trunk, the above snippet works, but loading a relation with a
>>>> > schema we generate for the following thrift definition does not work
>>>> > (0.6 and 0.8 don't complain):
>>>> > struct LogEvent {
>>>> > 1: optional EventDetails event_details
>>>> > }
>>>> >
>>>> > struct EventDetails {
>>>> > 1: optional list<i64> item_ids
>>>> > }
>>>> >
>>>> > The error:
>>>> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR
>>>> > 2218: Invalid resource schema: bag schema must have tuple as its field
>>>> > at org.apache.pig.ResourceSchema$ResourceFieldSchema.throwInvalidSchemaException(ResourceSchema.java:213)
>>>> > at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1881)
>>>> > at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1871)
>>>> > at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
>>>> >
>>>> > And some of Raghu's comments:
>>>> > There is some stripping going on in pig trunk. Added following log
>>>> > line to Schema.java (line 1986 in 0.8 and 1880 in trunk) :
>>>> > log.info("XXX : bag schema : " + rfs + " inner : " + innerFs);
>>>> >
>>>> > With same EB jar:
>>>> >
>>>> > On trunk :
>>>> > bag schema : item_ids:{item_ids_tuple:long} inner : item_ids_tuple: long
>>>> > followed by error shown above.
>>>> >
>>>> > On 0.8 :
>>>> > bag schema : item_ids:{t:(item_ids_tuple:long)} inner : t:
>>>> > tuple({item_ids_tuple: long})
>>>> > bag schema : item_names:{t:(item_names_tuple:chararray)} inner : t:
>>>> > tuple({item_names_tuple: chararray})
>>>> > bag schema : tokens:{t:(tokens_tuple:chararray)} inner : t:
>>>> > tuple({tokens_tuple: chararray})
>>>> >
>>>> > The tuple wrapping gets stripped in Pig 10.
>>>> >
>>>> > For trunk, adding an extra Tuple wrapper fixes it. ie, a bag of longs
>>>> > looks like :
>>>> >
>>>> > new Schema(
>>>> >      new FieldSchema(  "bag",
>>>> >           new Schema (
>>>> >                new FieldSchema(  "t",  // Extra layer
>>>> >                        new Schema (    // Extra layer
>>>> >                               new FieldSchema( "bag_tuple", null, DataType.LONG )
>>>> >                        ), DataType.TUPLE  // Extra layer
>>>> >                 )
>>>> >         ), DataType.BAG
>>>> >      )
>>>> >  )
>>>> > But this does not seem compatible with pig 0.8.
>>>> >
>>>>
>>>
>>>
>>
>
