Joshua Juen created PIG-5272:
--------------------------------

             Summary: BagToString Output Schema
                 Key: PIG-5272
                 URL: https://issues.apache.org/jira/browse/PIG-5272
             Project: Pig
          Issue Type: Improvement
            Reporter: Joshua Juen
            Priority: Minor


The output schema from BagToTuple is nonsensical causing problems using the 
tuple later in the same script. 

For example: Given a bag: { data:chararray }, calling BagToTuple yields the 
schema: ( data:chararray )

But, this makes no sense since if the above bag contains: {data1, data2, data3} 
entries, the output tuple from BagToTuple will be:
(data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the 
declared output schema from the UDF.

Unfortunately, the schema of the tuple cannot be known during the initial 
validation phase. Thus, I believe the output schema from the UDF should be 
modified to be type tuple without the number of fields being fixed to the 
number of columns in the input bag. 

Under the current way, the elements in the tuple cannot be accessed in the 
script after calling BagToTuple without getting an incompatible type error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to