At this point I'm -1 on this.  I don't want to break existing output formats or 
store functions.  And I don't see that much value here.  You can accomplish the 
same thing by putting the logic in a static method of PigTextOutputFormat and 
letting other users use it.  Also, an extra copy of every output record is too 
costly.  We don't want to slow down storing data.
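Here is a minimal sketch of the static-helper alternative. It assumes a Pig tuple can be modeled as a plain `List<?>` of fields. `TupleTextHelper` and `toDelimitedBytes` are hypothetical names, not an existing Pig API. Any OutputFormat could call such a helper directly, so the StoreFunc/OutputFormat contract stays as it is:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a static helper that could live on PigTextOutputFormat.
// A Pig Tuple is modeled as a List<?>; no Pig or Hadoop classes are used.
final class TupleTextHelper {
    private TupleTextHelper() {}

    // Writes the tuple's fields as UTF-8 text separated by `delimiter`.
    // Null fields become empty strings (an assumption made for this sketch).
    static byte[] toDelimitedBytes(List<?> fields, byte delimiter) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(delimiter);
            }
            Object field = fields.get(i);
            if (field != null) {
                byte[] bytes = field.toString().getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] line = toDelimitedBytes(Arrays.asList("alice", 42, null, 3.5), (byte) '\t');
        // Tabs are shown as '|' to make the field boundaries visible.
        System.out.println(new String(line, StandardCharsets.UTF_8).replace('\t', '|'));
    }
}
```

Running `main` prints `alice|42||3.5`. The conversion stays in the output-format layer, so existing StoreFuncs and OutputFormats keep working unchanged. That is the compatibility point raised above.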

Alan.

On Jul 22, 2011, at 12:24 PM, Raghu Angadi wrote:

> I attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> 
> The only drawback is the extra copies required to make a Text().
> 
> 
> 
> On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com> wrote:
> 
>> I agree the tuple -> text conversion is better placed in StoreFunc. Users
>> would then have a better chance of reusing the OutputFormat.
>> 
>> For backward compatibility, StoreFunc.getOutputFormat is declared to
>> return a generic OutputFormat object, so this is fine. However, existing
>> StoreFuncs that use PigOutputFormat need to change.
> 
> 
> You mean existing classes that override PigStorage.getOutputFormat() and not
> PigStorage.putNext()?
> Yes, they would be affected, but fixing them is very simple: they just need
> to override putNext().
> As such, there is no contract regarding getOutputFormat() for us to break :)
> 
> Raghu.
> 
>> I don't know how much impact that will have, but we need to be careful.
>> We need to make a clear announcement and document it as an incompatible
>> change if we do so.
>> 
>> Daniel
>> 
>> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <rang...@apache.org> wrote:
>> 
>>> The expectation from PigStorage.getInputFormat() is that it is an
>>> InputFormat<Writable, Text>, and PigStorage handles converting Text to
>>> Tuple.
>>> This makes it very easy for users to plug in some other input format.
>>>
>>> But the same is not true for PigStorage.getOutputFormat(). Here it
>>> expects an OutputFormat<Writable, Tuple>, so the output format needs to
>>> convert Tuple to Text().
>>>
>>> I'm not sure whether this is intentional. I can submit a patch to move
>>> Tuple handling into PigStorage. Then PigTextOutputFormat would be as thin
>>> as PigTextInputFormat.
>>> 
>> 
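Raghu's proposal can be sketched with simplified stand-in types. In it, PigStorage's putNext() does the Tuple -> Text conversion and the output format only ever writes text. `StoreSketchDemo`, `TextRecordWriter`, `CollectingWriter`, and `DelimitedStoreSketch` are hypothetical names, and a tuple is modeled as a `List<?>`. This is an illustration of the idea, not the actual PIG-2187 patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StoreSketchDemo {
    // Hypothetical stand-in for a RecordWriter<Writable, Text>:
    // it only ever receives ready-made text, never tuples.
    interface TextRecordWriter {
        void write(String line);
    }

    // A thin writer, analogous to what PigTextOutputFormat would reduce to.
    static final class CollectingWriter implements TextRecordWriter {
        final List<String> lines = new ArrayList<>();

        @Override
        public void write(String line) {
            lines.add(line);
        }
    }

    // A store function that owns the Tuple -> text conversion in putNext(),
    // mirroring how PigStorage already converts Text -> Tuple when loading.
    static class DelimitedStoreSketch {
        private final TextRecordWriter writer;
        private final String delimiter;

        DelimitedStoreSketch(TextRecordWriter writer, String delimiter) {
            this.writer = writer;
            this.delimiter = delimiter;
        }

        // Subclasses wanting a different layout would override putNext()
        // instead of supplying a tuple-aware OutputFormat.
        void putNext(List<?> tuple) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < tuple.size(); i++) {
                if (i > 0) {
                    line.append(delimiter);
                }
                Object field = tuple.get(i);
                if (field != null) {
                    line.append(field);
                }
            }
            // Materializing the line before handing it over stands in for the
            // copy into a Text() that the thread names as the patch's cost.
            writer.write(line.toString());
        }
    }

    public static void main(String[] args) {
        CollectingWriter writer = new CollectingWriter();
        DelimitedStoreSketch store = new DelimitedStoreSketch(writer, ",");
        store.putNext(Arrays.asList("bob", 7));
        store.putNext(Arrays.asList("carol", null));
        System.out.println(writer.lines); // [bob,7, carol,]
    }
}
```

With this split, any OutputFormat<Writable, Text> could be reused unchanged, which is the benefit Daniel points to. The trade-off is the intermediate copy per record that Alan objects to.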
