[ https://issues.apache.org/jira/browse/PIG-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Corinne Chandel resolved PIG-1711.
----------------------------------

    Resolution: Fixed

Built In Functions doc updated. BinStorage section updated with new information. Patch will be submitted under PIG-1772.

> Document BinStorage behaviour
> ------------------------------
>
>                 Key: PIG-1711
>                 URL: https://issues.apache.org/jira/browse/PIG-1711
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
>            Assignee: Corinne Chandel
>             Fix For: 0.9.0
>
>
> We need to document some features of BinStorage that can cause indeterminate
> results.
> I have a Pig script of this type:
> {code}
> raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
> --filter out null columns
> A = filter raw by col1#'bcookie' is not null;
> B = foreach A generate col1#'bcookie' as reqcolumn;
> describe B;
> --B: {reqcolumn: bytearray}
> X = limit B 5;
> dump X;
> B = foreach A generate (chararray)col1#'bcookie' as convertedcol;
> describe B;
> --B: {convertedcol: chararray}
> X = limit B 5;
> dump X;
> {code}
> The first dump produces:
> (36co9b55onr8s)
> (36co9b55onr8s)
> (36hilul5oo1q1)
> (36hilul5oo1q1)
> (36l4cj15ooa8a)
> The second dump produces:
> ()
> ()
> ()
> ()
> ()
> So we need to write correct documentation on why this happens. One good
> explanation seems to be:
> According to Alan:
> BinStorage should not track data lineage. In the case where Pig is using
> BinStorage (or whatever) for moving data between MR jobs, Pig can figure
> out the correct cast function to use and apply it. For cases such as the one
> here, where users are storing data using BinStorage and then reading it in a
> separate Pig Latin script (and thus losing the type information), it is the
> user's responsibility to correctly cast the data before storing it in
> BinStorage.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
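
The advice quoted above (cast before storing in BinStorage) could be sketched roughly as the two scripts below. This is only an illustration, not the text that went into the documentation. The input path 'rawlogs', the output path 'sampledata_typed', the use of PigStorage for the original input, and the alias names are all assumptions made for the example.

{code}
-- Writer script (hypothetical): cast while the original loader can still
-- interpret the bytes, then store the already-typed value.
raw = load 'rawlogs' using PigStorage() as (col1:map[], col2, col3);
A = filter raw by col1#'bcookie' is not null;
B = foreach A generate (chararray)col1#'bcookie' as bcookie;
store B into 'sampledata_typed' using BinStorage();

-- Reader script (separate Pig run, hypothetical): no cast is applied here,
-- because the value was converted before it was stored.
C = load 'sampledata_typed' using BinStorage() as (bcookie);
X = limit C 5;
dump X;
{code}

The point of the split is that the only cast happens in the script that loaded the original data, so the reader never has to convert a bytearray that came out of BinStorage.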