No, you can join on bytearrays. What doesn't work is having Pig think you are joining on bytearrays when the values are actually strings under the covers; that mismatch is what causes the error you are seeing.
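A rough sketch of the explicit cast that was reported to work, assuming the field lists shown in the DESCRIBE output quoted below. The relation names AA_typed, BB_typed and CC_typed are made up for illustration, and I am assuming that casting only the join keys is enough:

```pig
-- Cast the join keys to chararray before joining, so the declared type
-- matches the strings the UDF actually returns.
-- AA_typed, BB_typed and CC_typed are hypothetical names.
AA_typed = FOREACH AA GENERATE (chararray) rid AS rid, roname, asid, asname;
BB_typed = FOREACH BB GENERATE (chararray) rid AS rid, roname,
                               (chararray) cid AS cid, csname;
CC_typed = FOREACH CC GENERATE (chararray) cid AS cid, csname, chid, chname;

A1 = JOIN AA_typed BY rid, BB_typed BY rid;
A2 = JOIN A1 BY BB_typed::cid, CC_typed BY cid;

-- The join keys should now be described as chararray instead of bytearray.
DESCRIBE A2;
```

If the UDF declares chararray fields in its output schema, the casts may not be needed at all, but I have not checked that against your UDF.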
On Wed, Apr 11, 2012 at 7:09 AM, shan s <[email protected]> wrote:
> Hi Dmitriy
> It works after explicit casting to chararray.
> So does it mean a bytearray field can't be used in JOIN or is there more to
> it?
> How to explain this behaviour?
>
> Thanks!
> On Wed, Apr 11, 2012 at 8:45 AM, shan s <[email protected]> wrote:
>
>> When I load my data I defined all fields to be chararray in the schema. I
>> can afford to treat everything as chararray.
>>
>> rid could be chararray. (but no real expectations from my side, it's a
>> guid coming from db)
>> AA and BB do come from UDF, UDF does some string processing and
>> returns substrings as tuples.
>> Also when I tried to convert the rid to chararray in A3, I get an error,
>> "can't convert to chararray." without further explanation.
>>
>> Thank You....
>> On Wed, Apr 11, 2012 at 4:09 AM, Dmitriy Ryaboy <[email protected]> wrote:
>>
>>> What type do you expect rid to be?
>>> Where did AA and BB come from?
>>>
>>> D
>>>
>>> On Tue, Apr 10, 2012 at 12:03 PM, shan s <[email protected]> wrote:
>>> > I am currently getting "Type mismatch in key from map: expected
>>> > org.apache.pig.impl.io.NullableBytesWritable, recieved
>>> > org.apache.pig.impl.io.NullableText"
>>> >
>>> > I looked up the PIG-919 and related comments, but could not understand
>>> > the reason or the workaround for this problem.
>>> >
>>> > Could you please kindly explain this further?
>>> >
>>> > I am getting this even before my GROUP, when I do my 3 way JOIN.
>>> >
>>> > A1 = JOIN AA BY rid, BB BY rid;
>>> >
>>> > A2 = JOIN A1 BY BB::cid, CC by cid;
>>> >
>>> > DESCRIBE A2;
>>> >
>>> > A3 = FOREACH A2 GENERATE FLATTEN((TOTUPLE(BB::rid)));
>>> >
>>> > DESCRIBE A3;
>>> >
>>> > DUMP A3;
>>> >
>>> > DESCRIBE looks like below.
>>> >
>>> > A2: {A1::AA::rid: bytearray,A1::AA::roname: bytearray,A1::AA::asid:
>>> > bytearray,A1::AA::asname: bytearray,A1::BB::rid: bytearray,A1::BB::roname:
>>> > bytearray,A1::BB::cid: bytearray,A1::BB::csname: bytearray,CC::cid:
>>> > bytearray,CC::csname: bytearray,CC::chid: bytearray,CC::chname: bytearray}
>>> >
>>> > A3: {org.apache.pig.builtin.totuple_A1::BB::rid_3::A1::BB::rid: bytearray}
>>> >
>>> > If map is a problem, I tried to convert it to tuple (for A3) above, but
>>> > it still does not work; in fact A3 still describes it as map (with a {},
>>> > I guess). Why is that?
>>> >
>>> > Appreciate your help! Thanks!!
>>> >
