Pig treating map values as String causing ClassCastException in CONCAT -----------------------------------------------------------------------
Key: PIG-2045 URL: https://issues.apache.org/jira/browse/PIG-2045 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0, 0.9.0 Reporter: Vivek Padmanabhan I have the below script ; {code} register mymapudf.jar; a = load '4523893_1' as (f1); a1 = foreach a generate org.vivek.udfs.mToMapUDF(f1); a2 = foreach a1 generate mapout#'k1' as str1,mapout#'k3' as str2; b = load '4523893_2' as (f1,f2); c = join a2 by CONCAT(str1,str2) , b by CONCAT(f1,f2); dump c; {code} The udf looks like below; {code} public class mToMapUDF extends EvalFunc<Map> { @Override public Map<String, Object> exec(Tuple arg0) throws IOException { Map <String,Object> myMapTResult = new HashMap<String, Object>(); myMapTResult.put("k1", "SomeString"); myMapTResult.put("k3", "SomeOtherString"); return myMapTResult; } @Override public Schema outputSchema(Schema input) { // TODO Auto-generated method stub return new Schema(new Schema.FieldSchema("mapout",DataType.MAP)); } } {code} The script fails with exception ; java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataByteArray at org.apache.pig.builtin.CONCAT.exec(CONCAT.java:51) The values of the map output, ie str1 and str2, is autmomatically treated as String by Pig and this causes the ClassCast exception when it is used in subsequent udfs. Since there are no explicit casting done nor any types defined, Pig should treat the values as the default bytearray. This issue is also observed in 0.9 The workaround in this case is to cast explicitly to chararray all keys involved in join. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira