I am new to Apache Pig, and I am trying to debug my Java UDF using the PigServer API.

Data file format

    component NIL     2015-07-12      18:58:55.74     E     xxxxx.xxx.xxxx.xxx 17      0xd3    biz     MESSAGE 00.00     Key1=value1&key2=11234567890123456789&key3=value3&key4=value4&key5=value5&key6=value6&key7=value7&key8=value8&key9=value9&key10=value10&key11=value11

Java UDF

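    package com.mycomp.udf.test;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class TestUDF extends EvalFunc<Map> {

        @Override
        public Map exec(Tuple input) throws IOException {
            Map<String, String> map = new HashMap<String, String>();

            // Nothing to parse: return an empty map instead of failing.
            if (input == null || input.size() == 0) {
                return map;
            }
            String data = (String) input.get(0);
            System.out.println("data" + data);
            if (data == null) {
                return map;
            }

            // Split the field on '&'; the text after the last '&' is the final pair.
            List<String> list = new ArrayList<String>();
            int posi = 0, endi;
            while ((endi = data.indexOf('&', posi)) >= 0) {
                list.add(data.substring(posi, endi));
                posi = endi + 1;
            }
            list.add(data.substring(posi));

            // Keep only name=value pairs; pieces without '=' are skipped.
            for (String split : list) {
                int end = split.indexOf('=');
                if (end != -1) {
                    map.put(split.substring(0, end), split.substring(end + 1));
                }
            }
            return map;
        }
    }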
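Because the `&` splitting and `=` parsing in `exec` only touch plain Java strings, one way to debug that part on its own (a sketch, assuming the parsing is what needs checking rather than the Pig plumbing) is to copy it into an ordinary static method and run it from a normal `main`, with no Pig runtime involved. The class name `KeyValueParserCheck` and the method `parseKeyValues` are hypothetical; the loop mirrors the UDF above, except that it uses a `LinkedHashMap` so the printed order matches the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone copy of the UDF's parsing logic, testable without Pig.
public class KeyValueParserCheck {

    // Same splitting rules as TestUDF.exec(): split on '&', keep only name=value pieces.
    public static Map<String, String> parseKeyValues(String data) {
        Map<String, String> map = new LinkedHashMap<>();
        if (data == null) {
            return map;
        }
        List<String> pairs = new ArrayList<>();
        int start = 0;
        int amp;
        while ((amp = data.indexOf('&', start)) >= 0) {
            pairs.add(data.substring(start, amp));
            start = amp + 1;
        }
        pairs.add(data.substring(start)); // piece after the last '&'
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq != -1) {
                map.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String sample = "Key1=value1&key2=11234567890123456789&key3=value3";
        // prints {Key1=value1, key2=11234567890123456789, key3=value3}
        System.out.println(parseKeyValues(sample));
        // prints {}
        System.out.println(parseKeyValues(null));
    }
}
```

If this prints the expected map for a value copied straight from data.txt, the problem is more likely in how the field reaches the UDF than in the parsing itself.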



Pig Script

    /* register jar for this script  */
    REGISTER C:/path/to/file/pig.jar;
    REGISTER C:/path/to/file/pig/TestUDF.jar;
    -- Define function for use.
    define pigUDF com.mycomp.udf.test.TestUDF();



    A = LOAD 'C:/path/to/file/data.txt' USING PigStorage('\t')
        AS (pool:chararray, Nil:chararray, entry_date:chararray, time:chararray,
            E:chararray, machinename:chararray, number1:int, number2:chararray,
            cal_type:chararray, cal_name:chararray, number3:int, number4:float,
            dataMap:chararray);

    B = FILTER A BY (cal_name MATCHES 'MESSAGE');

    -- Note: C is built from A, so the filtered relation B is not used.
    C = FOREACH A GENERATE pigUDF(dataMap);
    DUMP C;
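The `AS (...)` clause declares 13 fields, and as far as I understand, Pig's `POProject` logs "Attempt to access field which was not found in the input" when a tuple has fewer fields than the position being projected. A quick way to check whether every line of data.txt really has 13 tab-separated values is a small stand-alone Java program like the sketch below. `FieldCountCheck` and the default `data.txt` path are hypothetical, and the count assumes `PigStorage('\t')` splits on every tab and keeps empty fields, which may not match PigStorage exactly in every edge case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: report lines whose tab-separated field count differs from the schema.
public class FieldCountCheck {

    // Number of fields declared in the AS (...) clause of the LOAD statement.
    public static final int EXPECTED_FIELDS = 13;

    // Count tab-separated fields; limit -1 keeps trailing empty fields.
    public static int countFields(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) throws IOException {
        // Pass the real data file as the first argument; "data.txt" is only a placeholder.
        Path path = Paths.get(args.length > 0 ? args[0] : "data.txt");
        if (!Files.exists(path)) {
            System.out.println("file not found: " + path);
            return;
        }
        // ISO-8859-1 decodes any byte sequence, which is enough for counting tabs.
        List<String> lines = Files.readAllLines(path, StandardCharsets.ISO_8859_1);
        for (int i = 0; i < lines.size(); i++) {
            int n = countFields(lines.get(i));
            if (n != EXPECTED_FIELDS) {
                System.out.printf("line %d: %d fields (expected %d)%n", i + 1, n, EXPECTED_FIELDS);
            }
        }
    }
}
```

Any line it reports would leave the trailing fields of the schema (such as `dataMap`) without a value when Pig loads it.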

Java App code

    public static void main(String[] args) throws IOException {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerScript("C:/path/to/pigscript/PigScript.pig");
    }


Console messages:

    15/07/13 17:46:40 WARN mapReduceLayer.PigHadoopLogger: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject: Attempt to access field which was not found in the input
    INFO executionengine.LocalPigLauncher: Successfully stored result in: "file:/tmp/temp1859620259/tmp516095447"
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: Records written : 19
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: Bytes written : 0
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: 100% complete!
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: Success!!

I have a few questions:

1. I am getting some unreadable junk data in the output file. Why?
2. How can I debug my Java UDF? I tried `System.out.println`, but it does not display anything in the console.
3. Why am I getting "Attempt to access field which was not found in the input" even though I am running Pig in local mode? For more info, [refer here][1].


  [1]: https://issues.apache.org/jira/browse/PIG-784


Am I missing anything?
