I am new to Apache Pig and I am trying to debug my Java UDF using the PigServer
API.
Data file format
component NIL 2015-07-12 18:58:55.74 E
xxxxx.xxx.xxxx.xxx 17 0xd3 biz MESSAGE 00.00
Key1=value1&key2=11234567890123456789&key3=value3&key4=value4&key5=value5&key6=value6&key7=value7&key8=value8&key9=value9&key10=value10&key11=value11
Java UDF
    public Map exec(Tuple input) throws IOException {
        Map<String, String> map = null;
        if (map == null) {
            map = new HashMap<String, String>();
            List<String> list = new ArrayList<String>();
            int posi = 0, endi;
            String data = (String) input.get(0);
            System.out.println("data" + data);
            if (data == null) {
                return map;
            }
            while ((endi = data.indexOf('&', posi)) >= 0) {
                list.add(data.substring(posi, endi));
                posi = endi + 1;
            }
            if (endi == -1) {
                list.add(data.substring(posi, data.length()));
            }
            for (String split : list) {
                int end = split.indexOf('=');
                // added to avoid the non name:value pair which is case above
                if (end != -1) {
                    map.put(split.substring(0, end), split.substring(end + 1, split.length()));
                }
            }
        }
        return map;
    }
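To take Pig out of the picture while debugging, the same splitting logic can be tried as a plain static method. The following is only a minimal sketch: the class name `QueryStringParser` and the method `parse` are hypothetical and not part of my actual UDF, and the input in `main` is a shortened version of the sample data above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the original UDF): the same '&' / '='
// splitting as exec(), but without any Pig dependency, so it can be run
// and stepped through in a debugger directly.
public class QueryStringParser {

    // Splits "k1=v1&k2=v2" into a map; tokens without '=' are skipped.
    public static Map<String, String> parse(String data) {
        Map<String, String> map = new HashMap<String, String>();
        if (data == null) {
            return map;
        }
        int pos = 0;
        while (pos <= data.length()) {
            int end = data.indexOf('&', pos);
            if (end < 0) {
                end = data.length(); // last token runs to the end of the string
            }
            String token = data.substring(pos, end);
            int eq = token.indexOf('=');
            if (eq != -1) {
                map.put(token.substring(0, eq), token.substring(eq + 1));
            }
            pos = end + 1;
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> result =
                parse("Key1=value1&key2=11234567890123456789&key3=value3");
        System.out.println(result);
    }
}
```

If this behaves as expected on its own, the parsing is probably fine and the problem is more likely in what Pig passes to `exec` as `input.get(0)`.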
Pig Script
    /* register jar for this script */
    REGISTER C:/path/to/file/pig.jar;
    REGISTER C:/path/to/file/pig/TestUDF.jar;
    -- Define function for use.
    define pigUDF com.mycomp.udf.test.TestUDF();
    A = LOAD 'C:/path/to/file/data.txt' using PigStorage('\t')
        AS (pool:chararray, Nil:chararray, entry_date:chararray, time:chararray,
            E:chararray, machinename:chararray, number1:int, number2:chararray,
            cal_type:chararray, cal_name:chararray, number3:int, number4:float,
            dataMap:chararray);
    B = FILTER A BY (cal_name MATCHES 'MESSAGE');
    C = FOREACH A GENERATE pigUDF(dataMap);
    Dump C;
Java App code
    public static void main(String[] args) throws IOException {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerScript("C:/path/to/pigscript/PigScript.pig");
    }
Console message:
    15/07/13 17:46:40 WARN mapReduceLayer.PigHadoopLogger: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject: Attempt to access field which was not found in the input
    INFO executionengine.LocalPigLauncher: Successfully stored result in: "file:/tmp/temp1859620259/tmp516095447"
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: Records written : 19
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: Bytes written : 0
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: 100% complete!
    15/07/13 17:46:40 INFO executionengine.LocalPigLauncher: Success!!
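One thing I could check, on the unconfirmed assumption that the warning is related to rows having fewer tab-separated fields than the 13 columns declared in the LOAD schema, is how many fields each line of the data file really contains. The sketch below is only an illustration: the class name `FieldCountCheck` is hypothetical, and the file path is passed as a command-line argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical diagnostic (not part of the original code): reports lines
// whose tab-separated field count differs from the declared schema.
public class FieldCountCheck {

    // Number of columns in the AS (...) clause of the LOAD statement.
    static final int EXPECTED_FIELDS = 13;

    // Counts tab-separated fields; the -1 limit keeps trailing empty fields.
    public static int countFields(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java FieldCountCheck <data file>");
            return;
        }
        List<String> lines =
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            int n = countFields(lines.get(i));
            if (n != EXPECTED_FIELDS) {
                System.out.println("line " + (i + 1) + ": " + n + " fields");
            }
        }
    }
}
```

A line that is separated by spaces rather than tabs would be reported as having a single field, which would mean `dataMap` (the 13th column) is never populated for that row.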
I have a few questions:
1. I am getting unreadable junk data in the output file. Why?
2. How can I debug my Java UDF? I tried System.out.println, but nothing is
displayed in the console.
3. Why am I getting "Attempt to access field which was not found in the
input", even though I am running Pig in local mode?
For more information, [refer here][1].
[1]: https://issues.apache.org/jira/browse/PIG-784
Am I missing anything?