Hi,

I am just getting started with Pig and trying to get familiar with it, but I
am running into problems while reading an XML file.

My XML looks like the following (of course, it's much bigger; I have only
included the first entries):

<cn:bulkCmConfigDataFile xmlns:cn="details-CONFIG" xmlns:xt="nrmBase" xmlns:en="CLL-NB">
<cn:fileHeader fileFormatVersion="2.0.0" senderName="senderName" vendorName="vendorName"/>
<cn:configData>
<en:ManagementNode xmlns:en="CLL-NB">
<en:neGroup>Group_1</en:neGroup>
<en:neVersion>2.1.0</en:neVersion>
<en:neId>100</en:neId>
<en:neName>TK0005</en:neName>
<en:neIp>192.168.0.2</en:neIp>
</en:ManagementNode>
<en:ManagementNode xmlns:en="CLL-NB">
<en:neGroup>Group_1</en:neGroup>
<en:neVersion>2.1.0</en:neVersion>
<en:neId>101</en:neId>
<en:neName>TK0002</en:neName>
<en:neIp>192.168.0.3</en:neIp>
</en:ManagementNode>
</cn:configData>
<cn:fileFooter dateTime="2013-12-20T03:40:15+00:00"/>
</cn:bulkCmConfigDataFile>

And the Pig script I am trying to use is the following:


set pig.splitCombination false;
set tez.grouping.min-size 5242880;
set tez.grouping.max-size 5242880;

register '/usr/lib/tez/tez-0.7.0/tez-tfile-parser-0.7.0.jar';

DEFINE getDetails(raw) RETURNS void {
        details = FOREACH raw GENERATE configData;
        distinctDetails = DISTINCT details;
        STORE distinctDetails INTO '$DETAILS' USING PigStorage(',');
}


rmf $NODE_DETAILS
rawLogs = load '/user/hduser/test/test01/ManagementNode.xml' using
org.apache.tez.tools.TFileLoader() as (configData:chararray, key:chararray,
line:chararray);
raw = FOREACH rawLogs GENERATE ManagementNode,key,line;

getDetails(raw);
exec;

However, I am getting the following error:

ERROR 2998: Unhandled internal error. null

java.lang.StackOverflowError
        at org.apache.tez.tools.TFileLoader.hashCode(TFileLoader.java:148)
        at java.util.Arrays.hashCode(Arrays.java:3140)
...

Could it be because of the XML file?

Thanks.


J. Reyes.
