Hey,
I was looking at the Oozie code and found a class/method that could be changed
to include the Pig stats JSON in the action data, so that it can be accessed in
subsequent workflow actions as ${wf:actionData('pig-node')['stats']}.
Something like this may already have been planned, but it is not done
currently.
Class: org.apache.oozie.DagELFunctions (DagELFunctions.java)
Method: setActionInfo(WorkflowInstance workflowInstance, WorkflowAction action)
The relevant part is where it sets the action data value for "hadoopJobs":
if (action.getExternalChildIDs() != null) {
    workflowInstance.setVar(action.getName() +
            WorkflowInstance.NODE_VAR_SEPARATOR + ACTION_DATA,
            HADOOP_JOBS_PREFIX + action.getExternalChildIDs());
}
The code could be changed to also set the value for "stats", for example:
// Class-level constant, alongside HADOOP_JOBS_PREFIX
private static final String STATS_PREFIX = "stats:";

// Inside setActionInfo(...)
if (action.getExternalChildIDs() != null || action.getStats() != null) {
    String separator = "";
    StringBuilder sb = new StringBuilder(100);
    // Existing behavior: record the Hadoop job IDs launched by the action
    if (action.getExternalChildIDs() != null) {
        sb.append(HADOOP_JOBS_PREFIX).append(action.getExternalChildIDs());
        separator = "\n";
    }
    // New: append the stats JSON on its own line under the "stats" key
    if (action.getStats() != null) {
        sb.append(separator).append(STATS_PREFIX).append(action.getStats());
    }
    workflowInstance.setVar(action.getName() +
            WorkflowInstance.NODE_VAR_SEPARATOR + ACTION_DATA,
            sb.toString());
}
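To check that the combined string still yields two separate keys, here is a standalone sketch that mirrors the proposed logic as a pure function and reads the result back with java.util.Properties. It assumes (not confirmed here) that the action data string is parsed as Java properties text, and the value "hadoopJobs:" for HADOOP_JOBS_PREFIX is a hypothetical stand-in for the real constant in DagELFunctions. The class ActionDataSketch and its methods are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Sketch only: mimics the proposed setActionInfo() change without Oozie classes.
 * Assumption: action data is later parsed as java.util.Properties text.
 */
public class ActionDataSketch {

    // Hypothetical values; the real HADOOP_JOBS_PREFIX is defined in DagELFunctions.
    public static final String HADOOP_JOBS_PREFIX = "hadoopJobs:";
    public static final String STATS_PREFIX = "stats:";

    /** Returns the action data string, or null when there is nothing to store. */
    public static String buildActionData(String externalChildIds, String stats) {
        if (externalChildIds == null && stats == null) {
            return null; // mirrors the outer if: setVar() is simply not called
        }
        StringBuilder sb = new StringBuilder(100);
        String separator = "";
        if (externalChildIds != null) {
            sb.append(HADOOP_JOBS_PREFIX).append(externalChildIds);
            separator = "\n";
        }
        if (stats != null) {
            sb.append(separator).append(STATS_PREFIX).append(stats);
        }
        return sb.toString();
    }

    /** Parses "key:value" lines; only the first ':' on a line separates key from value. */
    public static Properties parseActionData(String data) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not actually throw
        }
        return props;
    }

    public static void main(String[] args) {
        String data = buildActionData("job_201209140001_0001", "{\"numJobs\":1}");
        Properties props = parseActionData(data);
        System.out.println(props.getProperty("hadoopJobs"));
        System.out.println(props.getProperty("stats"));
    }
}
```

One caveat under this assumption: Properties treats backslashes as escapes and a newline as the end of a value, so the stats JSON would need to be written on a single line without backslash escapes to survive the round trip unchanged.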
With this change, <capture-output/> would not be necessary: the Pig action
would set the stats attribute whenever the property
oozie.action.external.stats.write is present and set to true.
Eduardo.
________________________________
From: Eduardo Afonso Ferreira <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Friday, September 14, 2012 11:18 AM
Subject: Pig action: capture output - PigStats JSON
Hey,
How about adding functionality to the Pig action to return the PigStats JSON
through the <capture-output/> mechanism?
The Pig action could return it in an attribute (say, pig_stats) that can be
accessed in the workflow as ${wf:actionData('pig-node')['pig_stats']}.
If something like this were implemented, it would give me exactly what I need,
because another action in my workflow will take that value as a parameter
(the <arg/> in the example below).
Let me know if this is possible, or how you would recommend getting the
PigStats to my action.
Here's an example of what I'm talking about:
--------------------------------------------
<workflow-app xmlns="uri:oozie:workflow:0.1" name="session_counts-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${JOB_TRACKER}</job-tracker>
            <name-node>${NAME_NODE}</name-node>
            <script>${SCRIPT}</script>
            <capture-output/>
        </pig>
        <ok to="stats-node"/>
        <error to="fail"/>
    </action>
    <action name="stats-node">
        <java>
            <job-tracker>${JOB_TRACKER}</job-tracker>
            <name-node>${NAME_NODE}</name-node>
            <main-class>com.turner.util.CheckPigStats</main-class>
            <arg>${wf:actionData('pig-node')['pig_stats']}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
----------------------------------------------
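For context, a Java action like stats-node receives the <arg> value as args[0] of its main method. The sketch below is a hypothetical stand-in for com.turner.util.CheckPigStats (whose real contents are not shown here); the class name CheckPigStatsSketch and its shape check are illustrative assumptions, not a real JSON validation.

```java
/**
 * Hypothetical stand-in for com.turner.util.CheckPigStats.
 * Receives the Pig stats JSON as args[0], as passed by the stats-node <arg>.
 */
public class CheckPigStatsSketch {

    /** Rough shape check only: non-null and wrapped in braces after trimming. */
    public static boolean looksLikeJsonObject(String s) {
        if (s == null) {
            return false;
        }
        String t = s.trim();
        return t.length() >= 2 && t.charAt(0) == '{' && t.charAt(t.length() - 1) == '}';
    }

    public static void main(String[] args) {
        if (args.length < 1 || !looksLikeJsonObject(args[0])) {
            // Throwing (rather than calling System.exit) lets the launcher report
            // the action as failed, so the workflow would take the <error> transition.
            throw new IllegalArgumentException("Missing or malformed Pig stats argument");
        }
        System.out.println("Received Pig stats: " + args[0]);
    }
}
```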
Thanks for your help.
Eduardo.