Hey,

I was looking at the Oozie code and found a class/method that could be 
changed to add the Pig stats JSON to the action data, so that it can be accessed 
in subsequent workflow actions as ${wf:actionData('pig-node')['stats']}. 
Something like this may have been planned or intended, but it is not 
implemented currently.

Class: org.apache.oozie.DagELFunctions
Method: setActionInfo(WorkflowInstance workflowInstance, WorkflowAction action)

This is where it sets the action data value for "hadoopJobs":

        if (action.getExternalChildIDs() != null) {
            workflowInstance.setVar(action.getName() + WorkflowInstance.NODE_VAR_SEPARATOR + ACTION_DATA,
                    HADOOP_JOBS_PREFIX + action.getExternalChildIDs());
        }

The code could be changed to also set the value for "stats", for example:

        String STATS_PREFIX = "stats:";
        // Write action data if either the child job IDs or the stats are available.
        if (action.getExternalChildIDs() != null || action.getStats() != null) {
            String separator = "";
            StringBuffer sb = new StringBuffer(100);
            if (action.getExternalChildIDs() != null) {
                sb.append(HADOOP_JOBS_PREFIX).append(action.getExternalChildIDs());
                separator = "\n";
            }
            if (action.getStats() != null) {
                sb.append(separator).append(STATS_PREFIX).append(action.getStats());
            }
            workflowInstance.setVar(action.getName() + WorkflowInstance.NODE_VAR_SEPARATOR + ACTION_DATA,
                sb.toString());
        }



This way it would not be necessary to use <capture-output/>, and the Pig action 
would set the stats attribute (if the property 
oozie.action.external.stats.write is present and set to true).
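
To illustrate why one extra "stats:" line should be enough, here is a minimal standalone sketch. It assumes the action data string is read back as java.util.Properties (which I believe is how wf:actionData resolves keys, but I have not verified it), and that HADOOP_JOBS_PREFIX is "hadoopJobs:". The class name ActionDataSketch, the helper methods, and the sample JSON are made up for illustration and are not part of Oozie.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ActionDataSketch {
    // Assumed literal values; the real constants live in DagELFunctions.
    static final String HADOOP_JOBS_PREFIX = "hadoopJobs:";
    static final String STATS_PREFIX = "stats:";

    /** Builds the combined action data string the way the proposed change would. */
    static String build(String externalChildIds, String stats) {
        StringBuilder sb = new StringBuilder(100);
        String separator = "";
        if (externalChildIds != null) {
            sb.append(HADOOP_JOBS_PREFIX).append(externalChildIds);
            separator = "\n";
        }
        if (stats != null) {
            sb.append(separator).append(STATS_PREFIX).append(stats);
        }
        return sb.toString();
    }

    /** Reads the string back as key/value pairs; ':' is a valid Properties separator. */
    static Properties parse(String actionData) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(actionData));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical stats JSON, only for demonstration.
        String data = build("job_1,job_2", "{\"RECORD_WRITTEN\":42}");
        Properties props = parse(data);
        System.out.println(props.getProperty("hadoopJobs"));
        System.out.println(props.getProperty("stats"));
    }
}
```

One caveat with this format: Properties.load treats a backslash as an escape character and a newline as the end of an entry, so a stats JSON that contains backslash escapes or line breaks would likely need to be escaped before being appended.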


Eduardo.



________________________________
 From: Eduardo Afonso Ferreira <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, September 14, 2012 11:18 AM
Subject: Pig action: capture output - PigStats JSON
 
Hey,

How about adding functionality to the Pig action to return the PigStats JSON 
using the <capture-output/> mechanism?
The Pig action could return it in an attribute (say: pig_stats) that can be 
accessed in the workflow as ${wf:actionData('pig-node')['pig_stats']}.
If something like this is implemented, I will have exactly what I need, because 
I'll have another action in my workflow that will use it as a <param/>.

Let me know if this is possible, or how you would recommend I get the 
PigStats to my action.

Here's an example of what I'm talking about:

--------------------------------------------
<workflow-app xmlns="uri:oozie:workflow:0.1" name="session_counts-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${JOB_TRACKER}</job-tracker>
            <name-node>${NAME_NODE}</name-node>
            <script>${SCRIPT}</script>
            <capture-output />
        </pig>
        <ok to="stats-node"/>
        <error to="fail"/>
    </action>
    <action name='stats-node'>
        <java>
            <job-tracker>${JOB_TRACKER}</job-tracker>
            <name-node>${NAME_NODE}</name-node>
            <main-class>com.turner.util.CheckPigStats</main-class>
            <arg>${wf:actionData('pig-node')['pig_stats']}</arg>
        </java>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
----------------------------------------------

Thanks for your help.
Eduardo.
