Hello,

In our current project we have an Oozie workflow with Pig actions that write 
entries to Hive after data propagation. This requires Hive properties to be 
defined in the workflow, and currently we reference a copy of Hive's 
configuration XML file, containing all possible Hive properties, through the 
<job-xml> element. Our Oozie workflow XML looks something like this 
(pseudocode):
<workflow-app ...>
    ...
    <action ...>
        <pig>
            <job-tracker/>
            <name-node/>
            <job-xml>/workflows/.../hive-conf.xml</job-xml>
            <configuration>
                <property/>
                <property/>
                <property/>
            </configuration>
            <script>/pigscript.pig</script>
            <argument/>
            <argument/>
            <argument/>
        </pig>
        <ok to/>
        <error to/>
    </action>
    ...
</workflow-app>

The hive-conf.xml includes basically every possible Hive property, from 
authentication settings to file footer inclusion and so on. So the question 
is: which configuration parameters does Pig actually use to communicate with 
Hive, and which of them should we include in our <configuration> element to 
make the workflow a bit cleaner?
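
To illustrate what we are after, the trimmed-down <configuration> element 
might look roughly like the sketch below. The two property names are only 
our guess at what is needed (hive.metastore.uris and 
hive.metastore.sasl.enabled are real Hive properties, but we have not 
confirmed that these are the ones Pig relies on), and the host and port are 
hypothetical placeholders:

<configuration>
    <!-- assumption: Pig needs to know where the Hive metastore is;
         the host and port here are hypothetical -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastore-host:9083</value>
    </property>
    <!-- assumption: only relevant on a Kerberos-secured cluster -->
    <property>
        <name>hive.metastore.sasl.enabled</name>
        <value>true</value>
    </property>
</configuration>

If a short list like this turns out to be sufficient, we could drop the 
<job-xml> reference to the full hive-conf.xml entirely.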

Thanks in advance,
Marko
