My coordinator job definition is included at the end of this mail.

It is a daily job that processes some logs. However, the logs are sometimes
delayed, and I want processing to start only after the logs are present.
Hence, I'm adding an input event that must be satisfied before the process
starts.

Typically, the logs come in with a delay of 4 days (yeah, I know it's
crazy, but let's say so), so I have set my input-event instance to
${coord:current(-4)}. This works well.

That said, the initial date of the dataset to be processed is a little
behind (1/26/2013), and let's say I submit this coordinator job on
1/30/2013.

To catch up on all the processing not yet done, the coordinator spins up
jobs for
1/26/2013
1/27/2013
....
1/30/2013

but all with a creation date of 1/30/2013.

Now, when these jobs run, the first one succeeds: the input check for
${coord:current(-4)} passes and the log for 1/26/2013 is available. The
next one also passes the ${coord:current(-4)} input check, but the job
fails because the input logs for 1/27/2013 are not available yet.

I see that this is because the coord:current() value is computed from the
job creation time (and not the nominal time).

When one job is spun off per day, this is fine and works well. But when
scheduling work that should already have been done, i.e. catching up, the
dates are off.

Is there a way to solve this? Or do I have to run the catch-up jobs
separately and then schedule the coordinator for future runs?

<coordinator-app name="sauron-coord" frequency="${coord:days(1)}"
                 start="2013-01-26T00:00Z" end="2015-01-02T08:00Z"
                 timezone="America/Los_Angeles"
                 xmlns="uri:oozie:coordinator:0.1">
    <datasets>
        <dataset name="inputDir" frequency="${coord:days(1)}"
                 initial-instance="2013-01-26T00:00Z" timezone="America/Los_Angeles">
            <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="coordInput" dataset="inputDir">
            <instance>${coord:current(-4)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${nameNode}/user/${coord:user()}/${sauronRoot}</app-path>
            <configuration>
                <property>
                    <name>wfInput</name>
                    <value>${nameNode}/shared/app_logs/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}/*/{na*,eu*,ap*}</value>
                </property>
                <property>
                    <name>wfOutput</name>
                    <value>/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
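For reference, the workflow input path could also be tied to the instance the input event actually resolved, rather than being rebuilt from coord:nominalTime(). Oozie's coord:dataIn() EL function returns the URI(s) that a named data-in resolved to. This is an untested sketch of only the changed wfInput property. It assumes the data-in resolves to a single instance (coord:dataIn() returns a comma-separated list when a data-in spans several instances) and keeps the /*/{na*,eu*,ap*} suffix from the definition above.

```xml
<!-- Sketch: wfInput points at the directory that data-in "coordInput"
     resolved to (here the ${coord:current(-4)} instance), so the workflow
     reads the same logs the input event waited for. -->
<property>
    <name>wfInput</name>
    <value>${coord:dataIn('coordInput')}/*/{na*,eu*,ap*}</value>
</property>
```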


Thank you,
-- 
-Praveen
