The memory usage should be OK if your cluster can support it (i.e. other
apps are not being starved of memory).
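
As a quick sanity check on the figures quoted below (120 directories, 250MB
per operator, 10 operators per 4GB container), here is a small sketch of the
arithmetic. The class and method names are made up for illustration and are
not part of any library:

```java
// Hypothetical helper that reproduces the sizing arithmetic from the
// quoted message; none of these names come from Apex.
class BatchSizing {
    // Number of containers needed, rounding up so a partial container counts.
    static int containersNeeded(int operators, int operatorsPerContainer) {
        return (operators + operatorsPerContainer - 1) / operatorsPerContainer;
    }

    public static void main(String[] args) {
        int directories = 120;          // one partition (operator) per directory
        int operatorMb = 250;           // memory given to each operator
        int operatorsPerContainer = 10; // as stated in the quoted message
        int containerGb = 4;            // size of each container

        int containers = containersNeeded(directories, operatorsPerContainer);
        int totalGb = containers * containerGb;
        int operatorMbPerContainer = operatorsPerContainer * operatorMb;

        System.out.println("containers = " + containers);                    // 12
        System.out.println("total container memory (GB) = " + totalGb);      // 48
        System.out.println("operator memory per container (MB) = "
            + operatorMbPerContainer);                                       // 2500
    }
}
```

So the "around 50GB" estimate is 12 containers x 4GB = 48GB, with about
2.5GB of each 4GB container going to the operators themselves.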

If you want to process the directories sequentially, there is no benefit to
partitioning, so why not use just a single operator? You'll need to write
some code in your operator to change "filePath" when you transition from one
directory to the next and make related changes.
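
To make the single-operator idea a bit more concrete, here is a minimal
sketch of just the bookkeeping part: a plain Java helper that tracks which
directory is current and hands out the next one. The class and method names
are hypothetical, and it deliberately does not touch any Apex classes; the
operator would still have to update "filePath" and reset its own scan state
whenever advance() returns true.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: keeps an ordered queue of directories and exposes
// the one currently being processed. Not part of Apex or Malhar.
class DirectorySequencer {
    private final Deque<String> pending;
    private String current;

    DirectorySequencer(List<String> directories) {
        this.pending = new ArrayDeque<>(directories);
        this.current = pending.poll(); // null if the list was empty
    }

    // Directory being processed now, or null when all are done.
    String currentDirectory() {
        return current;
    }

    // Move to the next directory; returns false when none are left.
    boolean advance() {
        current = pending.poll();
        return current != null;
    }
}
```

The operator would call advance() at whatever point it decides the current
directory is finished, and then point itself at currentDirectory().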

What triggers the transition, i.e. when do you decide you're done with one
directory and move on to the next?

Ram

On Mon, Jun 27, 2016 at 4:05 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
[email protected]> wrote:

> Hi Ram,
>
>
>
> Please let me know if you need some more information about our use case.
>
>
>
> Thanks & Regards,
>
> Surya Vamshi
>
>
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> [email protected]]
> *Sent:* 2016, June, 23 10:36 AM
> *To:* [email protected]
> *Subject:* RE: Reading Multiple Direcotries in sequence
>
>
>
> Hi Ram,
>
>
>
> In my case, I have 120 directories that I have to read in one batch job
> per day. With your guidance I have successfully implemented the parallel
> partition approach with a single logical operator and it is working as
> expected. I am creating the partitions based on the number of sources in
> the properties file.
>
>
>
> If I give 250MB to each operator, I need around 12 containers of 4GB RAM
> each (each container can handle 10 parallel operators), which comes to
> around 50GB of RAM to process the batch.
>
>
>
> My concerns are below; please provide your suggestions:
>
>
>
> - Is this memory utilization on the cluster OK?
>
> - If not, I can run two applications sequentially (60 directories per
> application), but I would have to schedule the batches at different times,
> maybe by using Oozie or Spring Batch. What do you suggest?
>
> - As per your comments below, how do I make my partition wait for a
> trigger from Kafka or an entry in a database? Is it inside
> definePartitions? Do you have any sample code for this? What I am
> currently doing to generate the partitions is to define a source property
> in the properties file for each directory. I am processing each file
> differently and generating the output files in different directories.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:[email protected]
> <[email protected]>]
> *Sent:* 2016, June, 23 9:54 AM
> *To:* [email protected]
> *Subject:* Re: Reading Multiple Direcotries in sequence
>
>
>
> No, I don't have an example but several approaches are possible depending
> on the exact requirements, e.g.:
>
> 1. How large is the number of directories?
>
> 2. Is the desired sequence a total order or a partial order (i.e. DAG,
> https://en.wikipedia.org/wiki/Partially_ordered_set)?
>
>
>
> If the number of directories is small you can use one operator per
> directory and link them with ports in the desired sequence. Each operator
> sends a control tuple to the next when it wants the next one to start.
> Each operator waits for this trigger and emits tuples in the idle time
> handler, for example:
>
>
>
> public class DownStreamReceiver extends AbstractFileInputOperator
>     implements Operator.IdleTimeHandler
> {
>   // set to true only after receiving the trigger from the 1st reader
>   private boolean upstreamDoneReading;
>
>   // (abstract methods of AbstractFileInputOperator omitted)
>
>   @Override
>   public void handleIdleTime()
>   {
>     if (upstreamDoneReading) {
>       emitTuples();
>     }
>   }
> }
>
>
>
> If the number is large, you can explore the earlier partitioned approach
> but have each partition look for a trigger from an external source like a
> Kafka queue or an entry in a DB to start processing.
>
>
>
> Ram
>
>
>
> On Thu, Jun 23, 2016 at 6:11 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hi Ram,
>
>
>
> Do you have sample DT application code for reading multiple directories
> in sequence?
>
>
>
> Or could you throw some light on how I would achieve that with
> AbstractFileInputOperator?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> _______________________________________________________________________
>
> If you received this email in error, please advise the sender (by return
> email or otherwise) immediately. You have consented to receive the attached
> electronically at the above-noted email address; please retain a copy of
> this confirmation for future reference.
>
> Si vous recevez ce courriel par erreur, veuillez en aviser l'expéditeur
> immédiatement, par retour de courriel ou par un autre moyen. Vous avez
> accepté de recevoir le(s) document(s) ci-joint(s) par voie électronique à
> l'adresse courriel indiquée ci-dessus; veuillez conserver une copie de
> cette confirmation pour les fins de reference future.
>
