Hi Ram,

Please let me know if you need some more information about our use case.

Thanks & Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:[email protected]]
Sent: 2016, June, 23 10:36 AM
To: [email protected]
Subject: RE: Reading Multiple Directories in sequence

Hi Ram,

In my case, I have 120 directories that I have to read in one batch job per 
day. With your guidance I have successfully implemented the parallel partition 
approach with a single logical operator, and it is working as expected. I am 
creating the partitions based on the number of sources in the properties file.

If I allocate 250MB per operator, I need around 12 containers of 4GB RAM each 
(each container can handle 10 parallel operators), which comes to around 50GB 
of RAM to process the batch.
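The sizing above is simple arithmetic, so it can be written down and checked. The sketch below only encodes the numbers from this message (120 directories, 10 operators per container, 4GB containers, 250MB per operator); the class and method names are made up for illustration. It shows that 12 containers x 4GB is 48GB, and that the operators themselves account for about 2.5GB of each 4GB container.

```java
// Back-of-the-envelope container sizing for the batch described above.
// Inputs are the figures quoted in this thread; the formulas are estimates only.
public class MemoryEstimate {

  // Containers needed when each container hosts at most operatorsPerContainer
  // partitions (ceiling division).
  static int containersNeeded(int partitions, int operatorsPerContainer) {
    return (partitions + operatorsPerContainer - 1) / operatorsPerContainer;
  }

  // Total memory requested from the cluster, in MB.
  static long totalMemoryMb(int containers, int containerMb) {
    return (long) containers * containerMb;
  }

  public static void main(String[] args) {
    int containers = containersNeeded(120, 10);         // 12 containers
    long totalMb = totalMemoryMb(containers, 4 * 1024); // 49152 MB = 48 GB
    long operatorMbPerContainer = 10L * 250;            // 2500 MB for 10 operators
    System.out.println(containers + " containers, " + totalMb / 1024 + " GB total, "
        + operatorMbPerContainer + " MB of operator memory per 4GB container");
  }
}
```

Running two applications of 60 directories each would halve the per-run request (containersNeeded(60, 10) gives 6 containers, or 24GB).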

My concerns are below; please provide your suggestions:


- Is this memory utilization on the cluster OK?

- If not, I can run two applications sequentially (60 directories per 
application), but then I have to schedule the batches at different times, 
maybe using Oozie or Spring Batch. What do you suggest?

- As per your comments below, how do I make each partition wait for a trigger 
from Kafka or an entry in a database? Does that go inside definePartitions()? 
Do you have any sample code for this? Currently, I generate one partition per 
directory from a source property in the properties file. I process each file 
differently and generate the output files in different directories.
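A minimal sketch of deriving the directory list from the properties file, assuming a hypothetical naming scheme in which each input directory is a "source.<name>" entry; the key prefix and class name are illustrative, not taken from the application. A partitioner's definePartitions() would then create one partition per returned directory; that Apex-specific step is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeMap;

public class SourceDirectories {

  // Hypothetical key prefix: one "source.<name>" property per input directory.
  static final String PREFIX = "source.";

  // Returns one directory per source property, ordered by key (lexicographically)
  // so that the partition order is stable between runs.
  static List<String> directories(Properties props) {
    TreeMap<String, String> byKey = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(PREFIX)) {
        byKey.put(key, props.getProperty(key));
      }
    }
    return new ArrayList<>(byKey.values());
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(
        "source.a=/data/in/a\nsource.b=/data/in/b\nother.key=ignored\n"));
    List<String> dirs = directories(props);
    // One partition would be created per entry in dirs.
    System.out.println(dirs.size() + " partitions: " + dirs);
  }
}
```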

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:[email protected]]
Sent: 2016, June, 23 9:54 AM
To: [email protected]
Subject: Re: Reading Multiple Directories in sequence

No, I don't have an example, but several approaches are possible depending on 
the exact requirements, e.g.:
1. How large is the number of directories?
2. Is the desired sequence a total order or a partial order (i.e. a DAG, 
https://en.wikipedia.org/wiki/Partially_ordered_set)?
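For a partial order, a directory can start once all of its prerequisites have finished, which is a topological sort. The plain-Java sketch below (independent of Apex) applies Kahn's algorithm to a hypothetical dependency map with made-up directory names and groups the directories into waves: directories in the same wave could be read in parallel, and a total order is simply the case where every wave holds one directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DirectoryOrder {

  // dependsOn maps a directory to the directories that must finish before it.
  // Returns waves: each wave depends only on earlier waves (Kahn's algorithm).
  static List<List<String>> waves(Map<String, List<String>> dependsOn) {
    Map<String, Integer> pending = new TreeMap<>();    // unmet prerequisites
    Map<String, List<String>> successors = new HashMap<>();
    for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
      pending.putIfAbsent(e.getKey(), 0);
      for (String pre : e.getValue()) {
        pending.putIfAbsent(pre, 0);
        pending.merge(e.getKey(), 1, Integer::sum);
        successors.computeIfAbsent(pre, k -> new ArrayList<>()).add(e.getKey());
      }
    }
    List<String> ready = new ArrayList<>();            // no prerequisites at all
    for (Map.Entry<String, Integer> e : pending.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    List<List<String>> result = new ArrayList<>();
    int placed = 0;
    while (!ready.isEmpty()) {
      result.add(ready);
      placed += ready.size();
      List<String> upcoming = new ArrayList<>();
      for (String done : ready) {
        for (String succ : successors.getOrDefault(done, Collections.<String>emptyList())) {
          if (pending.merge(succ, -1, Integer::sum) == 0) {
            upcoming.add(succ);                        // last prerequisite finished
          }
        }
      }
      Collections.sort(upcoming);
      ready = upcoming;
    }
    if (placed != pending.size()) {
      throw new IllegalArgumentException("dependency cycle between directories");
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("dirB", Arrays.asList("dirA"));
    deps.put("dirC", Arrays.asList("dirA"));
    deps.put("dirD", Arrays.asList("dirB", "dirC"));
    System.out.println(waves(deps)); // [[dirA], [dirB, dirC], [dirD]]
  }
}
```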

If the number of directories is small, you can use one operator per directory 
and link them with ports in the desired sequence. Each operator sends a control 
tuple to the next one when it wants that one to start. Each operator waits for 
this trigger and then emits tuples in its idle time handler, for example:

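import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.Operator;
import com.datatorrent.lib.io.fs.AbstractFileInputOperator;

// Subclasses still implement readEntity() and emit() for their file format.
public abstract class DownStreamReceiver<T> extends AbstractFileInputOperator<T>
    implements Operator.IdleTimeHandler {
  // Set to true only after receiving the trigger from the 1st reader.
  private boolean upstreamDoneReading = false;
  // Control port: the previous reader sends a tuple here when it has finished.
  public final transient DefaultInputPort<Boolean> trigger = new DefaultInputPort<Boolean>() {
    @Override
    public void process(Boolean done) { upstreamDoneReading = done; }
  };

  @Override
  public void handleIdleTime() {
    if (upstreamDoneReading) {
      emitTuples(); // read and emit from this operator's directory
    }
  }
}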
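The control flow of this chained-trigger pattern can be simulated without any Apex dependency. In the plain-Java sketch below, SequentialReader is a hypothetical stand-in for the operator above: its handleIdleTime() does nothing until upstreamDoneReading is set, then reads one file per call, and once its directory is exhausted it sets the flag on the next reader, which plays the role of the control tuple. The loop in runChain() only imitates the engine calling idle handlers; it is not how Apex actually schedules operators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TriggerChain {

  static class SequentialReader {
    final String directory;
    final List<String> files;     // files still to read in this directory
    SequentialReader next;        // reader to trigger once this one is done
    boolean upstreamDoneReading;  // set by the previous reader's "control tuple"
    boolean finished;

    SequentialReader(String directory, List<String> files) {
      this.directory = directory;
      this.files = new ArrayList<>(files);
    }

    // Stand-in for Operator.IdleTimeHandler.handleIdleTime().
    void handleIdleTime(List<String> log) {
      if (!upstreamDoneReading || finished) {
        return;                                      // waiting for trigger, or done
      }
      if (!files.isEmpty()) {
        log.add(directory + "/" + files.remove(0));  // "emit" one file per call
        return;
      }
      finished = true;
      if (next != null) {
        next.upstreamDoneReading = true;             // the control tuple
      }
    }
  }

  // Links the readers in the given order, triggers the first one, and calls
  // every reader's idle handler for a fixed number of rounds.
  static List<String> runChain(List<SequentialReader> order, int rounds) {
    List<String> log = new ArrayList<>();
    if (order.isEmpty()) {
      return log;
    }
    for (int i = 0; i + 1 < order.size(); i++) {
      order.get(i).next = order.get(i + 1);
    }
    order.get(0).upstreamDoneReading = true;         // first reader needs no trigger
    List<SequentialReader> engineOrder = new ArrayList<>(order);
    Collections.reverse(engineOrder);                // call order should not matter
    for (int r = 0; r < rounds; r++) {
      for (SequentialReader reader : engineOrder) {
        reader.handleIdleTime(log);
      }
    }
    return log;
  }

  public static void main(String[] args) {
    List<String> log = runChain(Arrays.asList(
        new SequentialReader("dirA", Arrays.asList("a1", "a2")),
        new SequentialReader("dirB", Arrays.asList("b1"))), 10);
    System.out.println(log); // [dirA/a1, dirA/a2, dirB/b1]
  }
}
```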

If the number is large, you can explore the earlier partitioned approach, but 
have each partition wait for a trigger from an external source, such as a 
Kafka queue or an entry in a DB, before it starts processing.
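A minimal sketch of the external-trigger variant, assuming a hypothetical TriggerSource interface that, in a real application, would wrap the Kafka consumer or DB query (here it is just an in-memory set). Each partition polls it while idle and starts its directory only once that directory is marked ready. In this sketch the waiting happens in each partition at run time rather than in definePartitions(), whose job is to create the partitions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExternalTrigger {

  // Hypothetical abstraction over the external signal (Kafka, DB, ...).
  interface TriggerSource {
    boolean isReady(String directory);
  }

  // In-memory stand-in used for the sketch.
  static class InMemoryTriggerSource implements TriggerSource {
    final Set<String> ready = new HashSet<>();

    public boolean isReady(String directory) {
      return ready.contains(directory);
    }
  }

  // One partition per directory; it starts only after its trigger appears.
  static class Partition {
    final String directory;
    final TriggerSource triggers;
    boolean started;

    Partition(String directory, TriggerSource triggers) {
      this.directory = directory;
      this.triggers = triggers;
    }

    // Stand-in for the idle time handler of each partition.
    void handleIdleTime(List<String> log) {
      if (!started && triggers.isReady(directory)) {
        started = true;
        log.add("start " + directory);  // real code would begin reading here
      }
    }
  }

  static void pollAll(List<Partition> partitions, List<String> log) {
    for (Partition p : partitions) {
      p.handleIdleTime(log);
    }
  }

  public static void main(String[] args) {
    InMemoryTriggerSource source = new InMemoryTriggerSource();
    List<Partition> partitions = new ArrayList<>();
    for (String dir : new String[] {"dir1", "dir2", "dir3"}) {
      partitions.add(new Partition(dir, source));
    }
    List<String> log = new ArrayList<>();
    source.ready.add("dir2");           // e.g. the signal for dir2 arrives first
    pollAll(partitions, log);
    source.ready.add("dir1");
    pollAll(partitions, log);
    System.out.println(log); // [start dir2, start dir1]
  }
}
```

Since an idle handler can be called frequently, a real partition would probably rate-limit its checks against Kafka or the database.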

Ram

On Thu, Jun 23, 2016 at 6:11 AM, Mukkamula, Suryavamshivardhan (CWM-NR) 
<[email protected]> wrote:
Hi Ram,

Do you have sample DT application code for reading multiple directories in 
sequence?

Or could you throw some light on how I would achieve that with 
AbstractFileInputOperator?

Regards,
Surya Vamshi


_______________________________________________________________________

If you received this email in error, please advise the sender (by return email 
or otherwise) immediately. You have consented to receive the attached 
electronically at the above-noted email address; please retain a copy of this 
confirmation for future reference.


