Hi Ram,

Please let me know if you need any more information about our use case.
Thanks & Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:[email protected]]
Sent: 2016, June, 23 10:36 AM
To: [email protected]
Subject: RE: Reading Multiple Directories in sequence

Hi Ram,

In my case, I have 120 directories that I have to read in one batch job per day. With your guidance I have successfully implemented the parallel partition approach with a single logical operator, and it is working as expected. I create the partitions based on the number of sources in the properties file. If I give 250MB to each operator, I need around 12 containers of 4GB RAM each (each container can handle 10 parallel operators), which comes to 48GB, i.e. around 50GB of RAM to process the batch.

My concerns are below; please provide your suggestions:

- Is this memory utilization on the cluster OK?
- If not, I can run two applications sequentially (60 directories per application), but I would have to schedule the batches at different times, maybe by using Oozie or Spring Batch. What do you suggest?
- As per your comments below, how do I make my partitions wait for a trigger from Kafka or an entry in a database? Does this go inside definePartitions()? Do you have any sample code for this?

Currently, I generate the partitions from a source property defined in the properties file for each directory. I process each file differently and write the output files to different directories.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:[email protected]]
Sent: 2016, June, 23 9:54 AM
To: [email protected]
Subject: Re: Reading Multiple Directories in sequence

No, I don't have an example, but several approaches are possible depending on the exact requirements, e.g.:

1. How large is the number of directories?
2. Is the desired sequence a total order or a partial order (i.e. a DAG, https://en.wikipedia.org/wiki/Partially_ordered_set)?
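To illustrate the distinction in question 2: a total order is a single chain of directories, while a partial order only states which directories must finish before which others. The following is a minimal sketch, not taken from the thread, that derives one valid reading order from such a dependency map using Kahn's topological sort. The class `DirectoryOrder` and its map-based input format are hypothetical illustrations and are not part of Apex or Malhar.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Apex/Malhar): derives one valid reading order from a
// partial order of directories using Kahn's topological sort.
final class DirectoryOrder {

  // deps maps every directory to the directories that must be fully read before it.
  // Every directory, including those with no prerequisites, must appear as a key.
  static List<String> readingOrder(Map<String, List<String>> deps) {
    Map<String, Integer> unmet = new HashMap<>();         // prerequisites not yet read
    Map<String, List<String>> waiting = new HashMap<>();  // prerequisite -> directories waiting on it
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      unmet.put(e.getKey(), e.getValue().size());
      for (String before : e.getValue()) {
        waiting.computeIfAbsent(before, k -> new ArrayList<>()).add(e.getKey());
      }
    }
    Deque<String> ready = new ArrayDeque<>();             // directories whose prerequisites are all read
    for (String dir : deps.keySet()) {
      if (unmet.get(dir) == 0) {
        ready.add(dir);
      }
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String dir = ready.poll();
      order.add(dir);
      for (String next : waiting.getOrDefault(dir, Collections.emptyList())) {
        if (unmet.merge(next, -1, Integer::sum) == 0) {
          ready.add(next);
        }
      }
    }
    if (order.size() != deps.size()) {
      // Some directories never became ready: the dependencies contain a cycle
      // (or reference a directory that is not a key).
      throw new IllegalArgumentException("dependencies are cyclic or incomplete");
    }
    return order;
  }
}
```

With a total order the map is just a chain (each directory depends on the previous one) and the result is that chain; with a partial order, directories that share no dependency could in principle be read in parallel.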
If the number of directories is small, you can use one operator per directory and link them with ports in the desired sequence. Each operator sends a control tuple to the next one when it wants the next one to start. Each operator waits for this trigger and emits tuples in the idle time handler, for example:

    public class DownStreamReceiver extends AbstractFileInputOperator<String> implements Operator.IdleTimeHandler {
      private boolean upstreamDoneReading = false; // set to true only after the trigger from the 1st reader arrives
      // Control port on which the upstream reader signals that it has finished reading
      public final transient DefaultInputPort<Boolean> trigger = new DefaultInputPort<Boolean>() {
        @Override
        public void process(Boolean done) {
          upstreamDoneReading = done;
        }
      };
      @Override
      public void handleIdleTime() {
        if (upstreamDoneReading) {
          emitTuples(); // start reading this operator's own directory
        }
      }
      // readEntity() and emit(String), required by AbstractFileInputOperator, omitted for brevity
    }

If the number is large, you can explore the earlier partitioned approach, but have each partition look for a trigger from an external source, such as a Kafka queue or an entry in a DB, before it starts processing.

Ram

On Thu, Jun 23, 2016 at 6:11 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <[email protected]> wrote:

Hi Ram,

Do you have sample DT application code for reading multiple directories in sequence? Or throw some light on how I would achieve that with AbstractFileInputOperator.

Regards,
Surya Vamshi

_______________________________________________________________________
If you received this email in error, please advise the sender (by return email or otherwise) immediately. You have consented to receive the attached electronically at the above-noted email address; please retain a copy of this confirmation for future reference.

Si vous recevez ce courriel par erreur, veuillez en aviser l'expéditeur immédiatement, par retour de courriel ou par un autre moyen. Vous avez accepté de recevoir le(s) document(s) ci-joint(s) par voie électronique à l'adresse courriel indiquée ci-dessus; veuillez conserver une copie de cette confirmation pour les fins de reference future.
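Regarding the partitioned approach with an external trigger that Ram suggests for a large number of directories: below is a minimal sketch of the gating logic only, under the assumption that each partition owns one directory and consults an external source before reading it. `TriggerSource`, `InMemoryTriggers` and `PartitionGate` are hypothetical names; a real `TriggerSource` would be backed by, for example, a Kafka consumer or a database query, neither of which is shown here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the external trigger source (e.g. a Kafka topic or a DB table).
interface TriggerSource {
  boolean isReleased(String directory);
}

// Hypothetical in-memory TriggerSource, standing in for Kafka/DB in this sketch.
final class InMemoryTriggers implements TriggerSource {
  private final Set<String> released = ConcurrentHashMap.newKeySet();

  void release(String directory) {
    released.add(directory);
  }

  @Override
  public boolean isReleased(String directory) {
    return released.contains(directory);
  }
}

// Hypothetical per-partition gate: reading is allowed only once this partition's directory is released.
final class PartitionGate {
  private final String directory;
  private final TriggerSource triggers;
  private boolean open = false; // latches to true once the trigger has been seen

  PartitionGate(String directory, TriggerSource triggers) {
    this.directory = directory;
    this.triggers = triggers;
  }

  boolean mayRead() {
    if (!open) {
      open = triggers.isReleased(directory);
    }
    return open;
  }
}
```

Inside a partition, such a check would presumably sit at the start of emitTuples() (returning early while mayRead() is false), so the waiting happens per partition at runtime rather than in definePartitions(), which only decides how many partitions exist and which directory each one gets. A real implementation would also throttle how often it queries Kafka or the database instead of checking on every call.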
