I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <[email protected]> wrote:

> Thank you so much, Ram, for your advice. Option (a) would be ideal for my
> requirement.
>
> Do you have a sample showing partitioning with a different configuration
> set up for each partition?
>
> Regards,
> Surya Vamshi
>
> *From:* Munagala Ramanath [mailto:[email protected]]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* [email protected]
> *Subject:* Re: Multiple directories
>
> You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader
>
> For (a), each partition (i.e. replica of the operator) can scan only a
> single directory, so if you have 100 directories, you can simply start
> with 100 partitions; since each partition is scanning its own directory,
> you don't need to worry about which files the lines came from. This
> approach, however, needs a custom definePartitions() implementation in
> your subclass to assign the appropriate directory and XML parsing config
> file to each partition; it also needs adequate cluster resources to be
> able to spin up the required number of partitions.
>
> For (b), there is some documentation in the Operators section at
> http://docs.datatorrent.com/ including sample code. These operators
> support scanning multiple directories out of the box but have more
> elaborate configuration options. Check this out and see if it works in
> your use case.
>
> Ram
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <[email protected]> wrote:
>
> Hello Ram/Team,
>
> My requirement is to read input feeds from different locations on HDFS
> and parse those files by reading XML configuration files (each input feed
> has a configuration file which defines the fields inside the input feed).
>
> My approach: I would like to define a mapping file which contains the
> individual feed identifier, feed location, and configuration file
> location. I would like to read this mapping file at initial load within
> the setup() method and define my DirectoryScan.acceptFiles. My challenge
> is that when I read the files, I should parse the lines by reading the
> individual configuration files. How do I know which file a line came
> from? If I know this, I can read the corresponding configuration file
> before parsing the line.
>
> Please let me know how to handle this.
>
> Regards,
> Surya Vamshi
>
> *From:* Munagala Ramanath [mailto:[email protected]]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
> One way of addressing the issue is to use some sort of external tool
> (like a script) to copy all the input files to a common directory (making
> sure that the file names are unique to prevent one file from overwriting
> another) before the Apex application starts.
>
> The Apex application then starts and processes files from this directory.
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and the files will be automatically distributed among
> the partitions. The partitions will work in parallel.
>
> Ram
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.
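
For readers following option (a) in the thread, here is a minimal sketch of the bookkeeping it implies: read the mapping file once, then give each partition exactly one feed (one directory plus its XML config path). This is plain Java and does not call any Apex API; in a real application the assignment would presumably live inside the custom definePartitions() of an AbstractFileInputOperator subclass. The class names FeedMapping and FeedPartitionPlanner, the comma-separated "feedId,directory,configPath" line format, and the sample paths are all assumptions made for illustration, not something specified in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** One row of the (hypothetical) mapping file: feed id, input directory, XML config path. */
final class FeedMapping {
    final String feedId;
    final String directory;
    final String configPath;

    FeedMapping(String feedId, String directory, String configPath) {
        this.feedId = feedId;
        this.directory = directory;
        this.configPath = configPath;
    }
}

/** Plans a one-feed-per-partition layout; illustrative only, no Apex classes involved. */
final class FeedPartitionPlanner {

    /** Parses lines of the assumed form "feedId,directory,configPath"; skips blanks and '#' comments. */
    static List<FeedMapping> parse(List<String> lines) {
        List<FeedMapping> result = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad mapping line: " + raw);
            }
            result.add(new FeedMapping(parts[0].trim(), parts[1].trim(), parts[2].trim()));
        }
        return result;
    }

    /** Number of partitions needed when each partition scans exactly one directory. */
    static int partitionCount(List<FeedMapping> mappings) {
        return mappings.size();
    }

    /** Returns the feed that the partition with the given zero-based index should own. */
    static FeedMapping forPartition(List<FeedMapping> mappings, int partitionIndex) {
        if (partitionIndex < 0 || partitionIndex >= mappings.size()) {
            throw new IndexOutOfBoundsException("No feed for partition " + partitionIndex);
        }
        return mappings.get(partitionIndex);
    }

    public static void main(String[] args) {
        // Sample mapping content; paths are made up for the example.
        List<String> lines = Arrays.asList(
            "# feedId,directory,configPath",
            "feedA,/data/feedA,/conf/feedA.xml",
            "feedB,/data/feedB,/conf/feedB.xml");
        List<FeedMapping> mappings = parse(lines);
        for (int i = 0; i < partitionCount(mappings); i++) {
            FeedMapping m = forPartition(mappings, i);
            System.out.println("partition " + i + " scans " + m.directory + " using " + m.configPath);
        }
    }
}
```

Because each partition owns a single directory and a single config file, every line it reads can be parsed with that one config, which is why the question of which file a line came from does not arise under this layout.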
