I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
[email protected]> wrote:

> Thank you so much, Ram, for your advice. Option (a) would be ideal for my
> requirement.
>
>
>
> Do you have a sample of partitioning with an individual configuration
> setup for each partition?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:[email protected]]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* [email protected]
> *Subject:* Re: Multiple directories
>
>
>
> You have 2 options: (a) AbstractFileInputOperator (b)
> FileSplitter/BlockReader
>
>
>
> For (a), each partition (i.e. replica of the operator) can scan only a
> single directory, so if you have 100 directories, you can simply start with
> 100 partitions. Since each partition is scanning its own directory, you
> don't need to worry about which files the lines came from. This approach,
> however, needs a custom definePartitions() implementation in your subclass
> to assign the appropriate directory and XML parsing config file to each
> partition. It also needs adequate cluster resources to be able to spin up
> the required number of partitions.
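A minimal sketch of the bookkeeping behind option (a), written in plain Java so it runs without Apex on the classpath. `FeedPartitionPlanner`, `FeedSpec`, `PartitionConfig` and `plan()` are hypothetical names. In a real operator, the resulting settings would be copied onto each new operator instance created in the `definePartitions()` override, and that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the assignment step only. In Apex this
// logic would sit inside the operator subclass's definePartitions() override;
// no Apex classes are used or imitated here.
class FeedPartitionPlanner {

    /** One input feed: the directory to scan and the XML config used to parse it. */
    static final class FeedSpec {
        final String feedId;
        final String directory;
        final String configFile;

        FeedSpec(String feedId, String directory, String configFile) {
            this.feedId = feedId;
            this.directory = directory;
            this.configFile = configFile;
        }
    }

    /** The settings one partition (operator replica) would receive. */
    static final class PartitionConfig {
        final int index;
        final String directory;
        final String configFile;

        PartitionConfig(int index, String directory, String configFile) {
            this.index = index;
            this.directory = directory;
            this.configFile = configFile;
        }
    }

    /**
     * Produces exactly one partition per feed, so every partition scans a
     * single directory and always parses with that directory's config file.
     */
    static List<PartitionConfig> plan(List<FeedSpec> feeds) {
        List<PartitionConfig> partitions = new ArrayList<>(feeds.size());
        for (int i = 0; i < feeds.size(); i++) {
            FeedSpec feed = feeds.get(i);
            partitions.add(new PartitionConfig(i, feed.directory, feed.configFile));
        }
        return partitions;
    }
}
```

Because partitions map one-to-one to feeds, 100 feeds means 100 operator instances. That is where the cluster-resource requirement comes from.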
>
>
>
> For (b), the Operators section at http://docs.datatorrent.com/ has some
> documentation including sample code. These operators support scanning
> multiple directories out of the box but have more elaborate configuration
> options. Check this out and see if it works for your use case.
>
>
>
> Ram
>
>
>
> On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hello Ram/Team,
>
>
>
> My requirement is to read input feeds from different locations on HDFS and
> parse those files using XML configuration files (each input feed has a
> configuration file that defines the fields inside the feed).
>
>
>
> My approach: I would like to define a mapping file that contains each
> feed's identifier, feed location, and configuration file location. I would
> like to read this mapping file at initial load within the setup() method
> and define my DirectoryScan.acceptFiles accordingly. My challenge is that
> when I read the files, I need to parse each line using its feed's
> configuration file. How do I know which file a line came from? If I know
> this, I can read the corresponding configuration file before parsing the
> line.
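The approach above can be sketched as follows. This assumes a hypothetical comma-separated mapping file with one `feedId,feedDirectory,configFile` entry per line, and it assumes the operator can obtain the path of the file it is currently reading. `FeedMapping`, `parse()` and `feedFor()` are illustrative names, not Malhar APIs. A line's feed (and therefore its config file) is found by matching the file's path against the feed directories, preferring the most specific match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a file path back to the feed (and XML config)
// it belongs to, using a mapping file read once, e.g. in setup().
class FeedMapping {

    static final class Feed {
        final String id;
        final String directory;
        final String configFile;

        Feed(String id, String directory, String configFile) {
            this.id = id;
            this.directory = directory;
            this.configFile = configFile;
        }
    }

    private final List<Feed> feeds = new ArrayList<>();

    /** Parses "feedId,feedDirectory,configFile" lines; blank and '#' lines are skipped. */
    static FeedMapping parse(List<String> lines) {
        FeedMapping mapping = new FeedMapping();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad mapping line: " + raw);
            }
            String dir = parts[1].trim();
            // Normalize "/a/b/" to "/a/b" so prefix matching is consistent.
            while (dir.length() > 1 && dir.endsWith("/")) {
                dir = dir.substring(0, dir.length() - 1);
            }
            mapping.feeds.add(new Feed(parts[0].trim(), dir, parts[2].trim()));
        }
        return mapping;
    }

    /** Returns the feed whose directory most specifically contains filePath, or null. */
    Feed feedFor(String filePath) {
        Feed best = null;
        for (Feed feed : feeds) {
            String prefix = feed.directory.endsWith("/") ? feed.directory : feed.directory + "/";
            boolean inside = filePath.startsWith(prefix);
            if (inside && (best == null || feed.directory.length() > best.directory.length())) {
                best = feed;
            }
        }
        return best;
    }
}
```

In the operator, you would call `feedFor()` once when a new file is opened, load (and cache) that feed's XML config, and use it for every line read from that file.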
>
>
>
> Please let me know how I should handle this.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:[email protected]]
> *Sent:* 2016, May, 24 5:49 PM
> *To:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Subject:* Multiple directories
>
>
>
> One way of addressing the issue is to use some sort of external tool (like
> a script) to copy all the input files to a common directory (making sure
> that the file names are unique to prevent one file from overwriting
> another) before the Apex application starts.
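If you go the common-directory route, the copy step could look roughly like this. It is a sketch that assumes a local filesystem and uses only `java.nio.file`; for HDFS the same idea would need the Hadoop `FileSystem` API or an `hdfs dfs -cp` script instead. `StageInputs` and `stage()` are hypothetical names. Uniqueness comes from prefixing each file name with its source directory's name, which also preserves which feed a file came from.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical pre-processing step run before the Apex application starts.
class StageInputs {

    /**
     * Copies every regular file from each source directory into target,
     * renaming it "<sourceDirName>__<fileName>" so that equally named files
     * from different feeds cannot overwrite each other. Returns the number of
     * files copied.
     */
    static int stage(List<Path> sourceDirs, Path target) throws IOException {
        Files.createDirectories(target);
        int copied = 0;
        for (Path dir : sourceDirs) {
            String prefix = dir.getFileName().toString();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path file : files) {
                    if (!Files.isRegularFile(file)) {
                        continue; // skip subdirectories and special files
                    }
                    Path dest = target.resolve(prefix + "__" + file.getFileName());
                    // No REPLACE_EXISTING: a name clash throws instead of overwriting.
                    Files.copy(file, dest);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```

Two source directories with the same last path component (for example `/a/feed` and `/b/feed`) would still collide. Because `Files.copy` is called without `REPLACE_EXISTING`, it then throws `FileAlreadyExistsException` rather than silently overwriting, so such a clash surfaces immediately.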
>
>
>
> The Apex application then starts and processes files from this directory.
>
>
>
> If you set the partition count of the file input operator to N, it will
> create N partitions and the files will be automatically distributed among
> the partitions. The partitions will work in parallel.
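One way N partitions can split a shared directory without coordinating is for each partition to accept only the files whose path hash, modulo N, equals its own index. The default directory scanner in Malhar's AbstractFileInputOperator is understood to work along these lines. Treat the following as an illustration of the idea rather than that operator's actual code; `FileAssignment`, `partitionFor()` and `assign()` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of hash-based file distribution across N partitions.
class FileAssignment {

    /** Index (0..partitionCount-1) of the partition that should read filePath. */
    static int partitionFor(String filePath, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(filePath.hashCode(), partitionCount);
    }

    /** Groups files by partition; every file lands in exactly one group. */
    static List<List<String>> assign(List<String> files, int partitionCount) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (String file : files) {
            groups.get(partitionFor(file, partitionCount)).add(file);
        }
        return groups;
    }
}
```

Because the assignment depends only on the path and N, every partition reaches the same answer independently, so each file is read by exactly one partition.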
>
>
>
> Ram
>
> _______________________________________________________________________
>
> This [email] may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this [email] or the information it contains by other than an
> intended recipient is unauthorized. If you received this [email] in error,
> please advise the sender (by return [email] or otherwise) immediately. You
> have consented to receive the attached electronically at the above-noted
> address; please retain a copy of this confirmation for future reference.