Re: Multiple directories

2016-07-12 Thread ganesh borate
shivardhan (CWM-NR)" < suryavamshivardhan.mukkam...@rbc.com> wrote: > Hi, > > Can you please let me know, How would I add multiple directories to an > Operator which extends ‘AbstractFileInputOperator’? > > I would like to read from multiple directories by a single opera

Re: Multiple directories

2016-07-08 Thread Priyanka Gugale
, Priyanka Gugale wrote: > Hi, > > Take a look at TimeBasedDirectoryScanner in FileSplitterInput, this > scanner accepts list of files/directories to scan. Also it accepts regex to > filter on file names. I think you can pick up ideas on how to scan multiple > directories from there.

Re: Multiple directories

2016-07-07 Thread Priyanka Gugale
Hi, Take a look at TimeBasedDirectoryScanner in FileSplitterInput; this scanner accepts a list of files/directories to scan. It also accepts a regex to filter on file names. I think you can pick up ideas on how to scan multiple directories from there. -Priyanka On Thu, Jul 7, 2016 at 6:59 PM, Mukkamula
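
The idea in the message above (one scanner that takes a list of directories plus a file-name regex) can be sketched in plain Java. This is only an illustration under stated assumptions: it uses java.nio on a local filesystem rather than HDFS, it is not the code of TimeBasedDirectoryScanner, and the class and method names (MultiDirScanner, scan) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Apex or Malhar. */
final class MultiDirScanner {
    private MultiDirScanner() {}

    /**
     * Lists the regular files directly inside each given directory whose
     * file name matches the regex, and returns them in sorted order.
     */
    static List<Path> scan(List<Path> directories, String fileNameRegex) {
        Pattern pattern = Pattern.compile(fileNameRegex);
        List<Path> matches = new ArrayList<>();
        for (Path dir : directories) {
            // try-with-resources closes the directory handle after listing
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    String name = entry.getFileName().toString();
                    if (Files.isRegularFile(entry) && pattern.matcher(name).matches()) {
                        matches.add(entry);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot list " + dir, e);
            }
        }
        Collections.sort(matches);
        return matches;
    }
}
```

A call such as MultiDirScanner.scan(dirs, ".*\\.csv") would return the CSV files found in all listed directories; subdirectories are not descended into in this sketch.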

RE: Multiple directories

2016-07-07 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Hi Yunhan, I am already using this example to read data from multiple directories in parallel. Here each directory is given to an operator in parallel. My requirement is that I would like to add multiple directories to a single operator. Regards, Surya Vamshi From: Yunhan Wang [mailto:yun

Re: Multiple directories

2016-07-06 Thread Yunhan Wang
> Hi, > > Can you please let me know, How would I add multiple directories to an > Operator which extends ‘AbstractFileInputOperator’? > > I would like to read from multiple directories by a single operator by > selecting multiple files using ‘filePatternRegExp’.

Multiple directories

2016-07-06 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Hi, Can you please let me know how I would add multiple directories to an operator which extends 'AbstractFileInputOperator'? I would like to read from multiple directories with a single operator by selecting multiple files using 'filePatternRegExp'. R

RE: Reading Multiple directories in parallel

2016-06-28 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Hi, Thank you so much Ram, it worked !! Regards, Surya Vamshi From: Munagala Ramanath [mailto:r...@datatorrent.com] Sent: 2016, June, 28 4:12 PM To: users@apex.apache.org Subject: Re: Reading Multiple directories in parallel The return collection should match the function return type: List

Re: Reading Multiple directories in parallel

2016-06-28 Thread Munagala Ramanath
(SlicedDirectoryScanner) scanners.get(i); > >scn.setStartIndex(first); > > scn.setEndIndex(last); > >scn.setDirectory(dir); > > > >op

RE: Reading Multiple directories in parallel

2016-06-28 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
rtitions.size()); return newPartitions; } Regards, Surya Vamshi From: Munagala Ramanath [mailto:r...@datatorrent.com] Sent: 2016, June, 28 2:35 PM To: users@apex.apache.org Subject: Re: Reading Multiple directories in parallel You can add those properties in y

Re: Reading Multiple directories in parallel

2016-06-28 Thread Munagala Ramanath
potentStorageManager()); > > } > > } > > > > > > Regards, > > Surya Vamshi > > > > *From:* Munagala Ramanath [mailto:r...@datatorrent.com] > *Sent:* 2016, June, 28 2:03 PM > *To:* users@apex.apache.org > *Subject:* Re: Reading Mult

RE: Reading Multiple directories in parallel

2016-06-28 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
ds, Surya Vamshi From: Munagala Ramanath [mailto:r...@datatorrent.com] Sent: 2016, June, 28 2:03 PM To: users@apex.apache.org Subject: Re: Reading Multiple directories in parallel Not sure I fully understand the question but you can add whatever fields you need to your class that extends Abstract

Re: Reading Multiple directories in parallel

2016-06-28 Thread Munagala Ramanath
Not sure I fully understand the question but you can add whatever fields you need to your class that extends *AbstractFileInputOperator*. For example, https://github.com/DataTorrent/examples/blob/master/tutorials/fileIO-multiDir/src/main/java/com/example/fileIO/FileReaderMultiDir.java defines fiel

Reading Multiple directories in parallel

2016-06-28 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Hi Ram, Can you please suggest how I would add another variable (like 'directory') while creating multiple partitions of AbstractFileInputOperator in the definePartitions method? I have currently added variables in the AbstractFileInputOperator itself, which I guess is not the best way. These variab

Re: Multiple directories

2016-06-16 Thread Munagala Ramanath
f36cb193c1c by jenkins source checksum > 48db4b572827c2e9c2da66982d14 > > 7626", > > "resourceManagerVersion": "2.7.1.2.3.2.0-2950", > >"resourceManagerVersionBuiltOn": "2015-09-30T18:20Z", > > "rmStateStoreNam

RE: Multiple directories

2016-06-16 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
rver.resourcemanager.recovery.ZKRMStateStore", "startedOn": 1465495186350, "state": "STARTED" } } Regards, Surya Vamshi From: Munagala Ramanath [mailto:r...@datatorrent.com] Sent: 2016, June, 16 2:57 PM To: users@apex.apache.org Subjec

Re: Multiple directories

2016-06-16 Thread Munagala Ramanath
t/hadoop-yarn-client/lib/* > > HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn > > YARN_NODEMANAGER_HEAPSIZE=1024 > > QTINC=/usr/lib64/qt-3.3/include > > USER=mukkamula > > HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m > -XX:MaxPermSize=512m >

RE: Multiple directories

2016-06-16 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
/lesspipe.sh %s LANG=en_US.UTF-8 YARN_NICENESS=0 YARN_IDENT_STRING=yarn HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce Regards, Surya Vamshi From: Mukkamula, Suryavamshivardhan (CWM-NR) Sent: 2016, June, 16 8:58 AM To: users@apex.apache.org Subject: RE: Multiple directories Thank y

RE: Multiple directories

2016-06-16 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Thank you for the inputs. Regards, Surya Vamshi From: Thomas Weise [mailto:thomas.we...@gmail.com] Sent: 2016, June, 15 5:08 PM To: users@apex.apache.org Subject: Re: Multiple directories On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) mailto:suryavamshivardhan.mukkam

Re: Multiple directories

2016-06-15 Thread Thomas Weise
On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) < suryavamshivardhan.mukkam...@rbc.com> wrote: > Hi Ram/Team, > > > > I could create an operator which reads multiple directories and parses > each file with respect to an individual configur

RE: Multiple directories

2016-06-15 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Hi Ram/Team, I could create an operator which reads multiple directories, parses each file with respect to an individual configuration file, and writes the output files to different directories. However, I have some questions regarding the design. -> We have 120 directories to scan on HDFS

RE: Multiple directories

2016-06-09 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Hi Ram, Assuming that properties are set as key/value pairs, I have used the properties as below and I can read the multiple directories in parallel. Thank you. dt.application.FileIO.operator.read.prop.inputDirectory(source_123) tmp/fileIO/source_123
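
To illustrate the key/value reading of the property name above, here is a minimal Java sketch that collects entries of the form name(key) into a map. It assumes, as the message suggests, that the parenthesised part (source_123) is a map key and the property value is the directory. The class KeyedProperties and the second key source_456 in the usage note are hypothetical, and this is not how Apex itself resolves properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the Apex property loader. */
final class KeyedProperties {
    // Matches "<optional.dotted.prefix.>name(key)" and captures name and key.
    private static final Pattern KEYED =
        Pattern.compile("^(.*\\.)?([A-Za-z_][A-Za-z0-9_]*)\\(([^()]+)\\)$");

    private KeyedProperties() {}

    /** Returns key -> value for every property named propertyName(key). */
    static Map<String, String> collect(Map<String, String> properties, String propertyName) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            Matcher m = KEYED.matcher(e.getKey());
            if (m.matches() && m.group(2).equals(propertyName)) {
                result.put(m.group(3), e.getValue());
            }
        }
        return result;
    }
}
```

Given the property from the message plus a second, made-up entry for source_456, collect(props, "inputDirectory") would yield a map from source name to directory, which an operator could then iterate over.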

RE: Multiple directories

2016-06-08 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
, 05 10:24 PM To: users@apex.apache.org Subject: Re: Multiple directories Some sample code to monitor multiple directories is now available at: https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir It shows how to use a custom implementation of definePartitions() to create

Re: Multiple directories

2016-06-05 Thread Munagala Ramanath
Some sample code to monitor multiple directories is now available at: https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir It shows how to use a custom implementation of definePartitions() to create multiple partitions of the file input operator and group them into
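
The partitioning approach described above can be outlined without any Apex classes. The sketch below only plans the partitions: each directory gets a configurable number of partitions, and every partition records its directory plus the first and last partition index of that directory's group (a scanner fragment quoted elsewhere in this listing sets a start index, an end index, and a directory in the same spirit). The names PartitionPlanner, Slice, and owns, and the hash-based file assignment, are assumptions for illustration, not the linked example's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planning helper; names are not taken from the linked example. */
final class PartitionPlanner {
    /** One planned partition and the index range of its directory's group. */
    static final class Slice {
        final String directory;
        final int index;      // this partition's global index
        final int startIndex; // first index of the directory's group
        final int endIndex;   // last index of the directory's group (inclusive)

        Slice(String directory, int index, int startIndex, int endIndex) {
            this.directory = directory;
            this.index = index;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
        }

        /** Assumed rule: a file's name hash picks exactly one partition in the group. */
        boolean owns(String fileName) {
            int groupSize = endIndex - startIndex + 1;
            return startIndex + Math.floorMod(fileName.hashCode(), groupSize) == index;
        }
    }

    private PartitionPlanner() {}

    /** Creates partitionCounts[d] consecutive slices for directories[d]. */
    static List<Slice> plan(String[] directories, int[] partitionCounts) {
        if (directories.length != partitionCounts.length) {
            throw new IllegalArgumentException("need one partition count per directory");
        }
        List<Slice> slices = new ArrayList<>();
        int first = 0;
        for (int d = 0; d < directories.length; d++) {
            if (partitionCounts[d] < 1) {
                throw new IllegalArgumentException("partition count must be >= 1");
            }
            int last = first + partitionCounts[d] - 1;
            for (int i = first; i <= last; i++) {
                slices.add(new Slice(directories[d], i, first, last));
            }
            first = last + 1;
        }
        return slices;
    }
}
```

In a real operator, a definePartitions() implementation would create one operator instance per planned slice and copy these three values onto that instance's scanner.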

Re: Multiple directories

2016-05-25 Thread Munagala Ramanath
> Do you have sample usage for partitioning with individual configuration > set ups different partitions? > > > > Regards, > > Surya Vamshi > > > > *From:* Munagala Ramanath [mailto:r...@datatorrent.com] > *Sent:* 2016, May, 25 12:11 PM > *To:* users@apex

RE: Multiple directories

2016-05-25 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
To: users@apex.apache.org Subject: Re: Multiple directories You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader For (a), each partition (i.e. replica of the operator) can scan only a single directory, so if you have 100 directories, you can simply start with 100

Re: Multiple directories

2016-05-25 Thread Munagala Ramanath
p the required number of partitions. For (b), there is some documentation in the Operators section at http://docs.datatorrent.com/ including sample code. These operators support scanning multiple directories out of the box but have more elaborate configuration options. Check this out and see if it wor

RE: Multiple directories

2016-05-25 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
:49 PM To: Mukkamula, Suryavamshivardhan (CWM-NR) Subject: Multiple directories One way of addressing the issue is to use some sort of external tool (like a script) to copy all the input files to a common directory (making sure that the file names are unique to prevent one file from overwriting
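
The copy-to-a-common-directory idea above could also be done in Java rather than a script. The following is a minimal sketch that assumes a local filesystem (not HDFS) and that the source directory names are themselves unique; the class CommonDirCopier and the "dir__file" naming scheme are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; works on the local filesystem only. */
final class CommonDirCopier {
    private CommonDirCopier() {}

    /** Prefixes the file name with its source directory name to keep names unique. */
    static String uniqueName(String sourceDirName, String fileName) {
        return sourceDirName + "__" + fileName;
    }

    /**
     * Copies sourceFile into commonDir under a directory-prefixed name.
     * Files.copy without REPLACE_EXISTING throws FileAlreadyExistsException
     * if the target exists, so one file never silently overwrites another.
     */
    static Path copyInto(Path commonDir, Path sourceFile) throws IOException {
        Path parent = sourceFile.toAbsolutePath().getParent();
        String dirName = (parent == null || parent.getFileName() == null)
            ? "root"
            : parent.getFileName().toString();
        Path target = commonDir.resolve(
            uniqueName(dirName, sourceFile.getFileName().toString()));
        return Files.copy(sourceFile, target);
    }
}
```

With this naming, data.csv from source_1 and data.csv from source_2 land in the common directory as source_1__data.csv and source_2__data.csv.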