shivardhan (CWM-NR)" <
suryavamshivardhan.mukkam...@rbc.com> wrote:
> Hi,
>
> Can you please let me know how I would add multiple directories to an
> operator which extends 'AbstractFileInputOperator'?
>
> I would like to read from multiple directories by a single opera
, Priyanka Gugale
wrote:
> Hi,
>
> Take a look at TimeBasedDirectoryScanner in FileSplitterInput, this
> scanner accepts list of files/directories to scan. Also it accepts regex to
> filter on file names. I think you can pick up ideas on how to scan multiple
> directories from there.
Hi,
Take a look at TimeBasedDirectoryScanner in FileSplitterInput, this scanner
accepts list of files/directories to scan. Also it accepts regex to filter
on file names. I think you can pick up ideas on how to scan multiple
directories from there.
-Priyanka
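As a rough illustration of the idea above (one scanner, a list of directories, a regex on file names), here is a minimal sketch in plain Java. It is not the TimeBasedDirectoryScanner code and does not use the Apex API: it reads the local file system through java.nio rather than HDFS, and the class name MultiDirScan and the method scan are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apex or Malhar.
class MultiDirScan {
    // Returns the regular files found directly inside any of the given
    // directories whose file name fully matches fileNameRegex.
    static List<Path> scan(List<Path> dirs, String fileNameRegex) {
        Pattern pattern = Pattern.compile(fileNameRegex);
        List<Path> matches = new ArrayList<>();
        for (Path dir : dirs) {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path entry : stream) {
                    String name = entry.getFileName().toString();
                    if (Files.isRegularFile(entry) && pattern.matcher(name).matches()) {
                        matches.add(entry);
                    }
                }
            } catch (IOException e) {
                // Surface I/O problems without forcing callers to handle a checked exception.
                throw new UncheckedIOException(e);
            }
        }
        return matches;
    }
}
```

A call such as MultiDirScan.scan(dirs, ".*\\.csv") would return the .csv files from every directory in dirs. Subdirectories are not descended into in this sketch.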
On Thu, Jul 7, 2016 at 6:59 PM, Mukkamula
Hi Yunhan,
I am already using this example to read data from multiple directories in
parallel. Here, each directory is given to its own operator in parallel.
My requirement is that I would like to add multiple directories to a single operator.
Regards,
Surya Vamshi
From: Yunhan Wang [mailto:yun
> Hi,
>
> Can you please let me know how I would add multiple directories to an
> operator which extends 'AbstractFileInputOperator'?
>
> I would like to read from multiple directories by a single operator by
> selecting multiple files using ‘filePatternRegExp’.
Hi,
Can you please let me know how I would add multiple directories to an operator
which extends 'AbstractFileInputOperator'?
I would like to read from multiple directories by a single operator by
selecting multiple files using 'filePatternRegExp'.
R
Hi,
Thank you so much Ram, it worked!
Regards,
Surya Vamshi
From: Munagala Ramanath [mailto:r...@datatorrent.com]
Sent: 2016, June, 28 4:12 PM
To: users@apex.apache.org
Subject: Re: Reading Multiple directories in parallel
The return collection should match the function return type:
List
(SlicedDirectoryScanner) scanners.get(i);
> scn.setStartIndex(first);
> scn.setEndIndex(last);
> scn.setDirectory(dir);
>op
rtitions.size());
return newPartitions;
}
Regards,
Surya Vamshi
From: Munagala Ramanath [mailto:r...@datatorrent.com]
Sent: 2016, June, 28 2:35 PM
To: users@apex.apache.org
Subject: Re: Reading Multiple directories in parallel
You can add those properties in y
potentStorageManager());
> }
> }
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:r...@datatorrent.com]
> *Sent:* 2016, June, 28 2:03 PM
> *To:* users@apex.apache.org
> *Subject:* Re: Reading Mult
ds,
Surya Vamshi
From: Munagala Ramanath [mailto:r...@datatorrent.com]
Sent: 2016, June, 28 2:03 PM
To: users@apex.apache.org
Subject: Re: Reading Multiple directories in parallel
Not sure I fully understand the question but you can add whatever fields you
need to your class that extends *AbstractFileInputOperator*. For example,
https://github.com/DataTorrent/examples/blob/master/tutorials/fileIO-multiDir/src/main/java/com/example/fileIO/FileReaderMultiDir.java
defines fiel
Hi Ram,
Can you please suggest how I would add another variable (like 'directory')
while creating multiple partitions of AbstractFileInputOperator in the
definePartitions() method?
I have currently added the variables to AbstractFileInputOperator itself, which I
guess is not the best way.
These variab
f36cb193c1c by jenkins source checksum
> 48db4b572827c2e9c2da66982d147626",
> "resourceManagerVersion": "2.7.1.2.3.2.0-2950",
> "resourceManagerVersionBuiltOn": "2015-09-30T18:20Z",
> "rmStateStoreNam
rver.resourcemanager.recovery.ZKRMStateStore",
"startedOn": 1465495186350,
"state": "STARTED"
}
}
Regards,
Surya Vamshi
From: Munagala Ramanath [mailto:r...@datatorrent.com]
Sent: 2016, June, 16 2:57 PM
To: users@apex.apache.org
Subjec
t/hadoop-yarn-client/lib/*
> HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce/yarn
> YARN_NODEMANAGER_HEAPSIZE=1024
> QTINC=/usr/lib64/qt-3.3/include
> USER=mukkamula
> HADOOP_CLIENT_OPTS=-Xmx2048m -XX:MaxPermSize=512m -Xmx2048m -XX:MaxPermSize=512m
/lesspipe.sh %s
LANG=en_US.UTF-8
YARN_NICENESS=0
YARN_IDENT_STRING=yarn
HADOOP_MAPRED_HOME=/usr/hdp/2.3.2.0-2950/hadoop-mapreduce
Regards,
Surya Vamshi
From: Mukkamula, Suryavamshivardhan (CWM-NR)
Sent: 2016, June, 16 8:58 AM
To: users@apex.apache.org
Subject: RE: Multiple directories
Thank y
Thank you for the inputs.
Regards,
Surya Vamshi
From: Thomas Weise [mailto:thomas.we...@gmail.com]
Sent: 2016, June, 15 5:08 PM
To: users@apex.apache.org
Subject: Re: Multiple directories
On Wed, Jun 15, 2016 at 1:55 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
suryavamshivardhan.mukkam...@rbc.com> wrote:
> Hi Ram/Team,
>
>
>
> I could create an operator which reads multiple directories and parses
> each file with respect to an individual configur
Hi Ram/Team,
I could create an operator which reads multiple directories, parses each
file according to an individual configuration file, and writes output files
to different directories.
However, I have some questions regarding the design.
- We have 120 directories to scan on HDFS
Hi Ram,
Assuming that properties are set as key/value pairs, I have used the properties
below and I can now read multiple directories in parallel. Thank you.
<property>
  <name>dt.application.FileIO.operator.read.prop.inputDirectory(source_123)</name>
  <value>tmp/fileIO/source_123</value>
</property>
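To make the key/value shape of that property concrete, here is a small sketch in plain Java that pulls the source name out of keys of the form "...prop.inputDirectory(source_123)" and maps it to the configured directory. It only illustrates the naming pattern shown above and is not how Apex itself resolves properties; the class name MappedPropertySketch and the method directoriesBySource are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apex or Malhar.
class MappedPropertySketch {
    // Matches keys ending in ".prop.inputDirectory(<source>)" and captures <source>.
    private static final Pattern MAPPED_KEY =
            Pattern.compile(".*\\.prop\\.inputDirectory\\(([^)]+)\\)");

    // Returns a map from source name to directory, in the order the keys were given.
    // Keys that do not follow the pattern are ignored.
    static Map<String, String> directoriesBySource(Map<String, String> props) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            Matcher m = MAPPED_KEY.matcher(entry.getKey());
            if (m.matches()) {
                result.put(m.group(1), entry.getValue());
            }
        }
        return result;
    }
}
```

With the property above as the only input, the result would contain one entry mapping source_123 to tmp/fileIO/source_123.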
, 05 10:24 PM
To: users@apex.apache.org
Subject: Re: Multiple directories
Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir
It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them into
> Do you have a sample of partitioning where each partition is set up with
> its own individual configuration?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:r...@datatorrent.com]
> *Sent:* 2016, May, 25 12:11 PM
> *To:* users@apex
To: users@apex.apache.org
Subject: Re: Multiple directories
You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader
For (a), each partition (i.e. replica of the operator) can scan only a single
directory, so if you have 100 directories, you can simply start with 100
p the required
number of partitions.
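For option (a), the bookkeeping boils down to deciding which directory each partition scans. The following is a minimal sketch of that assignment in plain Java, assuming you already know how many partitions you want per directory. It does not touch the Apex partitioning API; the class name DirectoryPartitionPlan and the method assign are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Apex or Malhar.
class DirectoryPartitionPlan {
    // Given the number of partitions wanted for each directory, returns a list
    // whose i-th element is the directory that partition i should scan.
    // Iteration order follows the map, so pass a LinkedHashMap for a stable plan.
    static List<String> assign(Map<String, Integer> partitionsPerDirectory) {
        List<String> assignment = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : partitionsPerDirectory.entrySet()) {
            int count = entry.getValue();
            if (count < 1) {
                throw new IllegalArgumentException(
                        "Each directory needs at least one partition: " + entry.getKey());
            }
            for (int i = 0; i < count; i++) {
                assignment.add(entry.getKey());
            }
        }
        return assignment;
    }
}
```

With one partition per directory, 100 directories yield a plan of 100 entries, one per operator replica.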
For (b), there is some documentation, including sample code, in the Operators
section at http://docs.datatorrent.com/. These operators support scanning
multiple directories out of the box but have more elaborate configuration
options. Check this out and see if it wor
:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories
One way of addressing the issue is to use some sort of external tool (like a
script) to copy all the input files to a common directory (making sure that the
file names are unique to prevent one file from overwriting