Parallel read will work only if the scheme is "s3a" and the Hadoop
version is 2.7+.
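The constraint above could be enforced up front with a small check. This is a minimal sketch, not code from the PR. The class `ParallelReadCheck` and its methods are hypothetical. The Hadoop version string is passed in so the example runs without Hadoop on the classpath; at runtime it could come from `org.apache.hadoop.util.VersionInfo.getVersion()`.

```java
/**
 * Hypothetical pre-flight check (not part of Malhar): parallel reads are
 * assumed to need the "s3a" scheme and Hadoop 2.7 or later.
 */
class ParallelReadCheck
{
  /** Returns true if the URI uses the s3a scheme and the Hadoop version is at least 2.7. */
  static boolean supportsParallelRead(String uri, String hadoopVersion)
  {
    // Take the scheme by hand: URIs with embedded secret keys may not parse cleanly with java.net.URI
    int idx = uri.indexOf("://");
    String scheme = idx > 0 ? uri.substring(0, idx) : "";
    return "s3a".equalsIgnoreCase(scheme) && isAtLeast(hadoopVersion, 2, 7);
  }

  /** Compares the leading major.minor numbers of a version such as "2.7.1" or "2.6.0-cdh5.4.0". */
  static boolean isAtLeast(String version, int major, int minor)
  {
    String[] parts = version.split("[.-]");
    int maj = Integer.parseInt(parts[0]);
    int min = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    return maj > major || (maj == major && min >= minor);
  }

  public static void main(String[] args)
  {
    System.out.println(supportsParallelRead("s3a://my-bucket/input", "2.7.1")); // true
    System.out.println(supportsParallelRead("s3n://my-bucket/input", "2.7.1")); // false: wrong scheme
    System.out.println(supportsParallelRead("s3a://my-bucket/input", "2.6.0")); // false: Hadoop too old
  }
}
```

Only major and minor are compared, so a version such as "2.10.0" correctly counts as newer than 2.7.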
--- End diff --
Add a link to the Amazon S3 wiki page that explains the details about the
Hadoop version and scheme.
> Creation of S3 Input Module
> ---
>
>
+public class S3FileSplitter extends FSFileSplitter
+{
+ public S3FileSplitter()
+ {
--- End diff --
The explicit no-argument constructor is not required.
IOException
--- End diff --
Mention the difference from the superclass method in the Javadoc.
bucketUri = fs.getScheme() + "://" + extractBucket(uri);
+ }
+
+ @VisibleForTesting
+ protected String extractBucket(String s3uri)
--- End diff --
Add Javadoc for this method.
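As an illustration of the kind of Javadoc requested for `extractBucket`, here is a hypothetical standalone version. It assumes URIs of the form `scheme://[accessKey:secretKey@]bucket[/path]`. It is a sketch under that assumption, not the implementation under review.

```java
/**
 * Hypothetical standalone helper illustrating the requested Javadoc; not the PR's code.
 */
class S3UriUtil
{
  /**
   * Extracts the bucket name from an S3 URI.
   *
   * <p>Assumed URI form: {@code scheme://[accessKey:secretKey@]bucket[/path]}.
   * Because a secret key may contain '/', embedded credentials are taken to end
   * at the last '@'; the object path is therefore assumed to contain no '@'.
   *
   * @param s3uri the S3 URI, for example {@code s3a://my-bucket/dir/file.txt}
   * @return the bucket name, for example {@code my-bucket}
   * @throws IllegalArgumentException if {@code s3uri} has no "://" separator
   */
  static String extractBucket(String s3uri)
  {
    int schemeEnd = s3uri.indexOf("://");
    if (schemeEnd < 0) {
      throw new IllegalArgumentException("Not an S3 URI: " + s3uri);
    }
    String rest = s3uri.substring(schemeEnd + 3);
    int at = rest.lastIndexOf('@');
    if (at >= 0) {
      rest = rest.substring(at + 1); // drop "accessKey:secretKey@"
    }
    int slash = rest.indexOf('/');
    return slash >= 0 ? rest.substring(0, slash) : rest;
  }

  public static void main(String[] args)
  {
    System.out.println(extractBucket("s3a://my-bucket/dir/file.txt"));     // my-bucket
    System.out.println(extractBucket("s3n://KEY:se/cret@my-bucket/dir"));  // my-bucket
  }
}
```

Plain string handling is used instead of `java.net.URI` because an unencoded '/' in a secret key would make `URI` treat part of the key as the path and return the wrong authority.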
chaithu14 opened a pull request:
https://github.com/apache/incubator-apex-malhar/pull/263
APEXMALHAR-2019 S3-Input Implemented S3 Input Module
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chaithu14/incubator-apex-malhar
APEXMALHAR-201
> 3. Mix of 1 and 2. Multiple files are read in parallel, and every file in
> itself is also read in parallel.
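Option 3 can be pictured as cutting every file into fixed-size blocks and putting the blocks of all files into one work list, so that several readers can pick up blocks from the same file and from different files at once. The sketch is hypothetical: the `BlockPlanner` class and the 64-byte block size are illustrative assumptions, and it only plans byte ranges; it does not read from S3.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of option 3: every file is cut into fixed-size blocks,
 * and blocks from all files go into one list that parallel readers could share.
 */
class BlockPlanner
{
  /** One unit of work: a byte range of a single file. */
  static final class Block
  {
    final String file;
    final long offset;
    final long length;

    Block(String file, long offset, long length)
    {
      this.file = file;
      this.offset = offset;
      this.length = length;
    }

    @Override
    public String toString()
    {
      return file + "[" + offset + ", " + (offset + length) + ")";
    }
  }

  /** Splits each file into blocks of at most blockSize bytes; empty files yield no blocks. */
  static List<Block> plan(String[] files, long[] sizes, long blockSize)
  {
    List<Block> blocks = new ArrayList<>();
    for (int i = 0; i < files.length; i++) {
      for (long off = 0; off < sizes[i]; off += blockSize) {
        blocks.add(new Block(files[i], off, Math.min(blockSize, sizes[i] - off)));
      }
    }
    return blocks;
  }

  public static void main(String[] args)
  {
    // Prints a.txt[0, 64), a.txt[64, 100), b.txt[0, 30)
    List<Block> blocks = plan(new String[] {"a.txt", "b.txt"}, new long[] {100, 30}, 64);
    blocks.forEach(System.out::println);
  }
}
```

Handing these blocks to a pool of readers gives option 1 (across files) and option 2 (within a file) at the same time.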
> ...versions of Hadoop: 2.2.0 or so, and a lot better support in 2.7. So,
> will your module work on all Hadoop versions post 2.2 or only
> One way to support this feature is to copy a few S3-related files from
> Hadoop 2.7 into the module and use them within the module.
> ...dent of Hadoop version.
>
> @All:
> Please share your thoughts on this approach.
>
> Regards,
> Chaitanya
> wrote:
>
> +1
>
> Many people face issues while copying data from S3 at large scale. This
> module i
It's a good idea to extract out common code in parent class.
+1 for this feature.
-Priyanka
On Thu, Mar 17, 2016 at 1:57 PM, Chaitanya Chebolu <chaita...@datatorrent.com> wrote:
> Dear Community,
>
> I am proposing S3 Input Module. Primary functionality of this module i
>
> Regards,
> Sandeep
>
> On Thu, Mar 17, 2016 at 2:04 PM, Priyanka Gugale <priya...@datatorrent.com> wrote:
Chaitanya created APEXMALHAR-2019:
-
Summary: Creation of S3 Input Module
Key: APEXMALHAR-2019
URL: https://issues.apache.org/jira/browse/APEXMALHAR-2019
Project: Apache Apex Malhar
Issue
Dear Community,
I am proposing an S3 Input Module. The primary functionality of this module is
to read files from an S3 bucket in parallel.
Below is the JIRA created for this task:
https://issues.apache.org/jira/browse/APEXMALHAR-2019
The design of this module is similar to the HDFS input module. So, I will
> a...@datatorrent.com> wrote:
>
> +1. Very common use case. Nice to have it.
>
> Thks
> Amol
> readily used with simple configuration.
>
> Regards,
> Sandeep
>
> On Thu, Mar 17, 2016 at 2:04 PM, Priyanka Gugale <priya
> Amol
>
> On Thu, Mar 17, 2016 at 1:49 AM, Sandeep Deshmukh <sand...@datatorrent.com> wro
> Regards,
> Sandeep
>
> On Fri, Mar 18, 2016 at 10:49 AM, Pradeep Dalvi <pradeep.da...@datatorrent.com> wrote:
10:49 AM, Pradeep Dalvi <pradeep.da...@datatorrent.com> wrote:
>
> +1
>
> On Thu, Mar 17, 2016 at 10:56 PM, Amol Kekre <a...@datatorrent.com>