Re: JdbcPOJOInputOperator Behaviour

2016-05-09 Thread Devendra Tagare
Hi, There is some work going on for the JDBC polling operator, as per https://issues.apache.org/jira/browse/APEXMALHAR-2066 The feature set of this operator seems to be similar. That being said, I see the rationale in updating the existing one since it works with POJOs already. Why remove fetch dire

Re: NFS Input Module

2016-05-07 Thread Devendra Tagare
@Thomas, @Amol I would like to contribute/collaborate on this. Will create a ticket for the same. Thanks, Dev On Sat, May 7, 2016 at 11:04 AM, Thomas Weise wrote: > The documentation is here and is indexed: > > http://apex.apache.org/docs/malhar/ > > I think this is a matter of enhancing it. >

Re: JIRA traffic on the dev list

2016-04-21 Thread Devendra Tagare
+1 Thanks, Dev On Thu, Apr 21, 2016 at 2:01 PM, Amol Kekre wrote: > +1 > > Thks > Amol > > On Thu, Apr 21, 2016 at 1:16 PM, Siyuan Hua > wrote: > > > +1 > > > > On Thu, Apr 21, 2016 at 12:56 PM, David Yan > wrote: > > > > > I strongly agree with this statement in the email, "high-volume lists

[jira] [Commented] (APEXMALHAR-2033) Streaming JSON parser

2016-03-29 Thread devendra tagare (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216823#comment-15216823 ] devendra tagare commented on APEXMALHAR-2033: - Has an apache license

Re: [GitHub] incubator-apex-malhar pull request: APEXMALHAR-2011-2012 Avro to P...

2016-03-29 Thread Devendra Tagare
Failure in reading a record should not cause a failure to read the entire file. On Tue, Mar 29, 2016 at 10:18 AM, chinmaykolhatkar wrote: > Github user chinmaykolhatkar commented on a diff in the pull request: > > > https://github.com/apache/incubator-apex-malhar/pull/211#discussion_r57762245 >
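The point above (one unreadable record should not fail the whole file) can be sketched as follows. This is a minimal illustration only: TolerantReader, Result, and readAll are hypothetical names, the records are plain strings rather than Avro data, and none of this is taken from the pull request or from the Avro API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not Malhar or Avro code): decodes records one by one and skips failures. */
public class TolerantReader {

    /** Holds the records that decoded cleanly plus a count of the ones that did not. */
    public static final class Result<T> {
        public final List<T> records = new ArrayList<>();
        public int errorCount;
    }

    /**
     * Applies the decoder to every raw record. A decoding failure is counted
     * and the loop moves on to the next record instead of aborting the file.
     */
    public static <T> Result<T> readAll(Iterable<String> rawRecords, Function<String, T> decoder) {
        Result<T> result = new Result<>();
        for (String raw : rawRecords) {
            try {
                result.records.add(decoder.apply(raw));
            } catch (RuntimeException e) {
                // Assumption: a per-record failure surfaces as a RuntimeException.
                result.errorCount++;
            }
        }
        return result;
    }
}
```

For example, calling readAll with the inputs "1", "x", "3" and Integer::parseInt as the decoder would keep the two valid integers and report one error. In an operator, the error count could be exposed as a metric or the bad records sent to an error port; that choice is left open here.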

[jira] [Created] (APEXMALHAR-2034) Avro File Input Operator

2016-03-28 Thread devendra tagare (JIRA)
devendra tagare created APEXMALHAR-2034: --- Summary: Avro File Input Operator Key: APEXMALHAR-2034 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2034 Project: Apache Apex Malhar

[jira] [Created] (APEXMALHAR-2033) Streaming JSON parser

2016-03-28 Thread devendra tagare (JIRA)
devendra tagare created APEXMALHAR-2033: --- Summary: Streaming JSON parser Key: APEXMALHAR-2033 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2033 Project: Apache Apex Malhar

[jira] [Created] (APEXMALHAR-2032) MapReduce Input format support for File Splitter

2016-03-28 Thread devendra tagare (JIRA)
devendra tagare created APEXMALHAR-2032: --- Summary: MapReduce Input format support for File Splitter Key: APEXMALHAR-2032 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2032 Project

Re: Naming suggestion for HDFS output modules

2016-03-28 Thread Devendra Tagare
Is the plan to align the tuple writer with the org.apache.hadoop.mapred output formats? https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/mapred/OutputFormat.html The advantage of this would be that Apex can be used for ETL jobs to write MapReduce-compatible output files which can be used

Re: Aligning FileSplitter and BlockReader with hadoop.mapreduce InputFormats

2016-03-24 Thread Devendra Tagare
> > ~ Yogi > > > > On 24 March 2016 at 10:47, Priyanka Gugale > > wrote: > > > > > So as I understand splitter would be format aware, in that case would > we > > > need different kinds of parser we have right now? Or the format aware > > >

Re: Apex DataFrame

2016-03-23 Thread Devendra Tagare
Hi, You can create a case class. Then map the incoming RDDs to the case class and convert the mapped RDD to a DataFrame. By doing this you would have a DataFrame with the respective fields and associated datatypes set as per the ETL rules defined before setting the members of the case class. Sample below

Aligning FileSplitter and BlockReader with hadoop.mapreduce InputFormats

2016-03-23 Thread Devendra Tagare
Hi All, Initiating this thread to get the community's opinion on aligning the FileSplitter with InputSplit & the BlockReader with the RecordReader from org.apache.hadoop.mapreduce.InputSplit & org.apache.hadoop.mapreduce.RecordReader respectively. Some more details and rationale on the approach,

Adding AvroFileInputOperator to Malhar

2016-03-23 Thread Devendra Tagare
Hi All, I am working on adding a concrete implementation for reading Avro container files by extending the AbstractFileInputOperator & emitting Generic Records based on the file schema. This operator would be an input adapter and can work together with the Avro to POJO operator to read an Avro co

Streaming JSON parser

2016-03-22 Thread Devendra Tagare
Hi All, Starting this thread to get opinions on adding a streaming JSON parser for converting JSON to a POJO. This parser would be in addition to the databind parser (com.fasterxml.jackson.databind) we already have. The advantages of a streaming JSON parser are: 1. The parser need not parse entire

Re: [VOTE] Graduate Apex from the Incubator

2016-03-22 Thread Devendra Tagare
+ 1 ~Dev On Tue, Mar 22, 2016 at 3:44 PM, Ashish Tadose wrote: > + 1 > > Thanks, > Ashish > > > On 23-Mar-2016, at 3:19 AM, Bright Chen wrote: > > > > +1 > > > > Thanks > > Bright > > > >> On Mar 22, 2016, at 12:13 PM, Sairam Kannan > wrote: > >> > >> +1 > >> On Mar 22, 2016 2:09 PM, "Ashwin

Re: FileLineInputOperator in AbstractFileInputOperator

2016-03-21 Thread Devendra Tagare
+1. Finding it took a bit of searching earlier. Moving it would definitely ease usage. Thanks, Dev On Mar 21, 2016 10:46 PM, "Tushar Gosavi" wrote: > +1 on moving it out as a separate class. > > On Tue, Mar 22, 2016 at 8:21 AM, Sandesh Hegde > wrote: > > > +1, I was wondering why it was hidden. Us

Re: Adding ParquetReaderOperator in Malhar

2016-03-14 Thread Devendra Tagare
> > > Parquet file from that file system using this plugin. > > > > > > ~ Yogi > > > > > > On 14 March 2016 at 11:31, Tushar Gosavi > wrote: > > > > > >> +1 > > >> > > >> Does Parquet support partitioned

Re: Adding ParquetReaderOperator in Malhar

2016-03-13 Thread Devendra Tagare
+ 1 ~Dev On Mon, Mar 14, 2016 at 11:12 AM, Shubham Pathak wrote: > Hello Community, > > I am working on developing a ParquetReaderOperator which will allow apex > users to read parquet files. > > Apache Parquet is a columnar storage format available to any project in the > Hadoop ecosystem, reg

Avro to POJO & POJO to Avro transformation operators

2016-03-09 Thread Devendra Tagare
Hi, I am working on developing a record converter which would take a POJO as an input and emit a Generic record as the output based on the given Avro schema. JIRA : https://issues.apache.org/jira/browse/APEXMALHAR-2011 Additionally, I would develop a record converter which would take an Avro Ge

Re: Adding Transform Operator to Malhar

2016-03-08 Thread Devendra Tagare
+1 Dev On Tue, Mar 8, 2016 at 5:38 PM, Yogi Devendra wrote: > Forgot to add in earlier email: > > +1 for this operator. > > ~ Yogi > > On 8 March 2016 at 17:03, Yogi Devendra wrote: > > > Can we think of better name than Transform operator? > > May be: > > Expression operator > > > > Reason:

[jira] [Created] (APEXMALHAR-2012) Avro Record to POJO converter

2016-03-08 Thread devendra tagare (JIRA)
devendra tagare created APEXMALHAR-2012: --- Summary: Avro Record to POJO converter Key: APEXMALHAR-2012 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2012 Project: Apache Apex Malhar

[jira] [Updated] (APEXMALHAR-2011) POJO to Avro record converter

2016-03-08 Thread devendra tagare (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] devendra tagare updated APEXMALHAR-2011: Issue Type: New Feature (was: Bug) > POJO to Avro record conver

[jira] [Created] (APEXMALHAR-2011) POJO to Avro record converter

2016-03-08 Thread devendra tagare (JIRA)
devendra tagare created APEXMALHAR-2011: --- Summary: POJO to Avro record converter Key: APEXMALHAR-2011 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2011 Project: Apache Apex Malhar

Re: Proposal for concrete operator for writing to HDFS file

2016-03-07 Thread Devendra Tagare
Hi, A clarification on 3.1: does size here imply both the size in MB and the number of events, or either one? If yes, then the file should be rolled on whichever limit is reached first: size in MB, number of events, or time. ~Dev On Mon, Mar 7, 2016 at 2:54 PM, Yogi Devendra wrote: > Here is the summary of discussion till
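The "roll on whichever limit is hit first" rule asked about above could look roughly like the sketch below. It is an illustration under stated assumptions, not the operator proposed in the thread: RollPolicy and its methods are hypothetical names, size is tracked in bytes rather than MB, and the caller supplies the current time in milliseconds.

```java
/** Hypothetical helper (not part of Malhar): tracks one open file and says when to roll it. */
public class RollPolicy {
    private final long maxBytes;      // size limit for a single file
    private final long maxEvents;     // event-count limit for a single file
    private final long maxAgeMillis;  // maximum time a file may stay open

    private long bytes;
    private long events;
    private long openedAtMillis;

    public RollPolicy(long maxBytes, long maxEvents, long maxAgeMillis, long nowMillis) {
        this.maxBytes = maxBytes;
        this.maxEvents = maxEvents;
        this.maxAgeMillis = maxAgeMillis;
        this.openedAtMillis = nowMillis;
    }

    /** Accounts for one event of the given size written to the current file. */
    public void record(long eventBytes) {
        bytes += eventBytes;
        events++;
    }

    /** True as soon as any one of the three limits has been reached. */
    public boolean shouldRoll(long nowMillis) {
        return bytes >= maxBytes
            || events >= maxEvents
            || nowMillis - openedAtMillis >= maxAgeMillis;
    }

    /** Starts accounting for a fresh file opened at the given time. */
    public void reset(long nowMillis) {
        bytes = 0;
        events = 0;
        openedAtMillis = nowMillis;
    }
}
```

A writer would call record after each tuple, check shouldRoll, and call reset once the new file is open. Passing the clock value in explicitly keeps the policy easy to test; whether the real operator rolls on wall-clock time or on streaming windows is not settled by the excerpt.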