Re: Enhance batch support - batch demarcation

2016-11-02 Thread Bhupesh Chawda
Hi All, Starting with the implementation, we are planning to take care of a single batch job first. We will take up the scheduling aspect later. The first requirement is the following: A batch job is an Apex application which picks up data from the source, and processes it. Once the data is com

Re: example applications in malhar

2016-11-02 Thread Lakshmi Velineni
Thanks for the suggestions and I am working on the process to migrate the examples with the guidelines you mentioned. I will send out a list of examples and the destination modules very soon. On Thu, Oct 27, 2016 at 1:43 PM, Thomas Weise wrote: > Maybe a good first step would be to identify whi

[jira] [Commented] (APEXCORE-526) Publish javadoc for releases on ASF infrastructure

2016-11-02 Thread Munagala V. Ramanath (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631520#comment-15631520 ] Munagala V. Ramanath commented on APEXCORE-526: --- https://ci.apache.org/pro

[jira] [Comment Edited] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread David Yan (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630848#comment-15630848 ] David Yan edited comment on APEXCORE-570 at 11/2/16 11:11 PM:

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread David Yan (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630848#comment-15630848 ] David Yan commented on APEXCORE-570: [~PramodSSImmaneni] Maybe we can't block it but

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread Pramod Immaneni (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630806#comment-15630806 ] Pramod Immaneni commented on APEXCORE-570: -- [~davidyan] that would lead to dead

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread Munagala V. Ramanath (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630749#comment-15630749 ] Munagala V. Ramanath commented on APEXCORE-570: --- I think part of the probl

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread David Yan (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630694#comment-15630694 ] David Yan commented on APEXCORE-570: If the spooling capacity limit is reached, woul

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pramod Immaneni
Suppose I am processing data in a file and I want to do something at the end of a file at the output operator, I would send an end file control tuple and act on it when I receive it at the output. In a single window I may end up processing multiple files and if I don't have multiple ports and logic

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pramod Immaneni
With option 2, users can still do idempotent processing by delaying their processing of the control tuples to end window. They have the flexibility with this option. In the usual scenarios, you will have one port and given that control tuples will be sent to all partitions, all the data sent before
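
The deferral idea in this message (process control tuples only at end window so the result does not depend on where they arrive inside the window) can be sketched as follows. This is a minimal, platform-independent illustration and not Apex API: the class `DeferringOperator` and its methods `begin_window`, `process`, `process_control` and `end_window` are hypothetical names chosen for the example.

```python
class DeferringOperator:
    """Hypothetical operator that buffers control tuples until end of window."""

    def __init__(self):
        self.pending_control = []   # control tuples seen in the current window
        self.output = []            # what the operator has emitted so far

    def begin_window(self, window_id):
        # Assumption: windows are delimited by begin/end callbacks.
        self.window_id = window_id
        self.pending_control.clear()

    def process(self, data_tuple):
        # Data tuples are handled as soon as they arrive.
        self.output.append(("data", data_tuple))

    def process_control(self, control_tuple):
        # Control tuples are only recorded here, not acted upon yet.
        self.pending_control.append(control_tuple)

    def end_window(self):
        # Act on control tuples at the window boundary, in arrival order.
        for control_tuple in self.pending_control:
            self.output.append(("control", control_tuple))
        self.pending_control.clear()


def run_window(events):
    """Feed one window of ("data" | "control", payload) events to a fresh operator."""
    op = DeferringOperator()
    op.begin_window(1)
    for kind, payload in events:
        if kind == "data":
            op.process(payload)
        else:
            op.process_control(payload)
    op.end_window()
    return op.output
```

With this shape, a window in which the control tuple arrives between two data tuples produces the same output as one in which it arrives after both, which is the property a replay after recovery would rely on. The cost is that the control tuple is acted on up to one window late.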

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Thomas Weise
The use cases listed in the original discussion don't call for option 2. It seems to come with additional complexity and implementation cost. Can those in favor of option 2 please also provide the use case for it? Thanks, Thomas On Wed, Nov 2, 2016 at 10:36 PM, Siyuan Hua wrote: > I will vote

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Siyuan Hua
I will vote for approach 1. First of all, that one sounds easier to me to implement, and I think idempotency is important. It may come at the cost of higher latency, but I think that is OK. In addition, if users do need real-time control tuple processing in the future, we can always add the option on

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread David Yan
Pramod, To answer your questions, the control tuples will be delivered to all downstream partitions, and an additional emitControl method (actual name TBD) can be added to DefaultOutputPort without breaking backward compatibility. Also, to clarify, each operator should have the ability to block f

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Amol Kekre
A feature that incurs risk with processing order, and more so with idempotency, is a big enough reason to worry about option 2. Is there a critical use case that needs this feature? Thks Amol On Wed, Nov 2, 2016 at 1:25 PM, Pramod Immaneni wrote: > I like approach 2 as it gives more fle

[jira] [Commented] (APEXMALHAR-2321) Improve Buckets memory management

2016-11-02 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630594#comment-15630594 ] ASF GitHub Bot commented on APEXMALHAR-2321: GitHub user brightchen opene

[GitHub] apex-malhar pull request #481: APEXMALHAR-2321 #resolve #comment Improve Buc...

2016-11-02 Thread brightchen
GitHub user brightchen opened a pull request: https://github.com/apache/apex-malhar/pull/481 APEXMALHAR-2321 #resolve #comment Improve Buckets memory management You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malh

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pradeep A. Dalvi
As a rule of thumb in any real-time operating system, control tuples should always be handled using priority queues. We may try to control priorities by defining levels. They should not be delivered at window boundaries. In short, control tuples shall never be treated as any other tuples in real ti

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pramod Immaneni
I like approach 2 as it gives more flexibility and also allows for low-latency options. I think the following are important as well. 1. Delivering control tuples to all downstream partitions. 2. What mechanism will the operator developer use to send the control tuple? Will it be an additional meho

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread David Yan
Hi all, I would like to renew the discussion of control tuples. Last time, we were in a debate about whether: 1) the platform should enforce that control tuples are delivered at window boundaries only or: 2) the platform should deliver control tuples just as other tuples and it's the operator

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread Pramod Immaneni (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630269#comment-15630269 ] Pramod Immaneni commented on APEXCORE-570: -- For the inter-process case, if the

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread Thomas Weise (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630072#comment-15630072 ] Thomas Weise commented on APEXCORE-570: --- From the JIRA description, this sounds l

[jira] [Commented] (APEXMALHAR-2274) AbstractFileInputOperator gets killed when there are a large number of files.

2016-11-02 Thread Matt Zhang (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629893#comment-15629893 ] Matt Zhang commented on APEXMALHAR-2274: The scanner in FileSplitterInput is

[jira] [Created] (APEXMALHAR-2326) Failures in SQS unit tests

2016-11-02 Thread Sanjay M Pujare (JIRA)
Sanjay M Pujare created APEXMALHAR-2326: --- Summary: Failures in SQS unit tests Key: APEXMALHAR-2326 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2326 Project: Apache Apex Malhar

[GitHub] apex-malhar pull request #480: [APEXMALHAR-2220] Move the FunctionOperator t...

2016-11-02 Thread d9liang
GitHub user d9liang opened a pull request: https://github.com/apache/apex-malhar/pull/480 [APEXMALHAR-2220] Move the FunctionOperator to Malhar library Merge function operators under org.apache.apex.malhar.stream.api.operator and function interface under org.apache.apex.malhar.strea

[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-11-02 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629562#comment-15629562 ] ASF GitHub Bot commented on APEXMALHAR-2220: GitHub user d9liang opened a

[jira] [Resolved] (APEXMALHAR-2302) Exposing few properties of FSSplitter and BlockReader operators to FSRecordReaderModule to tune Application

2016-11-02 Thread Yogi Devendra (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yogi Devendra resolved APEXMALHAR-2302. --- Resolution: Fixed Fix Version/s: 3.6.0 > Exposing few properties of FSSpli

[GitHub] apex-malhar pull request #457: APEXMALHAR-2302 Exposing few properties of FS...

2016-11-02 Thread asfgit
GitHub user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/457

[jira] [Commented] (APEXMALHAR-2302) Exposing few properties of FSSplitter and BlockReader operators to FSRecordReaderModule to tune Application

2016-11-02 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629394#comment-15629394 ] ASF GitHub Bot commented on APEXMALHAR-2302: Github user asfgit closed th

[jira] [Commented] (APEXMALHAR-2325) Same block id is emitting from FSInputModule

2016-11-02 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/APEXMALHAR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628804#comment-15628804 ] ASF GitHub Bot commented on APEXMALHAR-2325: GitHub user chaithu14 opened

[GitHub] apex-malhar pull request #479: APEXMALHAR-2325 1) Set the file system defaul...

2016-11-02 Thread chaithu14
GitHub user chaithu14 opened a pull request: https://github.com/apache/apex-malhar/pull/479 APEXMALHAR-2325 1) Set the file system default block size to the reader. 2) Set the block size to the reader context You can merge this pull request into a Git repository by running: $

[jira] [Created] (APEXMALHAR-2325) Same block id is emitting from FSInputModule

2016-11-02 Thread Chaitanya (JIRA)
Chaitanya created APEXMALHAR-2325: - Summary: Same block id is emitting from FSInputModule Key: APEXMALHAR-2325 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2325 Project: Apache Apex Malhar

[jira] [Commented] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread Pramod Immaneni (JIRA)
[ https://issues.apache.org/jira/browse/APEXCORE-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628460#comment-15628460 ] Pramod Immaneni commented on APEXCORE-570: -- Here is an example on how to do thi

[jira] [Created] (APEXCORE-570) Prevent upstream operators from getting too far ahead when downstream operators are slow

2016-11-02 Thread Pramod Immaneni (JIRA)
Pramod Immaneni created APEXCORE-570: Summary: Prevent upstream operators from getting too far ahead when downstream operators are slow Key: APEXCORE-570 URL: https://issues.apache.org/jira/browse/APEXCORE-570

Re: [jira] [Commented] (APEXMALHAR-2303) S3 Line By Line Module

2016-11-02 Thread AJAY GUPTA
Hi Apex Dev Community, For the Fixed Width S3 Record Reader, the input is the block metadata containing the block offset and the block length. The block length may not be a multiple of the record length (e.g., the block length can be 1 MB and the record length 23 bytes). Hence, the first byte
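
The arithmetic behind this mismatch can be sketched as below. The message is cut off before it describes the reader's actual approach, so this is only an illustration under one stated assumption: a record belongs to the block that contains its first byte, and the reader may read past the block end to finish the last such record. The names `RecordSpan` and `records_for_block` are hypothetical, and end-of-file handling is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSpan:
    first_record_start: int  # file offset of the first record owned by the block
    record_count: int        # number of records whose first byte lies in the block
    read_end: int            # file offset just past the last owned record


def records_for_block(block_offset: int, block_length: int, record_length: int) -> RecordSpan:
    """Map a block (offset, length) onto whole fixed-width records.

    Assumption: records start at multiples of record_length from file offset 0,
    and a record is owned by the block holding its first byte.
    """
    if record_length <= 0 or block_offset < 0 or block_length < 0:
        raise ValueError("offsets and lengths must be non-negative, record_length positive")

    block_end = block_offset + block_length
    # Ceiling division: index of the first record starting at or after an offset.
    first_index = -(-block_offset // record_length)
    end_index = -(-block_end // record_length)

    count = end_index - first_index
    first_start = first_index * record_length
    return RecordSpan(first_start, count, first_start + count * record_length)
```

For the numbers quoted in the message (1 MB blocks of 1048576 bytes, 23-byte records), the first block owns 45591 records and its last record runs 17 bytes into the second block, so the second block starts reading at offset 1048593 rather than 1048576.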