Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Flavio Pompermaier
Talking with Stefano this morning and looking at the DataSourceTask code we discovered that the open() and close() methods are both called for every split and not once per inputFormat instance (maybe open and close should be renamed as openSplit and closeSplit to avoid confusion...). I think that i

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Stefano Bortoli
Being a generic JDBC input format, I would prefer to stay with Row, letting the developer manage the cast according to the driver functionalities. As for the open() and close() issue, I agree with Flavio that we'd need a better management of the inputformat lifecycle. Perhaps a new interface exten

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Stefano Bortoli
Of course there is one already. We'll look into the runtime context. saluti, Stefano 2016-04-18 9:41 GMT+02:00 Stefano Bortoli : > Being a generic JDBC input format, I would prefer to stay with Row, > letting the developer manage the cast according to the driver > functionalities. > > As for the

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Aljoscha Krettek
There is also InputFormat.configure() which is called before any split processing happens. But I see your point about a missing close() method that is called after all input splits have been processed. On Mon, 18 Apr 2016 at 09:44 Stefano Bortoli wrote: > Of course there is one already. We'll lo

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Flavio Pompermaier
Yes, I forgot to mention that I could instantiate the connection in the configure() but then I can't close it (as you confirmed) :( On Mon, Apr 18, 2016 at 9:46 AM, Aljoscha Krettek wrote: > There is also InputFormat.configure() which is called before any split > processing happens. But I see yo

Re: GSoC Project Proposal Draft: Code Generation in Serializers

2016-04-18 Thread Chiwan Park
Yes, I know Janino is a pure Java project. I meant if we add Scala code to flink-core, we should add Scala dependency to flink-core and it could be confusing. Regards, Chiwan Park > On Apr 18, 2016, at 2:49 PM, Márton Balassi wrote: > > Chiwan, just to clarify Janino is a Java project. [1] >

Re: Surprising order of events in union of two streams

2016-04-18 Thread Aljoscha Krettek
Hi, yes, I'm afraid you need a custom operator for that. (We are working on providing built-in support for this, though) I sketched an Operator that does the sorting and also wrote a quick example that uses it: SortedWindowOperator: https://gist.github.com/aljoscha/6600bc1121b7f8a0f68b89988dd341bd

[jira] [Created] (FLINK-3774) Flink configuration is not correctly forwarded to PlanExecutor in ScalaShellRemoteEnvironment

2016-04-18 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3774: Summary: Flink configuration is not correctly forwarded to PlanExecutor in ScalaShellRemoteEnvironment Key: FLINK-3774 URL: https://issues.apache.org/jira/browse/FLINK-3774

[jira] [Created] (FLINK-3775) Flink Scala shell does not respect Flink configuration

2016-04-18 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3775: Summary: Flink Scala shell does not respect Flink configuration Key: FLINK-3775 URL: https://issues.apache.org/jira/browse/FLINK-3775 Project: Flink Issue Ty

[jira] [Created] (FLINK-3776) Flink Scala shell does not allow to set configuration for local execution

2016-04-18 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3776: Summary: Flink Scala shell does not allow to set configuration for local execution Key: FLINK-3776 URL: https://issues.apache.org/jira/browse/FLINK-3776 Project: Flin

[RESULT] [VOTE] Release Apache Flink 1.0.2 (RC2)

2016-04-18 Thread Ufuk Celebi
This vote is cancelled in favour of RC3, because of the problem raised by Fabian. On Fri, Apr 15, 2016 at 1:02 PM, Fabian Hueske wrote: > There is a request from the Mahout community to include a fix for > FLINK-3762 in 1.0.2. > Stephan gave a +1 [1] and I would also like to include it. > > I'm r

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Fabian Hueske
I agree, a method to close an input format is missing. InputFormat is an API stable interface, so it is not possible to extend it (until Flink 2.0). RichInputFormat is API stable as well, but an abstract class. So it should be possible to add an empty default implementation of a closeInputFormat()

[jira] [Created] (FLINK-3777) Add open and close methods to manage IF lifecycle

2016-04-18 Thread Flavio Pompermaier (JIRA)
Flavio Pompermaier created FLINK-3777: - Summary: Add open and close methods to manage IF lifecycle Key: FLINK-3777 URL: https://issues.apache.org/jira/browse/FLINK-3777 Project: Flink Iss

Re: [RESULT] [VOTE] Release Apache Flink 1.0.2 (RC2)

2016-04-18 Thread Fabian Hueske
There is also FLINK-3657 which was recently merged to master. This feature was requested by the Mahout community and the commit changes the visibility of a method in DataSetUtils. So strictly speaking, this is not a bug fix but a new feature. On the other hand, it is very lightweight change and doe

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Flavio Pompermaier
Hi Fabian, I've just created a JIRA for that (FLINK-3777). As you said input split should be not too fine-grained but we have a table with 11 billions of rows that can't be queried with ranges greated than 100K of rows because it has a lot of JOIN and increasing thhis threashold implies incredibly

Re: [RESULT] [VOTE] Release Apache Flink 1.0.2 (RC2)

2016-04-18 Thread Ufuk Celebi
I am fine with it since it only touches a utility class. So +1 to include it as well. – Ufuk On Mon, Apr 18, 2016 at 12:07 PM, Fabian Hueske wrote: > There is also FLINK-3657 which was recently merged to master. > This feature was requested by the Mahout community and the commit changes > the vi

Re: [RESULT] [VOTE] Release Apache Flink 1.0.2 (RC2)

2016-04-18 Thread Robert Metzger
+1 from my side as well On Mon, Apr 18, 2016 at 12:10 PM, Ufuk Celebi wrote: > I am fine with it since it only touches a utility class. So +1 to > include it as well. > > – Ufuk > > On Mon, Apr 18, 2016 at 12:07 PM, Fabian Hueske wrote: > > There is also FLINK-3657 which was recently merged to

Re: [RESULT] [VOTE] Release Apache Flink 1.0.2 (RC2)

2016-04-18 Thread Aljoscha Krettek
+1, since others are blocked on this On Mon, 18 Apr 2016 at 12:10 Ufuk Celebi wrote: > I am fine with it since it only touches a utility class. So +1 to > include it as well. > > – Ufuk > > On Mon, Apr 18, 2016 at 12:07 PM, Fabian Hueske wrote: > > There is also FLINK-3657 which was recently me

Re: [RESULT] [VOTE] Release Apache Flink 1.0.2 (RC2)

2016-04-18 Thread Ufuk Celebi
OK, thanks for the responses. I'll go ahead and include it for RC3. On Mon, Apr 18, 2016 at 12:20 PM, Aljoscha Krettek wrote: > +1, since others are blocked on this > > On Mon, 18 Apr 2016 at 12:10 Ufuk Celebi wrote: > >> I am fine with it since it only touches a utility class. So +1 to >> inclu

Re: GSoC Project Proposal Draft: Code Generation in Serializers

2016-04-18 Thread Fabian Hueske
+1 for not mixing Java and Scala in flink-core. Maybe it makes sense to implement the code generated serializers / comparators as a separate module which can be plugged-in. This could be pure Scala. In general, I think it would be good to have some kind of "version management" for serializers in p

Savepoint for time windows

2016-04-18 Thread Ozan DENİZ
Hi everyone, I am trying to implement savepoint mechanism for my Flink project. Here is the scenario: I got the snapshot of Flink application by using "flink savepoint " command while the application is running. After saving snapshot of application, I canceled the job from web ui than I cha

[VOTE] Release Apache Flink 1.0.2 (RC3)

2016-04-18 Thread Ufuk Celebi
Dear Flink community, Please vote on releasing the following candidate as Apache Flink version 1.0.2. The commit to be voted on: d39af152a166ddafaa2466cdae82695880893f3e Branch: release-1.0.2-rc3 (see https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git;a=shortlog;h=refs/heads/release-1.

Re: Savepoint for time windows

2016-04-18 Thread Ufuk Celebi
Can you please share the program before and after the savepoint? – Ufuk On Mon, Apr 18, 2016 at 3:11 PM, Ozan DENİZ wrote: > Hi everyone, > > I am trying to implement savepoint mechanism for my Flink project. > > Here is the scenario: > > I got the snapshot of Flink application by using "flink s

Re: FLINK-3750 (JDBCInputFormat)

2016-04-18 Thread Stefano Bortoli
As Flavio underlined, it is not about selecting a certain number of rows, but executing queries with sequence of joins on a very large database. We played around to find the best throughput. Honestly, I prefer to have many smaller range-queries with more parallel threads than fewer expensive querie

[jira] [Created] (FLINK-3778) ScalaShellRemoteStreamEnvironment cannot be forwarded a user configuration

2016-04-18 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3778: Summary: ScalaShellRemoteStreamEnvironment cannot be forwarded a user configuration Key: FLINK-3778 URL: https://issues.apache.org/jira/browse/FLINK-3778 Project: Fli

Re: Savepoint for time windows

2016-04-18 Thread Stephan Ewen
Hi! Yes, window contents is part of savepoints. If you change the topology, it is crucial that the new topology matches the old window contents to the new operator. If you change the structure of the program, you probably need to assign persistent names to the operators. See https://ci.apache.org

[jira] [Created] (FLINK-3779) Add support for queryable state

2016-04-18 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-3779: -- Summary: Add support for queryable state Key: FLINK-3779 URL: https://issues.apache.org/jira/browse/FLINK-3779 Project: Flink Issue Type: Improvement C

[jira] [Created] (FLINK-3780) Jaccard Similarity

2016-04-18 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3780: - Summary: Jaccard Similarity Key: FLINK-3780 URL: https://issues.apache.org/jira/browse/FLINK-3780 Project: Flink Issue Type: New Feature Components: Gell

Problem with flink while development

2016-04-18 Thread Jitendra Agrawal
Hi Team, Problem Description : When I was calling *reduce()* method on keyedStream object then getting Ecxeption as "* org.apache.flink.api.common.InvalidProgramException: Specifying keys via field positions is only valid for tuple data types. Type: Integer*". StreamExecutionEnvironment env

Re: Issue regarding the submission of a topology to a remote flink cluster.

2016-04-18 Thread Till Rohrmann
I think then you have to either reconfigure your cluster environment or wait until we bump the Akka version to 2.4.x which supports having an internal and external IP address. Cheers, Till On Fri, Apr 15, 2016 at 6:36 PM, star jlong wrote: > Hi Till/Ned, > > Soory I thought this was my post. >

[jira] [Created] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()

2016-04-18 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3781: - Summary: BlobClient may be left unclosed in BlobCache#deleteGlobal() Key: FLINK-3781 URL: https://issues.apache.org/jira/browse/FLINK-3781 Project: Flink Issue Type: Bug

Re: Problem with flink while development

2016-04-18 Thread Saikat Maitra
Hello Jitendra, I am new to Flink community but may have seen this issue earlier. Can you try to use DataStream> instead of KeyedStream integers4 Regards Saikat On Mon, Apr 18, 2016 at 7:37 PM, Jitendra Agrawal < jitendra.agra...@northgateps.com> wrote: > Hi Team, > > Problem Description : Wh

Re: GSoC Project Proposal Draft: Code Generation in Serializers

2016-04-18 Thread Gábor Horváth
Unfortunately making code generation a separate module would introduce cyclic dependency. Code generation requires the TypeInfo which is available in flink-core and flink-core requires the generated serializers from the code generation module. Do you have a solution for this? I think if we can com

RE: Surprising order of events in union of two streams

2016-04-18 Thread Eron Wright
Not entirely related, but for the special case of writing a parallelized source that emits records in event time order, I found the MergeIterator to be most useful. Here's an example:https://github.com/nupic-community/flink-htm/blob/eb29f97f08f3482b32228db7284f669aad8dce2e/flink-htm-streaming-s

Re: Problem with flink while development

2016-04-18 Thread Chenguang He
It throws by  if (!type.isTupleType()) { throw new InvalidProgramException("Specifying keys via field positions is only valid " + "for tuple data types. Type: " + type); } So, like Saikat say

Re: Problem with flink while development

2016-04-18 Thread Matthias J. Sax
If you work on plain Integer (or other non-POJO types) you need to provide a KeySelector to make it work. For you case something like this: .keyBy(new KeySelector() { @Override public Integer getKey(Integer value) throws Exception { return value; } }) As S

Bug in Scala Shell

2016-04-18 Thread Trevor Grant
I was trying out the new scala-shell with streaming support... The following code executes correctly the first time I run it: val survival = benv.readCsvFile[(String, String, String, String)]("file:///home/trevor/gits/datasets/haberman/haberman.data") survival.count() However, if I call survival

[jira] [Created] (FLINK-3782) ByteArrayOutputStream and ObjectOutputStream should close

2016-04-18 Thread Chenguang He (JIRA)
Chenguang He created FLINK-3782: --- Summary: ByteArrayOutputStream and ObjectOutputStream should close Key: FLINK-3782 URL: https://issues.apache.org/jira/browse/FLINK-3782 Project: Flink Issue T

RE: Savepoint for time windows

2016-04-18 Thread Ozan DENİZ
Hi Stephan and Ufuk, Thank you for your reply. I have assigned uid to the "assignTimestampsAndWatermarks", "addSource", "apply" operators. However, I couldn't assign uid to the time window. Therefore the time window doesn't hold any state regarding timestamp. For example, I implemented a cust

[jira] [Created] (FLINK-3783) Support weighted random sampling with reservoir

2016-04-18 Thread GaoLun (JIRA)
GaoLun created FLINK-3783: - Summary: Support weighted random sampling with reservoir Key: FLINK-3783 URL: https://issues.apache.org/jira/browse/FLINK-3783 Project: Flink Issue Type: Improvement