[jira] [Created] (FLINK-5903) taskmanager.numberOfTaskSlots and yarn.containers.vcores did not work well in YARN mode

2017-02-23 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5903: --- Summary: taskmanager.numberOfTaskSlots and yarn.containers.vcores did not work well in YARN mode Key: FLINK-5903 URL: https://issues.apache.org/jira/browse/FLINK-5903 Project:

[jira] [Created] (FLINK-5902) Some images can not show in IE

2017-02-23 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5902: --- Summary: Some images can not show in IE Key: FLINK-5902 URL: https://issues.apache.org/jira/browse/FLINK-5902 Project: Flink Issue Type: Bug Components:

[jira] [Created] (FLINK-5901) DAG can not show properly in IE

2017-02-23 Thread Tao Wang (JIRA)
Tao Wang created FLINK-5901: --- Summary: DAG can not show properly in IE Key: FLINK-5901 URL: https://issues.apache.org/jira/browse/FLINK-5901 Project: Flink Issue Type: Bug Components:

[jira] [Created] (FLINK-5900) Add non-partial merge Aggregates and unit tests

2017-02-23 Thread Shaoxuan Wang (JIRA)
Shaoxuan Wang created FLINK-5900: Summary: Add non-partial merge Aggregates and unit tests Key: FLINK-5900 URL: https://issues.apache.org/jira/browse/FLINK-5900 Project: Flink Issue Type:

[jira] [Created] (FLINK-5899) Fix the bug in initializing the DataSetTumbleTimeWindowAggReduceGroupFunction

2017-02-23 Thread Shaoxuan Wang (JIRA)
Shaoxuan Wang created FLINK-5899: Summary: Fix the bug in initializing the DataSetTumbleTimeWindowAggReduceGroupFunction Key: FLINK-5899 URL: https://issues.apache.org/jira/browse/FLINK-5899 Project:

Re: Visualizing topologies

2017-02-23 Thread Ken Krugler
Hi Ufuk, > On Feb 22, 2017, at 2:18am, Ufuk Celebi wrote: > > Hey Ken! > > This looks really good. +1 to make this available publicly. > > We can link it from the Flink website and the viz tool Pat linked to. > The visualizer currently has some open issues, it is not up to

[jira] [Created] (FLINK-5897) Untie Checkpoint Externalization from FileSystems

2017-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5897: --- Summary: Untie Checkpoint Externalization from FileSystems Key: FLINK-5897 URL: https://issues.apache.org/jira/browse/FLINK-5897 Project: Flink Issue Type:

Re: [DISCUSS] Code style / checkstyle

2017-02-23 Thread Jinkui Shi
Thanks for discussing this problem again. 1. Google checkstyle is good for Java. 2. The Scala checkstyle is here [1]. 3. We can make a Flink plan containing issues, one sub-issue per rule, and resolve this in a short time. The current code style may be a historical accumulation. If we don’t normalize the code step by

Re: FLINK-4565 Support for SQL IN operator

2017-02-23 Thread Fabian Hueske
Hi Dmytro, done. Looking forward to your contribution! Cheers, Fabian 2017-02-23 17:25 GMT+01:00 Dmytro Shkvyra : > Hello, > > > > I would like to start contributing to Flink. > > Could anyone assign issue https://issues.apache.org/jira/browse/FLINK-4565 > to me

FLINK-4565 Support for SQL IN operator

2017-02-23 Thread Dmytro Shkvyra
Hello, I would like to start contributing to Flink. Could anyone assign issue https://issues.apache.org/jira/browse/FLINK-4565 to me (dshkvyra) in jira? Sincerely Dmytro Shkvyra Senior Software Engineer Office: +380 44 390 5457 x 65346 Cell: +380 50 357

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
@Theodore, thanks for taking the lead in the coordination :) Let's see what we can do, and then decide what should start out as an independent project, or strictly inside Flink. I agree that something experimental like batch ML on streaming would probably benefit more from an independent repo first.

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Sure, having a deadline of March 3rd is fine. I can act as coordinator, trying to guide the discussion to concrete results. For committers it's up to their discretion and time if one wants to participate. I don't think it's necessary to have one, but it would be most welcome. @Katherin I would

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, let's just aim for around the end of next week, but we can take more time to discuss if there's still a lot of ongoing activity. Keep the topic hot! Thanks all for the enthusiasm :) On 2017-02-23 16:17, Stavros Kontopoulos wrote: @Gabor 3rd March is ok for me. But maybe giving a bit

Re: [DISCUSS] Code style / checkstyle

2017-02-23 Thread Aljoscha Krettek
If we go for a codestyle/checkstyle I would suggest using the Google style. This already has checkstyle, IntelliJ style, Eclipse style and a code formatting tool and is well established. However, some people will not like this style. In general, I think we will never manage to find a style that

[jira] [Created] (FLINK-5896) Improve readability of the event time docs

2017-02-23 Thread David Anderson (JIRA)
David Anderson created FLINK-5896: - Summary: Improve readability of the event time docs Key: FLINK-5896 URL: https://issues.apache.org/jira/browse/FLINK-5896 Project: Flink Issue Type:

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
@Gabor 3rd March is ok for me. But maybe giving a bit more time to it, like a week, may suit more people. What do you all think? I will contribute to the doc. +100 for having a coordinator + committer. Thank you all for joining the discussion. Cheers, Stavros On Thu, Feb 23, 2017 at 4:48 PM,

[jira] [Created] (FLINK-5895) Reduce logging aggressiveness of FileSystemSafetyNet

2017-02-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5895: --- Summary: Reduce logging aggressiveness of FileSystemSafetyNet Key: FLINK-5895 URL: https://issues.apache.org/jira/browse/FLINK-5895 Project: Flink Issue Type:

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-23 Thread Gyula Fóra
Hi, Thanks for the nice proposal, I like the idea of side outputs, and it would make a lot of topologies much simpler. Regarding the API I think we should come up with a way of making side outputs accessible from all sorts of operators in a similar way. For instance through the RichFunction

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, I've created a skeleton of the design doc for choosing a direction: https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing Much of the pros/cons have already been discussed here, so I'll try to put there all the arguments mentioned in this thread.

[DISCUSS] Side Outputs and Split/Select

2017-02-23 Thread Aljoscha Krettek
Hi Folks, Chen and I have been working for a while now on making FLIP-13 (side outputs) [1] a reality. We think we have a pretty good internal implementation and also a proposal for an API but now we need to discuss how we want to go forward with this, especially how we should deal with

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I have already asked some teams for useful cases, but all of them need time to think. During analysis something will finally arise. Maybe we can ask partners of Flink for cases? Data Artisans got results of a customer survey [1]: better ML support is wanted, so we could ask what exactly is

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
+100 for a design doc. Could we also set a roadmap after some time-boxed investigation captured in that document? We need action. Looking forward to working on this (whatever that might be) ;) Also, is there any data supporting one direction or the other from a customer perspective? It would help

Re: [DISCUSS] Project build time and possible restructuring

2017-02-23 Thread Stephan Ewen
If we can get incremental builds to work, that would actually be the preferred solution in my opinion. Many companies have invested heavily in making a "single repository" code base work, because it has the advantage of not having to update/publish several repositories first. However, the

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
I agree that it's better to go in one direction first, but I think online and offline with the streaming API can go somewhat in parallel later. We could set a short-term goal, concentrate initially on one direction, and showcase that direction (e.g. in a blogpost). But first, we should list the

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I'm not sure that this is feasible; doing everything at the same time could mean doing nothing. I'm just afraid that the words "we will work on streaming, not on batch; we have no committer time for this" mean that yes, we started work on FLINK-1730, but nobody will commit this work in the end, as it

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Hello all, @Gabor, we have discussed the idea of using the streaming API to write all of our ML algorithms with a couple of people offline, and I think it might be possible and is generally worth a shot. The approach we would take would be close to Vowpal Wabbit, not exactly "online", but rather

Re: [DISCUSS] Per-key event time

2017-02-23 Thread Gábor Hermann
Hey all, Let me share some ideas about this. @Paris: The local-only progress tracking indeed seems easier, we do not need to broadcast anything. Implementation-wise it is easier, but performance-wise probably not. If one key can come from multiple sources, there could be a lot more network