[GitHub] flink issue #3428: [FLINK-1743] Add multinomial logistic regression to machi...

2017-10-03 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/3428 Hello @bowenli86, AFAIK nobody's working on the PRs right now. I'm currently 100% focused on finishing my PhD, so I won't have time to review.

[GitHub] flink issue #3428: [FLINK-1743] Add multinomial logistic regression to machi...

2017-04-20 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/3428 Thank you for the info! In that case it's a bit ambiguous now which issue this PR is solving. If you are introducing an interface for GLMs, we should be having this discussion under

[GitHub] flink issue #3428: [FLINK-1743] Add multinomial logistic regression to machi...

2017-04-18 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/3428 Hello @mtunique, and thank you for your contribution! Could you briefly describe your changes/additions and how you have tested the implementation for correctness?

[GitHub] flink issue #3313: [FLINK-5588][ml] add a data normalizer to ml library

2017-02-15 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/3313 Hello @skonto, thanks for your contribution! I'm currently snowed under with paper deadlines, so I can't give you a time for when I'll be able to go through this, hopefully with

[GitHub] flink issue #1849: [FLINK-2157] [ml] Create evaluation framework for ML libr...

2017-01-20 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/1849 Hello @gaborhermann. Personally I prefer to have PRs be as specific as possible, so I would recommend we try to get this merged before #2838, and then rebase that on master. Given the

[GitHub] flink pull request #1849: [FLINK-2157] [ml] Create evaluation framework for ...

2017-01-17 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1849#discussion_r96406269 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/Score.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache

[GitHub] flink issue #1849: [FLINK-2157] [ml] Create evaluation framework for ML libr...

2017-01-17 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/1849 Hello @skonto, this PR will probably be subsumed by #2838; you can check out the latest development there.

[GitHub] flink issue #757: [FLINK-2131][ml]: Initialization schemes for k-means clust...

2017-01-17 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/757 Sure @sachingoel0101 feel free to split up the PRs to reduce overhead. For added initialization schemes let me throw [this recent NIPS](https://papers.nips.cc/paper/6478-fast-and-provably

[GitHub] flink issue #2761: [FLINK-3551] [examples] Sync Scala Streaming Examples

2016-12-14 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2761 Hello @ch33hau, sorry for the late reply, I've been at a conference the past week. With the latest changes this LGTM, I've edited the fix version in JIRA to 1.2.0 to give this more visi

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-12-06 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r91119867 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/SessionWindowing.scala --- @@ -60,11

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-12-06 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r91119955 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/SessionWindowing.scala --- @@ -77,9

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-12-06 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r91119626 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/GroupedProcessingTimeWindowExample.scala

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-12-06 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r91119469 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-25 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89645498 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/kafka/ReadFromKafka.scala --- @@ -0,0 +1,72

[GitHub] flink issue #2838: [FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & e...

2016-11-25 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2838 > The problem is not with the evaluate(test: TestType): DataSet[Double] but rather with evaluate(test: TestType): DataSet[(Prediction,Prediction)]. Completely agree there, I advocated

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89471075 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/GroupedProcessingTimeWindowExample.scala

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89469773 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala --- @@ -0,0

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89465547 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/kafka/WriteIntoKafka.scala --- @@ -0,0 +1,81

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89464880 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/kafka/ReadFromKafka.scala --- @@ -0,0 +1,72

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89472254 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/WindowWordCount.scala --- @@ -0,0

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89471865 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/SessionWindowing.scala --- @@ -0,0

[GitHub] flink pull request #2761: [FLINK-3551] [examples] Sync Scala Streaiming Exam...

2016-11-24 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2761#discussion_r89469371 --- Diff: flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala --- @@ -0,0

[GitHub] flink issue #2764: [FLINK-5008] Update quickstart documentation

2016-11-24 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2764 Hello @NicoK, do you think you can include [FLINK-5087](https://issues.apache.org/jira/browse/FLINK-5087) in this PR, or should we create a new one?

[GitHub] flink issue #2819: [FLINK-4961] [ml] SGD for Matrix Factorization (WIP)

2016-11-23 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2819 Hello @gaborhermann, I really like the idea of introducing a `MatrixFactorization` interface that we can then use for different specialized optimization algorithms. For the question I

[GitHub] flink issue #2838: [FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & e...

2016-11-23 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2838 Hello Gabor, I like the idea of having a RankingScore; it seems like having that hierarchy with Score, RankingScore and PairWiseScore gives us the flexibility we need to include ranking
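The Score / RankingScore / PairWiseScore split discussed in this thread can be pictured with a minimal sketch. It is not the FlinkML API: it works on local Scala collections instead of `DataSet`s, and the signatures as well as `MeanSquaredError` and `PrecisionAtK` are hypothetical names chosen for illustration. Pairwise scores consume (truth, prediction) pairs, while ranking scores consume a ranked item list plus the set of relevant items per user, which is why a single `evaluate` signature returning prediction pairs does not fit both families.

```scala
// Hypothetical, simplified score hierarchy over local collections.
// Not the FlinkML API; names and signatures are illustrative assumptions.
trait Score[In] {
  def evaluate(input: In): Double
}

// Pairwise scores compare each true value with its prediction.
trait PairWiseScore[P] extends Score[Seq[(P, P)]]

object MeanSquaredError extends PairWiseScore[Double] {
  def evaluate(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (truth, pred) => (truth - pred) * (truth - pred) }.sum / pairs.size
}

// Ranking scores look at a ranked list of recommended item ids
// together with the set of relevant item ids, one entry per user.
trait RankingScore extends Score[Seq[(Seq[Int], Set[Int])]]

final case class PrecisionAtK(k: Int) extends RankingScore {
  def evaluate(users: Seq[(Seq[Int], Set[Int])]): Double = {
    // Fraction of the top-k recommendations that are relevant, per user.
    val perUser = users.map { case (ranked, relevant) =>
      ranked.take(k).count(relevant.contains).toDouble / k
    }
    perUser.sum / perUser.size
  }
}
```

Giving each family its own input type keeps the shape of the evaluation data explicit, instead of forcing ranking metrics through a pairwise `(Prediction, Prediction)` interface.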

[GitHub] flink pull request #2838: [FLINK-4712] [FLINK-4713] [ml] Ranking recommendat...

2016-11-23 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2838#discussion_r89292988 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala --- @@ -72,14 +77,142 @@ trait Predictor[Self] extends

[GitHub] flink issue #2740: [FLINK-4964] [ml]

2016-11-22 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2740 Hello @tfournier314, I should have clarified: by documentation I meant that, apart from the docstrings you have added now, we also have to include documentation in the Flink [docs](https://github.com

[GitHub] flink issue #2838: [FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & e...

2016-11-21 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2838 Hello @gaborhermann, thanks for making the PR! I'll try to take a look this week, I've been busy with a couple of other PRs these days.

[GitHub] flink issue #2740: [FLINK-4964] [ml]

2016-11-21 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2740 Hello @tfournier314, this PR is still missing documentation. After that is done, a project committer will have to review it before it gets merged, which might take a while. Regards

[GitHub] flink issue #2764: [FLINK-5008] Update quickstart documentation

2016-11-17 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2764 IIRC @vasia uses Eclipse for development; maybe she can chime in.

[GitHub] flink issue #2761: [FLINK-3551] [examples] Sync Scala Streaiming Examples

2016-11-16 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2761 Hello Lim, thank you for your contribution! I've taken a quick look and most of these look fine, plus I see you've included the required tests. I'll do a review this wee

[GitHub] flink issue #2740: [FLINK-4964] [ml]

2016-11-15 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2740 @greghogan Excuse my ignorance, I'm only now learning about Flink internals :) It seems like the issue here was that `partitionByRange` partitions keys in ascending order but we want th

[GitHub] flink issue #2740: [FLINK-4964] [ml]

2016-11-15 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2740 Hello @tfournier314, I tested your code and it does seem that partitions are sorted only internally, which is expected, and `zipWithIndex` is AFAIK unaware of the sorted (as in key range) order
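The ordering problem described in these comments can be stated precisely with a local sketch. Assuming, as in Spark's `StringIndexer`, that the most frequent label should receive index 0, the intended result is a global sort by descending count followed by a dense index. `indexByFrequency` is a hypothetical helper on plain Scala collections, not Flink code; on a `DataSet` the same result needs a global order, which sorting only within each partition does not provide.

```scala
// Hypothetical helper (not Flink code): assigns a dense index to each distinct
// label, most frequent label first. Ties are broken alphabetically so the
// output is deterministic; that tie-break is an assumption of this sketch.
def indexByFrequency(labels: Seq[String]): Map[String, Long] = {
  // Count occurrences of each label.
  val counts: Map[String, Int] =
    labels.groupBy(identity).map { case (label, occurrences) => (label, occurrences.size) }

  counts.toSeq
    .sortBy { case (label, count) => (-count, label) } // descending count, then label
    .zipWithIndex                                      // index follows the global order
    .map { case ((label, _), idx) => (label, idx.toLong) }
    .toMap
}
```

For example, `indexByFrequency(Seq("b", "a", "b", "c", "b", "a"))` maps `b` to 0, `a` to 1 and `c` to 2.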

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-11-10 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r87422110 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -273,6 +308,14 @@ object ALS { val

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-11-10 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r87421513 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -675,7 +756,69 @@ object ALS

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-11-09 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r87202604 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -675,7 +756,69 @@ object ALS

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-11-09 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r87199446 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -273,6 +308,14 @@ object ALS { val

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-11-09 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r87195483 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -156,6 +171,26 @@ class ALS extends Predictor[ALS

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-11-09 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r87201508 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -535,8 +581,17 @@ object ALS { itemOut

[GitHub] flink pull request #2740: [FLINK-4964] [ml]

2016-11-09 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2740#discussion_r87192044 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/StringIndexer.scala --- @@ -0,0 +1,108 @@ +package

[GitHub] flink issue #2542: [FLINK-4613] [ml] Extend ALS to handle implicit feedback ...

2016-11-09 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2542 @gaborhermann Yup, the approach taken by the Spark community for testing is closer to what we would like to have for non-deterministic algorithms, but what you have implemented now should suffice on

[GitHub] flink issue #2542: [FLINK-4613] [ml] Extend ALS to handle implicit feedback ...

2016-11-09 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2542 Hello @gaborhermann, yes, I think you are right in that respect; I just wanted to note that we should perform some comparative benchmarks in the future. So the benchmarks look good

[GitHub] flink issue #2542: [FLINK-4613] [ml] Extend ALS to handle implicit feedback ...

2016-11-08 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2542 Thank you @jfeher! Could you clarify what you mean by filtering the data to get unique item-user pairs? Is this because the iALS algorithm only supports binary interactions (i.e. number of
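For context on the question about duplicate user-item pairs: in the implicit-feedback formulation of Hu, Koren and Volinsky (2008), raw interaction counts r are not discarded but mapped to a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + alpha * r. The sketch below uses a hypothetical `toImplicit` helper over local collections and sums repeated (user, item) events before applying that mapping; it shows aggregation as one option, not what the PR under review does.

```scala
// Illustrative sketch, not FlinkML code: `ImplicitRating` and `toImplicit` are hypothetical names.
final case class ImplicitRating(user: Int, item: Int, preference: Double, confidence: Double)

// events: (user, item, interaction count) triples, possibly with repeated (user, item) keys.
def toImplicit(events: Seq[(Int, Int, Double)], alpha: Double): Seq[ImplicitRating] =
  events
    .groupBy { case (user, item, _) => (user, item) }
    .toSeq
    .map { case ((user, item), es) =>
      val r = es.map(_._3).sum                        // total interactions for this pair
      val p = if (r > 0) 1.0 else 0.0                 // binary preference
      ImplicitRating(user, item, p, 1.0 + alpha * r)  // confidence c = 1 + alpha * r
    }
    .sortBy(x => (x.user, x.item))                    // deterministic output order
```

Summing first means that three separate events for the same pair raise the confidence rather than being reduced to a single interaction.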

[GitHub] flink issue #2684: Add EvaluateDataSet Operation for LabeledVector - This cl...

2016-11-03 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2684 Hello @tillrohrmann, I think you make some valid points. My original idea was to move completely to using `LabeledVector` as the de facto format for supervised learning problems and do away with

[GitHub] flink issue #2735: [FLINK-2094] implements Word2Vec for FlinkML

2016-11-03 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2735 Thank you for your contribution, Kalman! I just took a brief look; this is a big PR, so it will probably take some time to review. For now, a few things that come to mind

[GitHub] flink issue #2682: [FLINK-4886] [docs] Update ML quickstart loading svm test...

2016-10-28 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2682 Thank you @ch33hau, I'll also mark the issue as a duplicate!

[GitHub] flink issue #2684: Add EvaluateDataSet Operation for LabeledVector - This cl...

2016-10-27 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2684 One last thing: could you change this PR's title to `[FLINK-4865] [ml] Add EvaluateDataSet operation for LabeledVector`? This is the standard naming for PRs: issue number, follow

[GitHub] flink issue #2684: Add EvaluateDataSet Operation for LabeledVector - This cl...

2016-10-27 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2684 > We don't need to care about Jenkins/Travis fails, do we ? Normally we should, but there are so many unstable tests and issues with Travis that it's hard to follow right now. You sho

[GitHub] flink issue #2684: Add EvaluateDataSet Operation for LabeledVector - This cl...

2016-10-25 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2684 Thank you for your contribution Thomas! The changes look mostly fine I think, my only concern is the need to cast the types in lines [215-216](https://github.com/apache/flink/pull/2684

[GitHub] flink issue #2668: Add EvaluateDataSetOperation for LabeledVector. This clos...

2016-10-21 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2668 Hello Thomas, thank you for your contribution! I took a brief look, so here are some initial comments: this seems to be making changes to `MLUtils`, which AFAIK is outside the scope of this

[GitHub] flink pull request #2658: [FLINK-4850] [ml] FlinkML - SVM predict Operation ...

2016-10-19 Thread thvasilo
GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/2658 [FLINK-4850] [ml] FlinkML - SVM predict Operation for Vector and not LaveledVector The current version of the quickstart guide includes erroneous code; this changes the function calls to have

[GitHub] flink issue #1849: [FLINK-2157] [ml] Create evaluation framework for ML libr...

2016-10-04 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/1849 @gaborhermann In terms of missing features, documentation is definitely missing, as @rawkintrevo mentioned. For the issues mentioned in the JIRA issue you linked I've replied on the

[GitHub] flink issue #2542: [FLINK-4613] [ml] Extend ALS to handle implicit feedback ...

2016-09-29 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2542 @gaborhermann I think having a larger-scale test would boost our confidence in the implementation, and maybe point out some problems that do not manifest with small data, which is very common

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-09-29 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r81159482 --- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/recommendation/ImplicitALSTest.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed

[GitHub] flink pull request #2542: [FLINK-4613] [ml] Extend ALS to handle implicit fe...

2016-09-29 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r81159171 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -581,6 +637,16 @@ object ALS { val

[GitHub] flink pull request #2542: [FLINK-4613] Extend ALS to handle implicit feedbac...

2016-09-28 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r80860723 --- Diff: docs/dev/libs/ml/als.md --- @@ -99,6 +114,26 @@ The alternating least squares implementation can be controlled by the following

[GitHub] flink pull request #2542: [FLINK-4613] Extend ALS to handle implicit feedbac...

2016-09-28 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r80862241 --- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/recommendation/ImplicitALSTest.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed

[GitHub] flink pull request #2542: [FLINK-4613] Extend ALS to handle implicit feedbac...

2016-09-28 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r80861067 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala --- @@ -581,6 +637,16 @@ object ALS { val

[GitHub] flink pull request #2542: [FLINK-4613] Extend ALS to handle implicit feedbac...

2016-09-28 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r80862018 --- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/recommendation/ImplicitALSTest.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed

[GitHub] flink pull request #2542: [FLINK-4613] Extend ALS to handle implicit feedbac...

2016-09-28 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r80860057 --- Diff: docs/dev/libs/ml/als.md --- @@ -49,6 +49,21 @@ By applying this step alternately to the matrices $U$ and $V$, we can iterativel The

[GitHub] flink pull request #2542: [FLINK-4613] Extend ALS to handle implicit feedbac...

2016-09-28 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/2542#discussion_r80860320 --- Diff: docs/dev/libs/ml/als.md --- @@ -99,6 +114,26 @@ The alternating least squares implementation can be controlled by the following

[GitHub] flink issue #2542: [FLINK-4613] Extend ALS to handle implicit feedback datas...

2016-09-28 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2542 Hello @gaborhermann, thank you for your contribution! Are the numbers here non-zero entries in a matrix? If that is the case, do you think it would be possible to test this on some larger

[GitHub] flink pull request #2393: [trivial] Fix typo in dosctring

2016-08-19 Thread thvasilo
GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/2393 [trivial] Fix typo in dosctring There is a small typo in the getBufferTimeout docstring, this fixes it. Talked with @tillrohrmann about whether it's worth it to open a PR for some

[GitHub] flink issue #1220: [FLINK-1745] Add exact k-nearest-neighbours algorithm to ...

2016-07-26 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/1220 Oh, sorry should have checked the master docs. On Jul 26, 2016 5:29 AM, "Chiwan Park" wrote: > Hi @thvasilo <https://github.com/thvasilo>, I've checked the

[GitHub] flink issue #1220: [FLINK-1745] Add exact k-nearest-neighbours algorithm to ...

2016-07-25 Thread thvasilo
Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/1220 Hello Daniel, sorry to bring this up months later, but I see that while the documentation exists, there is nothing linking to it from the FlinkML index page. Would you care to create a new PR

[GitHub] flink pull request: [FLINK-1979] Add logistic loss, hinge loss and...

2016-05-17 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1985#discussion_r63519957 --- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/optimization/RegularizationPenaltyITSuite.scala --- @@ -0,0 +1,65

[GitHub] flink pull request: [FLINK-1979] Add logistic loss, hinge loss and...

2016-05-17 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1985#discussion_r63519390 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/optimization/RegularizationPenalty.scala --- @@ -0,0 +1,215

[GitHub] flink pull request: [FLINK-1979] Add logistic loss, hinge loss and...

2016-05-17 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1985#discussion_r63517610 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala --- @@ -272,7 +272,7 @@ abstract class

[GitHub] flink pull request: [FLINK-1979] Add logistic loss, hinge loss and...

2016-05-17 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1985#discussion_r63517587 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala --- @@ -272,7 +272,7 @@ abstract class

[GitHub] flink pull request: [FLINK-1979] Lossfunctions

2016-05-11 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/656#issuecomment-218399619 Hello @skavulya, thank you for your contribution, I'm looking forward to the PR! Re. the regularization penalty, we changed it to make user choice easier,

[GitHub] flink pull request: [FLINK-1827] and small fixes in some tests

2016-04-27 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1915#issuecomment-215201773 Any idea what might be causing the error in FlinkML? I can't pinpoint it to any specific source file.

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-21 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212977748 I did some testing and I think the problem has to do with the types that each scaler expects. `StandardScaler` has fit and transform operations for `DataSets

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-21 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212942014 Well, Breeze was recently bumped to 0.12 (#1876), maybe that has something to do with it, but let's see. Any chance you can try with the prev. Breeze ve

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-21 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-212793430 Hello Trevor, thanks for taking the time to look at this, I'll investigate these issues today hopefully.

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1849#issuecomment-205214498 @mbalassi @tillrohrmann Closed the previous PR and opened this one for the evaluation framework, as I had some issues with rebasing.

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-04 Thread thvasilo
Github user thvasilo closed the pull request at: https://github.com/apache/flink/pull/871

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/871#issuecomment-205214045 Moving this to #1849 due to git eating my changes when rebasing under the new dir structure.

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-04 Thread thvasilo
GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/1849 [FLINK-2157] [ml] Create evaluation framework for ML library Using this PR instead of #871 due to rebase issues.

[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-03-29 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/871#issuecomment-203075630 @rawkintrevo AFAIK it's a lack of time on the part of a committer to review it. If @tillrohrmann can find some time to review this, I'll refactor it to get rid of the con

[GitHub] flink pull request: [FLINK-3464] [docs] Add SBT template documenta...

2016-02-24 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1688#issuecomment-188278485 In terms of usability I think it is. The whole point of the quickstart guide is to get users up and running as quickly as possible. For their first toy project people

[GitHub] flink pull request: [FLINK-3464] [docs] Add SBT template documenta...

2016-02-24 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/1688#issuecomment-188258580 Thanks for this and for fixing the SBT build; this should greatly help with setting up new users in demonstrations etc. @tillrohrmann Any reason giter8 was selected as

[GitHub] flink pull request: [FLINK-3330] [ml] Fix SparseVector support in ...

2016-02-04 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1587#discussion_r51910120 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala --- @@ -192,10 +190,18 @@ abstract class

[GitHub] flink pull request: [FLINK-3330] [ml] Fix SparseVector support in ...

2016-02-04 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/1587#discussion_r51909968 --- Diff: flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/regression/RegressionData.scala --- @@ -27,6 +27,21 @@ object RegressionData

[GitHub] flink pull request: [FLINK-2003] [docs] Building on some encrypted...

2015-09-07 Thread thvasilo
GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/1100 [FLINK-2003] [docs] Building on some encrypted filesystems leads to "File name too long" error Replaces #690, adding docs instead of changing the pom files.

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-05 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137936053 I agree, let's not break the API in this PR. We can create an issue and have a small discussion on the list about the change, and if the community agrees we can

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-13090 If it is actually considered to be API breaking, then we should open a new PR and JIRA to make sure the change is well documented. Let's see what Till thinks and

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137775231 Also @tillrohrmann: Is changing the DataSetUtils structure considered an API-breaking change, and should we handle it differently? Any code using `import

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137772917 I think it would be hard to do otherwise. From what I understand, `java.lang.Double` values are objects but `scala.Double` values are not. In this case duplicating the
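The boxing point can be illustrated with a small, self-contained Scala sketch that does not use Flink. The helper `javaStyleSum` is hypothetical: it only stands in for a Java-side utility that accepts boxed `java.lang.Double` values, so Scala `Double`s (which compile to primitive `double` where possible) have to be boxed element-wise before the call, presumably the kind of `map` step a Scala wrapper around the Java utility would need.

```scala
object BoxingSketch {
  // Hypothetical Java-style API: it only accepts boxed java.lang.Double objects.
  def javaStyleSum(values: Array[java.lang.Double]): java.lang.Double = {
    var acc = 0.0
    for (v <- values) acc += v.doubleValue() // unbox each element explicitly
    java.lang.Double.valueOf(acc)            // box the result for the Java caller
  }

  def main(args: Array[String]): Unit = {
    val scalaValues: Array[Double] = Array(1.5, 2.5, 3.0)
    // Box each primitive scala.Double into a java.lang.Double before the call.
    val boxed: Array[java.lang.Double] = scalaValues.map(x => java.lang.Double.valueOf(x))
    println(javaStyleSum(boxed)) // prints 7.0
  }
}
```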

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137768373 Re 1: Good change; implicit classes are the way to go here, no need for implicit conversion in the object.
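A minimal Scala sketch of the implicit-class approach, independent of Flink. `RichValues` and `histogramCounts` are hypothetical names chosen for illustration, not Flink's `DataSetUtils` API. An implicit class declares the wrapper type and its conversion in one place, so no separate `implicit def` (and no `scala.language.implicitConversions` import) is needed.

```scala
object ImplicitClassSketch {
  // The implicit class bundles the wrapper and the conversion into one declaration,
  // so any Seq[Double] in scope gains the extension method below.
  implicit class RichValues(val self: Seq[Double]) {
    // Counts occurrences of each distinct value, similar to a discrete histogram.
    def histogramCounts: Map[Double, Int] =
      self.groupBy(identity).map { case (value, occurrences) => value -> occurrences.size }
  }

  def main(args: Array[String]): Unit = {
    // The conversion is applied implicitly because RichValues is in scope here.
    val counts = Seq(1.0, 2.0, 2.0, 3.0).histogramCounts
    println(counts(2.0)) // prints 2
  }
}
```

Callers outside the object would bring the extension into scope with `import ImplicitClassSketch._`.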

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137764885 ```scala def createDiscreteHistogram: DataSet[DiscreteHistogram] = { wrap(jutils.DataSetUtils.createDiscreteHistogram( self.map(x =>

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137759354 @sachingoel0101 That sounds like a good place for the docs; `zip_elements_guide.mb` is linked from the DataSet Transformations doc, so replacing that link to a

[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-09-04 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-137714631 Hello, my 2c: This PR should include docs and a wrapper for the Scala API. We can also do this with a separate issue but it would be best if we merge as a more complete

[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-22 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-133675023 It's great to have this in; I'll try to update the cross-validation and SGD to use this.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-19 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132552269 Sounds good, just make sure to update the title and description of both PRs to reflect the current state.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132261492 Sorry to nitpick, but if we are going to split the PR, then the documentation should be split accordingly; we can merge the column-wise statistics once this one is

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132192126 Hello, I was wondering when the column-wise statistics were added to this PR and whether it makes sense to add them here, or create a new issue and PR for them

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132146208 Sounds good.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132131366 I would vote for returning a continuous histogram as the default and perhaps providing a discrete one as private to the ml package.

[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-08-14 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/696#issuecomment-131107449 +1 for closing this and focusing on approximate kNN instead.

[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-14 Thread thvasilo
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/949#discussion_r37054584 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/operators/SampleITCase.scala --- @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache
