[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-10 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-129336972 Hi, I just discovered the review request. I'll review this PR soon. Because I'm busy in working for my graduation essay, maybe I can start reviewing on weekend. ---

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137570 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/MLUtils.scala --- @@ -119,4 +120,30 @@ object MLUtils { stringRepresentat

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137592 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137605 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137631 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137647 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137658 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137690 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137718 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137749 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137752 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137756 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137762 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137774 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137811 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137819 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137821 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137856 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137866 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37137965 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37138026 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/ContinuousHistogram.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37138063 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,108 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37138093 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,108 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131425060 Hi @sachingoel0101, Thanks for your contribution. I reviewed this PR and commented the source code. There are some problems which aren't commented. In document

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37141028 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the A

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37141049 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the A

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37141076 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,108 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Stat

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131474758 Hi @chiwanpark, thanks for reviewing this. This was my first time working in Scala, so I hope you'll forgive the slight mistakes (oversights, perhaps?). I've t

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37143932 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/FieldStats.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apach

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131509499 Hi @chiwanpark , I've fixed the *cosmetic* issues. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144066 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/MLUtils.scala --- @@ -119,4 +120,31 @@ object MLUtils { stringRepresentat

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144068 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/MLUtils.scala --- @@ -119,4 +120,31 @@ object MLUtils { stringRepresentat

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144073 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/MLUtils.scala --- @@ -119,4 +120,31 @@ object MLUtils { stringRepresentat

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131512180 @sachingoel0101 There is no need to forgive. I just reviewed this PR and left my opinions. :) If you feel that my comments is aggressive, I'm sorry about that.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144130 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,100 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144137 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,100 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144173 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,100 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144171 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,100 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144175 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,100 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statisti

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144183 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144182 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144192 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37144195 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/statistics/CategoricalHistogram.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131514958 I found some points to improve and added line notes. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131518423 The occasional space issues are because I stopped using Idea's reformat tool. It sometimes messes up the indentations. :') And no. Your comments weren't aggres

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131520764 I just found that `Predef.assume` could be replaced by `assume`. Sorry for wrong guidance. Could you change `Predef.assume` to `assume`? --- If your project is set up

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-16 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-131523559 Fixed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132116059 I think that we can merge this PR after we decide the return type of `createHistogram` method. Any other points seem okay. --- If your project is set up for it, you c

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132118352 What about having two different functions? One for Discrete and one for continuous? Or perhaps just one for the Continuous as that is more likely to be used.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132128284 +1 for having two different methods by return type but we need more comments from @tillrohrmann, @thvasilo or other people because I'm not sure this is best approach.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132131366 I would vote for returning a continuous histogram as the default and perhaps provide a discrete one as private to the ml package. --- If your project is set up for it,

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132134747 Okay. That sounds good enough. Can we make a final decision then? --- If your project is set up for it, you can reply to this email and have your reply appear on G

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132137740 I would suggest, we have both functions public in the MLUtils class. Further, we provide a pimp-my-class function for only Continuous histogram. --- If your proje

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132146208 Sounds good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132170286 @thvasilo , @chiwanpark , I've made the required changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub a

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132183402 Is there the valid reason to have multiple names ('discrete histogram', 'categorical histogram')? I think that to avoid confusion, we need to unify the names. --- If

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132186102 Ah yes. You're right. I will change the names. Which do you think is the better name? I'm in favor of Discrete. --- If your project is set up for it, you can repl

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132186504 I'm also inclined to use Discrete. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132187018 Olrite. I'll be back in a few. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project d

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132188989 @chiwanpark , modified the names. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project d

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132192126 Hello, I was wondering when the column-wise statistics were added to this PR and whether it makes sense to add them here, or create a new issue and PR for them.

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132193111 They're not coupled at all. But both are related to statistics over data sets. This is why I combined them both in one. If you're wondering, there is a JIRA for th

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132208131 @thvasilo , @chiwanpark , I've split this into two parts. The statistics part is now in #1032 --- If your project is set up for it, you can reply to this email a

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132209035 I have kept the documentation for statistics in this however. We can merge #1032 first and then this to maintain the illusion of cause-effect. :') --- If your pro

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132261492 Sorry to be a nitpick, but if we are going to split the PR then the documentation should be split accordingly, we can merge the column-wise statistics once this one is m

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132264800 Haha. It's no problem. I've updated the documentation for this. :) --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132430686 The only problem to merge this is invalid example in documentation. We need calling `collect()` method and `apply(0)` to execute statistics methods such as `quantile`,

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132439907 Including the collect step in the documentation seems weird, IMO. Since the return type is a `DataSet[Histogram]`, anyone familiar with flink will know to collect

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-18 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132449023 Hi @chiwanpark , I have modified the documentation to include the collect step. I have also made a few modification to how the histogram is created by using a

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-19 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r37390659 --- Diff: docs/libs/ml/statistics.md --- @@ -0,0 +1,98 @@ +--- +mathjax: include +htmlTitle: FlinkML - Statistics +title: FlinkML - Statistic

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-19 Thread chiwanpark
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132494738 Good! But because there is another issue (FLINK-2379) covered the column-wise statistics, I think that spliting them would be better. --- If your project is set up fo

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-19 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132512002 Okay. Done. I will modify the `createContinuousHistogram` to use `DataStats` in #1032, since that is going to be merged after this anyway. --- If your proje

[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-19 Thread thvasilo
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132552269 Sounds good, just make sure to update the title and description of both PRs to reflect the current state --- If your project is set up for it, you can reply to this em