A whole bag of ML issues

2016-03-28 Thread Trevor Grant
Hey, I have a working prototype of a multi-layer perceptron implementation in Flink. I made every effort to reuse existing code where possible. In the process of doing this there were some hacks I want/need, and I think this should be broken up into multiple PRs and possible abs

Expected duration for cascading-flink tests?

2016-03-28 Thread Ken Krugler
Hi all, I'm curious how long the tests are expected to take for cascading-flink. I know that https://github.com/dataArtisans/cascading-flink recommends running mvn clean install with -DskipTests, but I was going to try updating to Flink 1.0.0 (currently using 0.10.0) and Cascading 3.1.0-wip-56

Multi-Layer Perceptron

2016-03-28 Thread Trevor Grant
Hey all, as a follow-up to my earlier post on things to do for Flink ML, here is a technically working, though certainly not PR-ready, branch for neural networks. It is meant as a visual aid for starting a conversation on a roadmap for some needed updates. https://github.com/apache/flink/compare/master...raw

Re: a typical ML algorithm flow

2016-03-28 Thread Dmitriy Lyubimov
Thanks, Chiwan. I think this example still creates a lazily evaluated plan. What if I need to collect statistics to the front end (and use them in subsequent iteration evaluation), as my example with computing column-wise averages suggests? The problem generally is, what if I need to eagerly evaluate the statis