Re: How to create a stream of data batches

2015-09-07 Thread Juan Rodríguez Hortalá
Hi, I'm just a Flink newbie, but I'd suggest using window operators with a Count policy for that: https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#window-operators Hope that helps. Greetings, Juan

Re: How to create a stream of data batches

2015-09-04 Thread Stephan Ewen
Interesting question, you are the second to ask that. Batching in user code is a way, as Matthias said. We have on the roadmap a way to transform a stream to a set of batches, but it will be a bit until this is in. See https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Str

Re: How to create a stream of data batches

2015-09-04 Thread Matthias J. Sax
Hi Andres, you could do this by using your own data type, for example
> public class MyBatch {
>     private ArrayList data = new ArrayList();
> }
In the DataSource, you need to specify your own InputFormat that reads multiple tuples into a batch and emits the whole batch at once. However, be aware,
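
For illustration only, here is a minimal sketch of the "batching in user code" idea in plain Java, without any Flink dependency. The names BatchSketch, toBatches and batchSize are hypothetical and not part of any Flink API, and MyBatch is given a type parameter here purely as an assumption. In a real job this grouping logic would presumably live inside the custom InputFormat mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BatchSketch {

    /** Hypothetical batch type, modelled on the MyBatch idea from the thread. */
    public static final class MyBatch<T> {
        private final List<T> data = new ArrayList<>();

        public void add(T element) { data.add(element); }
        public int size() { return data.size(); }
        public List<T> getData() { return data; }
    }

    /**
     * Groups the elements of a source into batches of at most batchSize
     * elements. The last batch may be smaller if the source runs out.
     */
    public static <T> List<MyBatch<T>> toBatches(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<MyBatch<T>> batches = new ArrayList<>();
        MyBatch<T> current = new MyBatch<>();
        while (source.hasNext()) {
            current.add(source.next());
            if (current.size() == batchSize) {
                // Batch is full: emit it as a whole and start a new one.
                batches.add(current);
                current = new MyBatch<>();
            }
        }
        if (current.size() > 0) {
            // Emit the trailing, partially filled batch.
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> elements = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        for (MyBatch<Integer> batch : toBatches(elements.iterator(), 3)) {
            System.out.println(batch.getData());
        }
    }
}
```

With seven elements and a batch size of 3, this prints [1, 2, 3], [4, 5, 6] and [7]. Whether a trailing partial batch should be emitted or dropped depends on the learning algorithm; the sketch emits it.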

How to create a stream of data batches

2015-09-04 Thread Andres R. Masegosa
Hi, I'm trying to code some machine learning algorithms on top of Flink, such as variational Bayes learning algorithms. Instead of working at a data element level (i.e. using map transformations), it would be far more efficient to work at a "batch of elements" level (i.e. I get a batch of elemen