Re: Java Maps and Type Information

2016-03-01 Thread Simone Robutti
I tried to simplify it to the bones but I'm actually defining a custom MapFunction,java.util.Map> that even with a simple identity function fails at runtime giving me the following error: >Exception in thread "main"

Windows, watermarks, and late data

2016-03-01 Thread Michael Radford
I'm evaluating Flink for a reporting application that will keep various aggregates updated in a database. It will be consuming from Kafka queues that are replicated from remote data centers, so in case there is a long outage in replication, I need to decide what to do about windowing and late

Re: Multi-dimensional[more than 2] input for KMeans Clustering inApache flink

2016-03-01 Thread subash basnet
Hello Fabian, Thanks! Is KMeans only the clustering implementation currently existing in flink. Best Regards, Subash Basnet On Tue, Mar 1, 2016 at 5:22 PM, Fabian Hueske wrote: > [image: Boxbe] This message is eligible > for Automatic

Re: Multi-dimensional[more than 2] input for KMeans Clustering in Apache flink

2016-03-01 Thread Fabian Hueske
Hi Subash, the KMeans implementation in Flink is meant to be a simple toy example and should not used for serious analysis tasks. It shows how the DataSet API works by implementing a well-known algorithm. Nonetheless, the example can be easily extended to work for three or more dimensions. You

Multi-dimensional[more than 2] input for KMeans Clustering in Apache flink

2016-03-01 Thread subash basnet
Hello all, Currently I find only two-dimension input possible for the KMeans Clustering in flink. Is there any implementation already or what should be the approach to implement more than 2 dimensional input for KMeans in flink? Or is there any other clustering method which taking more than two

Re: Java Maps and Type Information

2016-03-01 Thread Aljoscha Krettek
Hi, what kind of program are you writing? I just wrote a quick example using the DataStream API where I’m using Map> as the output type of one of my MapFunctions. Cheers, Aljoscha > On 01 Mar 2016, at 16:33, Simone Robutti wrote: >

Re: Iterations problem in command line

2016-03-01 Thread Marcela Charfuelan
Hi, the iteration looks like: DataSet gmms = getInitialGMMDataSet(env); IterativeDataSet loop = gmms.iterate(50); DataSet newGMMs = features.map(new Estep_ExpectationMaximisation()).withBroadcastSet(loop, "gmms") .reduceGroup(new

Re: Flink Streaming - WriteAsText

2016-03-01 Thread Maximilian Michels
Hey Ankur, If the output after cancelling the job is correct, I assume the changes haven't been flushed to disk before. For further investigation, could you share some code with us? Cheers, Max On Wed, Feb 24, 2016 at 8:54 PM, Ankur Sharma wrote: > Hey, > > I am

Re: Iterations problem in command line

2016-03-01 Thread Fabian Hueske
Yes, env.setParallelism(1) fixes the parallelism of all operators to 1 (unless an operator overrides this setting). Can you identify at which position in the data flow the results start to diverge? Best, Fabian 2016-02-29 17:57 GMT+01:00 Marcela Charfuelan : >