Re: Streaming KV store abstraction

2016-03-24 Thread Nam-Luc Tran
efaultValue)). > > Now that you have everything set up, in flatMap1 (for events) you would > query the state : state.value() and enrich your data > in flatMap2 you would update the state: state.update(newState) > > Does this make sense to you? Or is the use case completely different?

Re: Streaming KV store abstraction

2016-03-19 Thread Nam-Luc Tran
; > > > > > > > > > > > > > > > > > > > > > > > > 2015-09-08 13:35 GMT+02:00 Gyula Fóra : > > > > > > > > > > Hey All, > > > > > > > > > > > > > > > > > > > > The last couple of days I have been playing around with > the > > > > idea > > > > > of > > > > > > > > > > building a streaming key-value store abstraction using > > > stateful > > > > > > > > streaming > > > > > > > > > > operators that can be used within Flink Streaming > programs > > > > > > > seamlessly. > > > > > > > > > > > > > > > > > > > > Operations executed on this KV store would be fault > > tolerant > > > as > > > > > it > > > > > > > > > > integrates with the checkpointing mechanism, and if we > add > > > > > > timestamps > > > > > > > > to > > > > > > > > > > each put/get/... operation we can use the watermarks to > > > create > > > > > > fully > > > > > > > > > > deterministic results. This functionality is very useful > > for > > > > many > > > > > > > > > > applications, and is very hard to implement properly with > > > some > > > > > > > > dedicates > > > > > > > > > kv > > > > > > > > > > store. > > > > > > > > > > > > > > > > > > > > The KVStore abstraction could look as follows: > > > > > > > > > > > > > > > > > > > > KVStore store = new KVStore<>; > > > > > > > > > > > > > > > > > > > > Operations: > > > > > > > > > > > > > > > > > > > > store.put(DataStream>) > > > > > > > > > > store.get(DataStream) -> DataStream> > > > > > > > > > > store.remove(DataStream) -> DataStream> > > > > > > > > > > store.multiGet(DataStream) -> DataStream[]> > > > > > > > > > > store.getWithKeySelector(DataStream, KeySelector) > > -> > > > > > > > > > > DataStream[]> > > > > > > > > > > > > > > > > > > > > For the resulting streams I used a special KV abstraction > > > which > > > > > > let's > > > > > > > > us > > > > > > > > > > return null values. > > > > > > > > > > > > > > > > > > > > The implementation uses a simple streaming operator for > > > > executing > > > > > > > most > > > > > > > > of > > > > > > > > > > the operations (for multi get there is an additional > merge > > > > > > operator) > > > > > > > > with > > > > > > > > > > either local or partitioned states for storing the > > kev-value > > > > > pairs > > > > > > > (my > > > > > > > > > > current prototype uses local states). And it can either > > > execute > > > > > > > > > operations > > > > > > > > > > eagerly (which would not provide deterministic results), > or > > > > > buffer > > > > > > up > > > > > > > > > > operations and execute them in order upon watermarks. > > > > > > > > > > > > > > > > > > > > As for use cases you can probably come up with many I > will > > > save > > > > > > that > > > > > > > > for > > > > > > > > > > now :D > > > > > > > > > > > > > > > > > > > > I have a prototype implementation here that can execute > the > > > > > > > operations > > > > > > > > > > described above (does not handle watermarks and time > yet): > > > > > > > > > > > > > > > > > > > > https://github.com/gyfora/flink/tree/KVStore > > > > > > > > > > And also an example job: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/gyfora/flink/blob/KVStore/flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/KVStreamExample.java > > > > > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > If you like it I will work on writing tests and it still > > > needs > > > > a > > > > > > lot > > > > > > > of > > > > > > > > > > tweaking and refactoring. This might be something we want > > to > > > > > > include > > > > > > > > with > > > > > > > > > > the standard streaming libraries at one point. > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Gyula > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- *Nam-Luc TRAN* R&D Manager EURA NOVA (M) +32 498 37 36 23 *euranova.eu <http://euranova.eu>*

Re: Playing with EventTime in DataStreams

2016-02-26 Thread Nam-Luc Tran
016-02-25 20:05 GMT+01:00 Robert Metzger : > Hi, > > I had a similar issue recently. > Instead of > input.assignTimestampsAndWatermarks > > you have to do: > > input = input.assignTimestampsAndWatermarks > > On Thu, Feb 25, 2016 at 6:14 PM, Nam-Luc Tran > wro

Playing with EventTime in DataStreams

2016-02-25 Thread Nam-Luc Tran
x27;ve configured the stream to do it. It however does not do that, and does the windows based on processing time. What am I missing here? Best regards, -- *Nam-Luc TRAN* R&D Manager EURA NOVA (M) +32 498 37 36 23 *euranova.eu <http://euranova.eu>*

Distributed DataFrame - ddf.io

2015-12-03 Thread Nam-Luc Tran
*Nam-Luc TRAN* R&D Manager EURA NOVA (M) +32 498 37 36 23 *euranova.eu <http://euranova.eu>*

Re: Error while deserializing event

2015-06-24 Thread Nam-Luc Tran
ve as buffers before they are deserialized. That's why you see the getNextBuffer call in the stack trace. – Ufuk On Tuesday, June 23, 2015, Nam-Luc Tran wrote: > Hello fellow Flinksters, > > I currently work on implementing Stale Synchronous Parallel iterations > from the current b

Error while deserializing event

2015-06-23 Thread Nam-Luc Tran
Hello fellow Flinksters, I currently work on implementing Stale Synchronous Parallel iterations from the current bulk iterations. I have replacement classes for IterationHeadPactTask, IterationSynchronizationTask and corresponding event handlers to do the job. Among the generated events, I have Cl

Re: Iteration stats logging

2015-06-15 Thread Nam-Luc Tran
is no "final" no-op iteration happening, the iteration tasks don't know when the last iteration happened. I'm not sure what the best way is to implement this at the moment. What kind of stats are you recording? – Ufuk On 15 Jun 2015, at 15:53, Nam-Luc Tran wrote: > Hello Ev

Iteration stats logging

2015-06-15 Thread Nam-Luc Tran
Hello Everyone, I would like to log certain stats during iterations in a bulk iterative job. The way I do this is store the things I want at each iteration and plan to flush everything to HDFS once all the iterations are done. To do that I would need to know when the last iteration is invoked in o

Re: Stale Synchronous Parallel iterations in Flink

2015-02-23 Thread Nam-Luc Tran
start their    next superstep if all "tails" have seen clock messages at least of its own clock time minus the slack. If you are looking to implement this in Flink, or dig deeper into this, let me know, I would be happy to help. On Fri, Feb 20, 2015 at 5:27 PM, Nam-Luc Tran wrote: >

Stale Synchronous Parallel iterations in Flink

2015-02-20 Thread Nam-Luc Tran
Hello Everyone,  I am Nam-Luc Tran, research Engineer at EURA NOVA [1]. Our research subjects cover distributed machine learning and we have been working on dataflow graph processing for a while now. We have been reading from you since Stratosphere :-) Our current research focuses on Stale

Re: AW: kryoException : Buffer underflow

2015-02-12 Thread Nam-Luc Tran
then uses Kryo for >>> serialization. >>>> In general, lambda expressions are a very new feature which currently >>>> makes a lot of problems due to missing type information by compilers. >>> Maybe >>>> it is better to use (anonymous) classes instead.

Re: kryoException : Buffer underflow

2015-02-11 Thread Nam-Luc Tran
Hello Stephan,  Thank you for your help. I ensured all the POJO classes used comply to what you previously said and the same exception occurs. Here is the listing of classes Centroid25 and Point25: public class Centroid25 extends Point25 { public int id; public Centroid25() {} public Centroid

kryoException : Buffer underflow

2015-02-11 Thread Nam-Luc Tran
Hello, I came accross an error for which I am unable to retrace the exact cause. Starting from flink-java-examples module, I have extended the KMeans example to a case where points have 25 coordinates. It follows the exact same structure and transformations as the original example, only with point

Re: Eclipse JDT, Java 8, lambdas

2015-02-09 Thread Nam-Luc Tran
I did try the 4.5 M4 release and it did not go straightforward. -- View this message in context: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Eclipse-JDT-Java-8-lambdas-tp3664p3688.html Sent from the Apache Flink (Incubator) Mailing List archive. mailing list archiv

Re: Eclipse JDT, Java 8, lambdas

2015-02-06 Thread Nam-Luc Tran
Thank you for your replies. @Stephen Updating to 0.9-SNAPSHOT and using the "return" statement did the trick. I will try the 4.5 M4 release and give a feedback on how it went. @Robert I launch the job right from the Eclipse IDE. Also, each file in the folder contains data for a different trajecto

Eclipse JDT, Java 8, lambdas

2015-02-06 Thread Nam-Luc Tran
Hello, I am trying to use Java 8 lambdas in my project and hit the following error: Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: The generic type parameters of 'Tuple2' are missing.  It seems that your compiler has not stored them into the .class file.