Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Yes, you're right. This is not a principled solution but rather a workaround for a specific use case. The ReduceFunction must be used in the right way, and that is easy to get wrong. (OTOH, there is currently no way to get object reuse right, so I think the change would not worsen the current

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Aljoscha Krettek
You can go ahead and do the change. I just think that this is quite fragile. For example, this depends on the reduce function returning the right object for reuse. If we hand in the copied object as the first input and the ReduceFunction reuses the second input then we again have a reference to
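
The fragility described in this message can be sketched in plain Java. This is a minimal illustration, not Flink code: `CopyFirstState`, `Sum`, and the `copier` parameter are hypothetical stand-ins (the copier plays the role of a serializer-based deep copy). The state copies the first value per window, but the ReduceFunction reuses its *second* input, so the state ends up holding a reference to the caller's object again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these classes are Flink's actual implementation.
final class SecondInputReuseSketch {

    /** Mutable record type used for in-place aggregation. */
    static final class Sum {
        long value;
        Sum(long value) { this.value = value; }
    }

    /** Copies only the first value of each window; later results are stored by reference. */
    static final class CopyFirstState {
        private final Map<String, Sum> accumulators = new HashMap<>();
        private final BinaryOperator<Sum> reduceFunction;
        private final UnaryOperator<Sum> copier; // stand-in for a deep copy

        CopyFirstState(BinaryOperator<Sum> reduceFunction, UnaryOperator<Sum> copier) {
            this.reduceFunction = reduceFunction;
            this.copier = copier;
        }

        void add(String window, Sum input) {
            Sum current = accumulators.get(window);
            accumulators.put(window,
                    current == null ? copier.apply(input) : reduceFunction.apply(current, input));
        }

        long get(String window) { return accumulators.get(window).value; }
    }

    /** Returns the sums seen by windows "w1" and "w2". */
    static long[] run() {
        // A ReduceFunction that reuses its SECOND input as the result object.
        BinaryOperator<Sum> reusesSecond = (a, b) -> { b.value += a.value; return b; };
        CopyFirstState state = new CopyFirstState(reusesSecond, s -> new Sum(s.value));

        Sum first = new Sum(1);
        state.add("w1", first);   // copied
        state.add("w2", first);   // copied

        Sum second = new Sum(10); // record that falls into both windows
        state.add("w1", second);  // intended: w1 = 11, but w1 now references `second`
        state.add("w2", second);  // intended: w2 = 11, but `second` is mutated again

        return new long[] { state.get("w1"), state.get("w2") };
    }

    public static void main(String[] args) {
        long[] sums = run();
        System.out.println("w1=" + sums[0] + ", w2=" + sums[1]); // both windows report 12
    }
}
```

In this sketch the copy of the first value does not help, because the object returned by the ReduceFunction is the shared second input.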

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Hi Aljoscha, sure, there are many issues with holding the state as objects on the heap. However, I think we don't have to solve all problems related to that in order to add a small fix that solves one specific issue. I would not explicitly expose the fix to users but it would be nice if we could

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread sjk
Hi Fabian Hueske, sorry, I mistook it for the whole PR #2792.

> On Nov 23, 2016, at 17:10, Fabian Hueske wrote:
>
> Hi,
>
> Why do you think that this means "much code changes"?
> I think it would actually be a pretty lightweight change in
> HeapReducingState.
>
> The proposal

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Hi, Why do you think that this means "much code changes"? I think it would actually be a pretty lightweight change in HeapReducingState. The proposal is to copy the *first* value that goes into a ReducingState. The copy would be done by a TypeSerializer and hence be a deep copy. This will allow
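
The proposal in this message can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the actual HeapReducingState change: `CopyFirstReducingState` and `Sum` are hypothetical, and the `copier` parameter stands in for the deep copy that a TypeSerializer would perform. Only the first value per window is copied; afterwards a ReduceFunction that reuses its first input mutates the state's private copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these classes are Flink's actual implementation.
final class CopyFirstValueSketch {

    /** Mutable record type used for in-place aggregation. */
    static final class Sum {
        long value;
        Sum(long value) { this.value = value; }
    }

    /** Deep-copies the first value of each window, then reduces into that copy. */
    static final class CopyFirstReducingState {
        private final Map<String, Sum> accumulators = new HashMap<>();
        private final BinaryOperator<Sum> reduceFunction;
        private final UnaryOperator<Sum> copier; // stand-in for a serializer-based deep copy

        CopyFirstReducingState(BinaryOperator<Sum> reduceFunction, UnaryOperator<Sum> copier) {
            this.reduceFunction = reduceFunction;
            this.copier = copier;
        }

        void add(String window, Sum input) {
            Sum current = accumulators.get(window);
            // Copy only when the window has no accumulator yet.
            accumulators.put(window,
                    current == null ? copier.apply(input) : reduceFunction.apply(current, input));
        }

        long get(String window) { return accumulators.get(window).value; }
    }

    /** Returns { sum of w1, sum of w2, value of the original shared record }. */
    static long[] run() {
        // A ReduceFunction that reuses its first input as the result object.
        BinaryOperator<Sum> reusesFirst = (a, b) -> { a.value += b.value; return a; };
        CopyFirstReducingState state =
                new CopyFirstReducingState(reusesFirst, s -> new Sum(s.value));

        Sum shared = new Sum(1);       // record that falls into both sliding windows
        state.add("w1", shared);       // w1 holds its own copy
        state.add("w2", shared);       // w2 holds a different copy
        state.add("w1", new Sum(10));  // mutates only w1's copy
        state.add("w2", new Sum(100)); // mutates only w2's copy

        return new long[] { state.get("w1"), state.get("w2"), shared.value };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("w1=" + r[0] + ", w2=" + r[1] + ", shared=" + r[2]); // 11, 101, 1
    }
}
```

The sketch relies on the ReduceFunction returning its first input; a function that returns a different object would bypass the copy.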

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-22 Thread sjk
Hi Fabian, that is a lot of code changes. Can you show us the key code changes for the object copy? An object reference may hold further deep references, which can be a time bomb. Can we create a new object from its data, or use Kryo directly for object serialization? I would prefer not to copy objects.

> On Nov 22, 2016, at

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-22 Thread Fabian Hueske
Does anybody have objections against copying the first record that goes into the ReduceState?

2016-11-22 12:49 GMT+01:00 Aljoscha Krettek:
> That's right, yes.
>
> On Mon, 21 Nov 2016 at 19:14 Fabian Hueske wrote:
>
> > Right, but that would be a much

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-22 Thread Aljoscha Krettek
That's right, yes.

On Mon, 21 Nov 2016 at 19:14 Fabian Hueske wrote:
> Right, but that would be a much bigger change than "just" copying the
> *first* record that goes into the ReduceState, or am I missing something?
>
>
> 2016-11-21 18:41 GMT+01:00 Aljoscha Krettek

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Fabian Hueske
Right, but that would be a much bigger change than "just" copying the *first* record that goes into the ReduceState, or am I missing something?

2016-11-21 18:41 GMT+01:00 Aljoscha Krettek:
> To bring over my comment from the GitHub PR that started this discussion:
>
>

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Aljoscha Krettek
To bring over my comment from the GitHub PR that started this discussion: @wuchong, yes, this is a problem with the HeapStateBackend. The RocksDB backend does not suffer from this problem. I think in the long run we should migrate the HeapStateBackend to always keep

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Fabian Hueske
Another approach that would solve the problem for our use case (object reuse for incremental window ReduceFunctions) would be to copy the first object that is put into the state. This would be a change on the ReduceState, not on the overall state backend, which should be feasible, no?

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Stephan Ewen
-1 for copying objects. Storing serialized data where possible is good, but copying all objects by default is not a good idea, in my opinion. A lot of scenarios use data types that are hellishly expensive to copy. Even the current copy on chain handover is a problem. Let's not introduce even

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Maciek Próchniak
Hi, it will come with performance overhead when updating the state, but I think it'll be possible to perform asynchronous snapshots using HeapStateBackend (probably some changes to underlying data structures would be needed) - which would bring more predictable performance. thanks, maciek

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Aljoscha Krettek
Hi,

I would be in favour of this since it brings things in line with the RocksDB backend. This will, however, come with quite the performance overhead, depending on how fast the TypeSerializer can copy.

Cheers,
Aljoscha

On Mon, 21 Nov 2016 at 11:30 Fabian Hueske wrote:
> Hi

[DISCUSS] Hold copies in HeapStateBackend

2016-11-21 Thread Fabian Hueske
Hi everybody, when implementing a ReduceFunction for incremental aggregation of SQL / Table API window aggregates we noticed that the HeapStateBackend does not store copies but holds references to the original objects. In case of a SlidingWindow, the same object is referenced from different
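
The problem described in this message can be reproduced in a few lines of plain Java. This is a minimal sketch, not Flink code: `ReferenceHoldingState` and `Sum` are hypothetical stand-ins for a heap state that stores references, and the two window names stand in for two overlapping sliding windows that receive the same record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch; none of these classes are Flink's actual implementation.
final class SharedReferenceSketch {

    /** Mutable record type used for in-place aggregation. */
    static final class Sum {
        long value;
        Sum(long value) { this.value = value; }
    }

    /** Keeps one accumulator per window and stores references, never copies. */
    static final class ReferenceHoldingState {
        private final Map<String, Sum> accumulators = new HashMap<>();
        private final BinaryOperator<Sum> reduceFunction;

        ReferenceHoldingState(BinaryOperator<Sum> reduceFunction) {
            this.reduceFunction = reduceFunction;
        }

        void add(String window, Sum input) {
            Sum current = accumulators.get(window);
            // The first value of a window is stored as-is (by reference).
            accumulators.put(window,
                    current == null ? input : reduceFunction.apply(current, input));
        }

        long get(String window) { return accumulators.get(window).value; }
    }

    /** Returns the sums seen by windows "w1" and "w2". */
    static long[] run() {
        // A ReduceFunction that reuses its first input instead of allocating a result.
        BinaryOperator<Sum> reusesFirst = (a, b) -> { a.value += b.value; return a; };
        ReferenceHoldingState state = new ReferenceHoldingState(reusesFirst);

        Sum shared = new Sum(1);       // record that falls into both sliding windows
        state.add("w1", shared);
        state.add("w2", shared);       // both windows now reference the same object
        state.add("w1", new Sum(10));  // intended: w1 = 11
        state.add("w2", new Sum(100)); // intended: w2 = 101

        return new long[] { state.get("w1"), state.get("w2") };
    }

    public static void main(String[] args) {
        long[] sums = run();
        System.out.println("w1=" + sums[0] + ", w2=" + sums[1]); // both windows report 111
    }
}
```

Because both windows share one accumulator object, every in-place update is visible in both, and neither window ends up with its intended sum.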