Re: MapState API reloaded

2022-03-28 Thread Jan Lukavský
value" means, unless you mean that it returns null. On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský wrote: Hi, I just foun

Re: MapState API reloaded

2022-03-23 Thread Kenneth Knowles
ch enforces various hacks to actually retrieve the computed (that is the current!) value. WDYT? Jan [1]

Re: MapState API reloaded

2022-02-28 Thread Jan Lukavský
an that it returns null. On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský wrote: Hi, I just found another weirdness in the MapState API - method

Re: MapState API reloaded

2022-02-21 Thread Jan Lukavský
t modify the value unless it is not present, I'm not sure what the "old value" means, unless you mean that it returns null. On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský wrote:

Re: MapState API reloaded

2022-02-17 Thread Kenneth Knowles
t you mean. Do you mean it always returns null? Since computeIfAbsent doesn't modify the value unless it is not present, I'm not sure what the "old value" means, unless you mean that it returns null. On Fri, Feb 11

Re: MapState API reloaded

2022-02-16 Thread Jan Lukavský
ll? Since computeIfAbsent doesn't modify the value unless it is not present, I'm not sure what the "old value" means, unless you mean that it returns null. On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský wrote: Hi,

Re: MapState API reloaded

2022-02-11 Thread je . ik
lue unless it is not present, I'm not sure what the "old value" means, unless you mean that it returns null. On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský <je...@seznam.cz> wrote: Hi, I just found another weirdness in the MapState API - method computeIfAbsent returns the _old

MapState API reloaded

2022-02-11 Thread Jan Lukavský
Hi, I just found another weirdness in the MapState API - method computeIfAbsent returns the _old_ value associated with the key. That is inconsistent with Java API and frankly with what users should expect when using this method. The semantics should be "retrieve current value, i
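
For comparison, here is a minimal sketch of the java.util.Map behaviour the message above refers to. It uses only the Java standard library, not Beam's MapState, and the class and method names are illustrative assumptions rather than anything from the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ComputeIfAbsentDemo {
    // Key is absent: the mapping function runs and the newly
    // computed (now current) value is returned.
    public static String whenAbsent() {
        Map<String, String> map = new HashMap<>();
        return map.computeIfAbsent("k", key -> "computed");
    }

    // Key is present: the mapping function is not invoked and the
    // existing (current) value is returned.
    public static String whenPresent() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "existing");
        return map.computeIfAbsent("k", key -> "computed");
    }

    public static void main(String[] args) {
        System.out.println(whenAbsent());
        System.out.println(whenPresent());
    }
}
```

In both cases java.util.Map hands back the value that is associated with the key after the call, which is the expectation the message describes.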

Re: MapState API

2021-09-13 Thread Ke Wu
gt;> that the timing of the actual write cannot be observed. >> >> Kenn >> >> On Tue, Aug 17, 2021 at 10:19 AM Luke Cwik > <mailto:lc...@google.com>> wrote: >> Yeah, the "or state is committed" means that we should resolve it before any >> additio

Re: MapState API

2021-09-10 Thread Ke Wu
@code read()} is called on the result or state is committed, it forces a read of the map and reconciliation with any pending modifications." My reading of this is that the value changes for anything that happens after the call to putIfAbsent. It would be good to make th

Re: MapState API

2021-08-17 Thread Kenneth Knowles
t;>> My reading of this is that the value changes for anything that happens >>>> after the call to putIfAbsent. It would be good to make this clear, that >>>> any subsequent observation of this cell should observe the new value. >>>> >>>> Kenn >>

Re: MapState API

2021-08-17 Thread Kenneth Knowles
On Fri, Aug 13, 2021 at 9:32 AM Reuven Lax wrote: Yeah - I remember thinking that the computeIfAbsent/putIfAbsent semantics were very weird. I almost would have preferred not having those methods in MapState, even though they can be useful.

Re: MapState API

2021-08-16 Thread Kenneth Knowles
weird. I almost would have preferred not having those methods in MapState, even though they can be useful. On Fri, Aug 13, 2021 at 9:21 AM Luke Cwik wrote: I was surprised to see the MapState API for computeIfAbsent/putIfAbsent only performs the write if the Re

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-28 Thread Kenneth Knowles
On Wed, Sep 16, 2020 at 8:48 AM Tyson Hamilton wrote: The use case is to support an unbounded stream-stream join, where the elements are arriving in roughly time sorted order. Removing a specific element from the timestamp indexed collection is necessary when a match is found. Just che

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-16 Thread Tyson Hamilton
The use case is to support an unbounded stream-stream join, where the elements are arriving in roughly time sorted order. Removing a specific element from the timestamp indexed collection is necessary when a match is found. Having clearRange is helpful to expire elements that are no longer relevant

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-15 Thread Reuven Lax
Hi, Currently we only support removing a timestamp range. You can remove a single timestamp of course by removing [ts, ts+1), however if there are multiple elements with the same timestamp this will remove all of those elements. Does this fit your use case? If not, I wonder if MapState is closer
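
The point about removing [ts, ts+1) can be illustrated with a rough sketch. This is not Beam's state API: a plain java.util.TreeMap stands in for a timestamp-ordered state cell, and the class and its add, clearRange and size methods are hypothetical names chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class TimestampRangeDemo {
    // Hypothetical stand-in for timestamp-ordered state:
    // each timestamp maps to all elements that carry it.
    private final TreeMap<Long, List<String>> byTimestamp = new TreeMap<>();

    public void add(long ts, String value) {
        byTimestamp.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
    }

    // Removes every element whose timestamp lies in [start, end).
    // subMap is start-inclusive and end-exclusive, and clearing the
    // view removes the entries from the backing map.
    public void clearRange(long start, long end) {
        byTimestamp.subMap(start, end).clear();
    }

    public int size() {
        return byTimestamp.values().stream().mapToInt(List::size).sum();
    }

    // Two elements share timestamp 10; clearing [10, 11) drops both
    // and leaves only the element at timestamp 11.
    public static int demo() {
        TimestampRangeDemo state = new TimestampRangeDemo();
        state.add(10L, "a");
        state.add(10L, "b");
        state.add(11L, "c");
        state.clearRange(10L, 11L);
        return state.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the range is keyed only by timestamp, there is no way in this sketch to drop "a" while keeping "b", which is the limitation the message raises.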

Re: [External] Re: DISCUSS: Sorted MapState API

2020-09-15 Thread Tyson Hamilton
Hi Reuven, I noticed that there was an implementation of the in-memory OrderedListState introduced [1]. Where can I find out more regarding the plan and design? Is there a design doc? I'd like to know more details about the implementation to see if it fits my use case. I was hoping it would hav

Re: [External] Re: DISCUSS: Sorted MapState API

2020-08-03 Thread Catlyn Kong
Hey folks, Sry I'm late to this thread but this might be very helpful for the problem we're dealing with. Do we have a design doc or a jira ticket I can follow? Cheers, Catlyn On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský wrote: My questions were just an example. I fully agree there is a fund

Re: DISCUSS: Sorted MapState API

2020-06-18 Thread Jan Lukavský
My questions were just an example. I fully agree there is a fundamental need for a sorted state (of some form, and I also think this links to efficient implementation of retractions) - I was reacting to Kenn's question about BIP. This one would be a pretty nice example of why it would be good to have

Re: DISCUSS: Sorted MapState API

2020-06-18 Thread Reuven Lax
Jan - my proposal is exactly TimeSortedBagState (more accurately - TimeSortedListState), though I went a bit further and also proposed a way to have a dynamic number of tagged TimeSortedBagStates. You are correct that the runner doesn't really have to store the data time sorted - what's actually n

Re: DISCUSS: Sorted MapState API

2020-06-18 Thread Jan Lukavský
Big +1 for a BIP, as this might really help clarify all the pros and cons of all possibilities. There seem to be questions that need answering and motivating use cases - do we need sorted map state or can we solve our use cases by something simpler - e.g. the mentioned TimeSortedBagState? Does

Re: DISCUSS: Sorted MapState API

2020-06-17 Thread Kenneth Knowles
Zooming in from generic philosophy to be clear: adding time ordered buffer to the Fn state API is *not* a shortcut. It has benefits that will not be achieved by SDK-side implementation on top of either ordered or unordered multimap. Are those benefits worth expanding the API? I don't know. A change

Re: DISCUSS: Sorted MapState API

2020-06-17 Thread Reuven Lax
It might help for me to describe what I have in mind. I'm still proposing that we build multimap, just not a globally-sorted multimap. My previous proposal was that we provide a Multimap state type, sorted by key. This would have two additional operations - multimap.getRange(startKey, endKey) and

Re: DISCUSS: Sorted MapState API

2020-06-17 Thread Luke Cwik
The portability layer is meant to live across multiple versions of Beam and I don't think it should be treated by doing the simple and useful thing now since I believe it will lead to a proliferation of the API. On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles wrote: I have thoughts on the subj

Re: DISCUSS: Sorted MapState API

2020-06-17 Thread Kenneth Knowles
I have thoughts on the subject of whether to have APIs just for the lowest-level building blocks versus having APIs for higher-level constructs. Specifically this applies to providing only unsorted multimap vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on time-ordered buffer;

Re: DISCUSS: Sorted MapState API

2020-06-16 Thread Luke Cwik
I would be glad to take a stab at how to provide sorting on top of unsorted multimap state. Based upon your description, you want integer keys representing timestamps and arbitrary user value for the values, is that correct? What kinds of operations do you need on the sorted map state in order of e

Re: DISCUSS: Sorted MapState API

2020-06-16 Thread Reuven Lax
Bringing this subject up again, I've spent some time looking into implementing this for the Dataflow runner. I'm unable to find a way to implement the arbitrary sorted multimap efficiently for the case where there are large numbers of unique keys. Since the primary driving use case is timestamp or

Re: DISCUSS: Sorted MapState API

2019-06-02 Thread Reuven Lax
This sounds to me like a potential runner strategy. However if a runner can natively support sorted maps (e.g. we expect the Dataflow runner to be able to do so, and I think it would be useful for other runners as well), then it's probably preferable to allow the runner to use its native capabiliti

Re: DISCUSS: Sorted MapState API

2019-05-30 Thread Kenneth Knowles
On Tue, May 28, 2019 at 2:59 AM Robert Bradshaw wrote: On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote: On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote: On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: Some great comments! Aljoscha: abso

Re: DISCUSS: Sorted MapState API

2019-05-28 Thread Robert Bradshaw
On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote: On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote: On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: Some great comments! Aljoscha: absolutely this would have to be implemented by runners to be efficient. W

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Lukasz Cwik
In my look for, it should have said "void". instead of "key". when explaining how to do it. On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik wrote: For the API that you proposed, the map key is always "void" and the sort key == user key. So in my example of key: dummy value key.000: token, (

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Lukasz Cwik
For the API that you proposed, the map key is always "void" and the sort key == user key. So in my example of key: dummy value key.000: token, (0001, value4) key.001: token, (0010, value1), (0011, value2) key.01: token key.1: token, (1011, value3) you would have: "void": dummy value "void".000: tok

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Reuven Lax
Can you explain how fetching and deleting ranges of keys would work with this data structure? On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik wrote: Reuven, for the example, I assume that we never want to store more than 2 values at a given sort key prefix, and if we do then we will create a new

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Kenneth Knowles
On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote: On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: Some great comments! *Aljoscha*: absolutely this would have to be implemented by runners to be efficient. We can of course provide a default (inefficient) implementatio

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Kenneth Knowles
On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: Some great comments! *Aljoscha*: absolutely this would have to be implemented by runners to be efficient. We can of course provide a default (inefficient) implementation, but ideally runners would provide better ones. *Jan* Exactly.

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Reuven Lax
ingle key API might be implemented by allEntriesInc(k) - allEntriesExcl(k). It's a good question. In the Dataflow runner it would likely be the same implementation. If that is not true for other runners, then it's worth keeping both around. Also, I would assume MapState is impleme

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Lukasz Cwik
Reuven, for the example, I assume that we never want to store more than 2 values at a given sort key prefix, and if we do then we will create a new longer prefix splitting up the values based upon the sort key. Tuple representation in examples below is (key, sort key, value) and . is a character o

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Rui Wang
pState is implemented by some runners already. So will dropping the MapState API mean runners will change accordingly? Do I misunderstand? -Rui

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Reuven Lax
Some great comments! *Aljoscha*: absolutely this would have to be implemented by runners to be efficient. We can of course provide a default (inefficient) implementation, but ideally runners would provide better ones. *Jan* Exactly. I think MapState can be dropped or backed by this. E.g. *Robert

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Robert Bradshaw
On Fri, May 24, 2019 at 5:32 AM Reuven Lax wrote: On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote: On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote: On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote: A few obvious problems with this code: 1.

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Jan Lukavský
. 5. 2019 10:56:45 Subject: Re: DISCUSS: Sorted MapState API " This is quite interesting! The Flink Table API (relational and SQL) has an implementation for the type of join you mention in the example. We call it Temporal Table Join, and it works on something we call Temporal Tables:

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Aljoscha Krettek
This is quite interesting! The Flink Table API (relational and SQL) has an implementation for the type of join you mention in the example. We call it Temporal Table Join, and it works on something we call Temporal Tables: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Reuven Lax
On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote: On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote: On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote: A few obvious problems with this code: 1. Removing the elements already processed from the bag requires clea

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Reuven Lax
On Thu, May 23, 2019 at 11:23 AM Lukasz Cwik wrote: I would suggest that we drop MapState and instead support MultimapState without ordering as a first pass and potentially add ordering later. While I agree that MultimapState is useful on its own, for these aggregation use cases ordering i

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Reza Rokni
+100 As someone who has just spent a lot of time coding all the "GC, caching of BagState fetches, etc" this would make life a lot easier! It's also generally valuable for a lot of timeseries work. On Fri, 24 May 2019 at 04:59, Lukasz Cwik wrote: On Thu, May 23, 2019 at 1:53 PM Ahmet Altay

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Lukasz Cwik
On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote: On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote: On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote: A few obvious problems with this code: 1. Removing the elements already processed from the bag requires clea

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Ahmet Altay
On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote: On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote: A few obvious problems with this code: 1. Removing the elements already processed from the bag requires clearing and rewriting the entire bag. This is O(n^2) in the number of

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Lukasz Cwik
On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote: A few obvious problems with this code: 1. Removing the elements already processed from the bag requires clearing and rewriting the entire bag. This is O(n^2) in the number of input trades. why it's not O(2 * n) to clearing and rew

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Rui Wang
A few obvious problems with this code: 1. Removing the elements already processed from the bag requires clearing and rewriting the entire bag. This is O(n^2) in the number of input trades. why it's not O(2 * n) to clearing and rewriting trade state? public interface SortedMultimap

Re: DISCUSS: Sorted MapState API

2019-05-23 Thread Lukasz Cwik
I would suggest that we drop MapState and instead support MultimapState without ordering as a first pass and potentially add ordering later. Inside an SDK we would be able to build a "sorted" multimap state that uses a prefix of the key and implement radix based sorting. Any time a prefix has too

DISCUSS: Sorted MapState API

2019-05-23 Thread Reuven Lax
Beam's state API is intended to be useful as an alternative aggregation method. Stateful ParDos can aggregate elements into state and set timers to read the state. Currently Beam's state API supports two ways of storing collections of elements: BagState and MapState. BagState is the one most commo