ch enforces various hacks to actually retrieve the computed (that is
>>>>>> the
>>>>>> current!) value.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>> [1]
>
t you mean. Do you mean it always returns
>>> null? Since computeIfAbsent doesn't modify the value unless it is not
>>> present, I'm not sure what the "old value" means, unless you mean that it
>>> returns null.
>>>
>>> On Fri, Feb 11, 2022 at 4:54 AM Jan Lukavský wrote:
Hi,
I just found another weirdness in the MapState API - method
computeIfAbsent returns the _old_ value associated with the key. That is
inconsistent with Java API and frankly with what users should expect
when using this method. The semantics should be "retrieve current value,
i
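For comparison, here is a minimal sketch of the java.util.Map behaviour the message refers to. It uses a plain HashMap rather than Beam's MapState, and the class and method names are made up for illustration. In the Java API, computeIfAbsent hands back the current value (the existing one, or the newly computed one), whereas putIfAbsent hands back the previous value, which is null when the key was absent.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: plain java.util.Map semantics, not Beam's MapState.
public class ComputeIfAbsentDemo {
    // Key is missing: the mapping function runs and its result is returned.
    static String computeWhenAbsent() {
        Map<String, String> map = new HashMap<>();
        return map.computeIfAbsent("k", key -> "computed");
    }

    // Key is present: the existing value is returned and nothing is modified.
    static String computeWhenPresent() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "existing");
        return map.computeIfAbsent("k", key -> "computed");
    }

    // putIfAbsent returns the previous value, so a missing key yields null.
    static String putWhenAbsent() {
        Map<String, String> map = new HashMap<>();
        return map.putIfAbsent("k", "new");
    }

    public static void main(String[] args) {
        System.out.println(computeWhenAbsent());
        System.out.println(computeWhenPresent());
        System.out.println(putWhenAbsent());
    }
}
```

With these semantics the caller never needs a second read to obtain the value that is now associated with the key.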
>> that the timing of the actual write cannot be observed.
>>
>> Kenn
>>
>> On Tue, Aug 17, 2021 at 10:19 AM Luke Cwik wrote:
>> Yeah, the "or state is committed" means that we should resolve it before any
>> additio
@code read()} is called on the result or state is committed, it forces a
> read of the map and reconciliation with any pending modifications."
>
> My reading of this is that the value changes for anything that happens after
> the call to putIfAbsent. It would be good to make th
>>>> My reading of this is that the value changes for anything that happens
>>>> after the call to putIfAbsent. It would be good to make this clear, that
>>>> any subsequent observation of this cell should observe the new value.
>>>>
>>>> Kenn
>>
>> On Fri, Aug 13, 2021 at 9:32 AM Reuven Lax wrote:
>>
>>> Yeah - I remember thinking that the computeIfAbsent/putIfAbsent
>>> semantics were very weird. I almost would have preferred not having those
>>> methods in MapState, even though they can be useful.
weird. I almost would have preferred not having those methods in
> MapState, even though they can be useful.
>
> On Fri, Aug 13, 2021 at 9:21 AM Luke Cwik wrote:
>
>> I was surprised to see the MapState API for computeIfAbsent/putIfAbsent
>> only performs the write if the Re
On Wed, Sep 16, 2020 at 8:48 AM Tyson Hamilton wrote:
> The use case is to support an unbounded stream-stream join, where the
> elements are arriving in roughly time sorted order. Removing a specific
> element from the timestamp indexed collection is necessary when a match is
> found.
>
Just che
The use case is to support an unbounded stream-stream join, where the
elements are arriving in roughly time sorted order. Removing a specific
element from the timestamp indexed collection is necessary when a match is
found. Having clearRange is helpful to expire elements that are no longer
relevant
Hi,
Currently we only support removing a timestamp range. You can remove a
single timestamp of course by removing [ts, ts+1), however if there are
multiple elements with the same timestamp this will remove all of those
elements.
Does this fit your use case? If not, I wonder if MapState is closer
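To illustrate the point about [ts, ts+1), here is a small sketch that models a timestamp-ordered collection with a TreeMap. The class ClearRangeSketch and its methods are hypothetical stand-ins, not the OrderedListState API; the only assumption is that range removal is half-open, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of a timestamp-sorted list; not a Beam class.
public class ClearRangeSketch {
    private final TreeMap<Long, List<String>> entries = new TreeMap<>();

    void add(long ts, String value) {
        entries.computeIfAbsent(ts, t -> new ArrayList<>()).add(value);
    }

    // Removes every element with minTs <= timestamp < limitTs.
    void clearRange(long minTs, long limitTs) {
        entries.subMap(minTs, true, limitTs, false).clear();
    }

    int size() {
        return entries.values().stream().mapToInt(List::size).sum();
    }

    // Two elements share timestamp 10; clearing [10, 11) drops both of them.
    static int remainingAfterSingleTimestampClear() {
        ClearRangeSketch state = new ClearRangeSketch();
        state.add(10L, "a");
        state.add(10L, "b");
        state.add(11L, "c");
        state.clearRange(10L, 11L);
        return state.size();
    }
}
```

Only the element at timestamp 11 survives, so a caller that wants to remove just one of the two elements at timestamp 10 would have to re-add the other.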
Hi Reuven,
I noticed that there was an implementation of the in-memory OrderedListState
introduced [1]. Where can I find out more regarding the plan and design? Is
there a design doc? I'd like to know more details about the implementation to
see if it fits my use case. I was hoping it would hav
Hey folks,
Sorry I'm late to this thread but this might be very helpful for the problem
we're dealing with. Do we have a design doc or a jira ticket I can follow?
Cheers,
Catlyn
On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský wrote:
> My questions were just an example. I fully agree there is a fund
My questions were just an example. I fully agree there is a fundamental
need for a sorted state (of some form, and I also think this links to
efficient implementation of retractions) - I was reacting to Kenn's
question about BIP. This one would be pretty nice example why it would
be good to have
Jan - my proposal is exactly TimeSortedBagState (more accurately -
TimeSortedListState), though I went a bit further and also proposed a way
to have a dynamic number of tagged TimeSortedBagStates.
You are correct that the runner doesn't really have to store the data time
sorted - what's actually n
Big +1 for a BIP, as this might really help clarify all the pros and
cons of all possibilities. There seem to be questions that need
answering and motivating use cases - do we need sorted map state or can
we solve our use cases by something simpler - e.g. the mentioned
TimeSortedBagState? Does
Zooming in from generic philosophy to be clear: adding time ordered buffer
to the Fn state API is *not* a shortcut. It has benefits that will not be
achieved by SDK-side implementation on top of either ordered or unordered
multimap. Are those benefits worth expanding the API? I don't know.
A change
It might help for me to describe what I have in mind. I'm still proposing
that we build multimap, just not a globally-sorted multimap.
My previous proposal was that we provide a Multimap state type,
sorted by key. This would have two additional operations -
multimap.getRange(startKey, endKey) and
The portability layer is meant to live across multiple versions of Beam and
I don't think it should be treated by doing the simple and useful thing now
since I believe it will lead to a proliferation of the API.
On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles wrote:
> I have thoughts on the subj
I have thoughts on the subject of whether to have APIs just for the
lowest-level building blocks versus having APIs for higher-level
constructs. Specifically this applies to providing only unsorted multimap
vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
time-ordered buffer;
I would be glad to take a stab at how to provide sorting on top of unsorted
multimap state.
Based upon your description, you want integer keys representing timestamps
and arbitrary user value for the values, is that correct?
What kinds of operations do you need on the sorted map state in order of
e
Bringing this subject up again,
I've spent some time looking into implementing this for the Dataflow
runner. I'm unable to find a way to implement the arbitrary sorted multimap
efficiently for the case where there are large numbers of unique keys.
Since the primary driving use case is timestamp or
This sounds to me like a potential runner strategy. However if a runner can
natively support sorted maps (e.g. we expect the Dataflow runner to be able
to do so, and I think it would be useful for other runners as well), then
it's probably preferable to allow the runner to use its native capabiliti
On Tue, May 28, 2019 at 2:59 AM Robert Bradshaw wrote:
> On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote:
> >
> > On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
> >>
> >> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
> >>>
> >>> Some great comments!
> >>>
> >>> Aljoscha: abso
On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles wrote:
>
> On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
>>
>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
>>>
>>> Some great comments!
>>>
>>> Aljoscha: absolutely this would have to be implemented by runners to be
>>> efficient. W
In my look for, it should have said "void" instead of "key" when
explaining how to do it.
On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik wrote:
> For the API that you proposed, the map key is always "void" and the sort
> key == user key. So in my example of
> key: dummy value
> key.000: token, (
For the API that you proposed, the map key is always "void" and the sort
key == user key. So in my example of
key: dummy value
key.000: token, (0001, value4)
key.001: token, (0010, value1), (0011, value2)
key.01: token
key.1: token, (1011, value3)
you would have:
"void": dummy value
"void".000: tok
Can you explain how fetching and deleting ranges of keys would work with
this data structure?
On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik wrote:
> Reuven, for the example, I assume that we never want to store more than 2
> values at a given sort key prefix, and if we do then we will create a new
On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote:
>
>
> On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
>
>> Some great comments!
>>
>> *Aljoscha*: absolutely this would have to be implemented by runners to
>> be efficient. We can of course provide a default (inefficient)
>> implementatio
On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote:
> Some great comments!
>
> *Aljoscha*: absolutely this would have to be implemented by runners to be
> efficient. We can of course provide a default (inefficient) implementation,
> but ideally runners would provide better ones.
>
> *Jan* Exactly.
ingle key API might be implemented by allEntriesInc(k)
> - allEntriesExcl(k).
>
It's a good question. In the Dataflow runner it would likely be the same
implementation. If that is not true for other runners, then it's worth
keeping both around.
>
> Also, I would assume MapState is impleme
Reuven, for the example, I assume that we never want to store more than 2
values at a given sort key prefix, and if we do then we will create a new
longer prefix splitting up the values based upon the sort key.
Tuple representation in examples below is (key, sort key, value) and . is a
character o
pState is implemented by some runners already. So
will dropping the MapState API mean runners will change accordingly? Do I
misunderstand?
-Rui
Some great comments!
*Aljoscha*: absolutely this would have to be implemented by runners to be
efficient. We can of course provide a default (inefficient) implementation,
but ideally runners would provide better ones.
*Jan* Exactly. I think MapState can be dropped or backed by this. E.g.
*Robert
On Fri, May 24, 2019 at 5:32 AM Reuven Lax wrote:
>
> On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote:
>>
>>
>>
>> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>>>
>>>
>>>
>>> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>
> A few obvious problems with this code:
> 1.
. 5. 2019 10:56:45
Subject: Re: DISCUSS: Sorted MapState API
"
This is quite interesting! The Flink Table API (relational and SQL) has an
implementation for the type of join you mention in the example. We call it
Temporal Table Join, and it works on something we call Temporal Tables:
This is quite interesting! The Flink Table API (relational and SQL) has an
implementation for the type of join you mention in the example. We call it
Temporal Table Join, and it works on something we call Temporal Tables:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming
On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote:
>
>
> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>
>>
>>
>> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>>
>>> A few obvious problems with this code:
1. Removing the elements already processed from the bag requires
clea
On Thu, May 23, 2019 at 11:23 AM Lukasz Cwik wrote:
> I would suggest that we drop MapState and instead support MultimapState
> without ordering as a first pass and potentially add ordering later.
>
While I agree that MultimapState is useful on its own, for these
aggregation use cases ordering i
+100 As someone who has just spent a lot of time coding all the "GC,
caching of BagState fetches, etc" this would make life a lot easier!
It's also generally valuable for a lot of timeseries work.
On Fri, 24 May 2019 at 04:59, Lukasz Cwik wrote:
>
>
> On Thu, May 23, 2019 at 1:53 PM Ahmet Altay
On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote:
>
>
> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
>
>> A few obvious problems with this code:
>>> 1. Removing the elements already processed from the bag requires
>>> clearing and rewriting the entire bag. This is O(n^2) in the number of
On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote:
> A few obvious problems with this code:
>> 1. Removing the elements already processed from the bag requires
>> clearing and rewriting the entire bag. This is O(n^2) in the number of
>> input trades.
>>
> why it's not O(2 * n) to clearing and rew
>
> A few obvious problems with this code:
> 1. Removing the elements already processed from the bag requires
> clearing and rewriting the entire bag. This is O(n^2) in the number of
> input trades.
>
Why isn't it O(2 * n) to clear and rewrite the trade state?
> public interface SortedMultimap
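On the O(n^2) versus O(2 * n) question above, one way to reconcile the two readings is sketched here. The code under discussion is not shown in this thread, so this assumes, purely for illustration, that the bag is cleared and rewritten once for every arriving trade and that, in the worst case, no trade is dropped along the way. A single clear-and-rewrite pass is linear in the bag size, but repeating it for each of n trades sums to n(n+1)/2 element writes. The class name BagRewriteCost is made up.

```java
// Illustrative cost model only; it does not use any Beam state classes.
public class BagRewriteCost {
    // Counts element writes when the whole bag is rewritten once per arriving
    // trade and (worst-case assumption) no trade is ever removed.
    static long totalWrites(int trades) {
        long writes = 0;
        int bagSize = 0;
        for (int i = 0; i < trades; i++) {
            bagSize++;          // the new trade joins the bag
            writes += bagSize;  // clear, then rewrite every element in the bag
        }
        return writes;
    }

    public static void main(String[] args) {
        System.out.println(totalWrites(1000));
    }
}
```

Under these assumptions the total grows quadratically in the number of input trades, even though each individual rewrite is linear.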
I would suggest that we drop MapState and instead support MultimapState
without ordering as a first pass and potentially add ordering later.
Inside an SDK we would be able to build a "sorted" multimap state that uses
a prefix of the key and implement radix based sorting. Any time a prefix
has too
Beam's state API is intended to be useful as an alternative aggregation
method. Stateful ParDos can aggregate elements into state and set timers to
read the state. Currently Beam's state API supports two ways of storing
collections of elements: BagState and MapState. BagState is the one most
commo