Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

Ben Chambers Mon, 12 Jun 2023 13:00:33 -0700

Hey Daniel -- Great question!

Kaskada was designed to be similar to SQL but with a few differences.
The most significant is the assumption of both ordering and grouping.
Kaskada uses this to automatically merge multiple input collections,
and to allow data-dependent windows that identify a range of time. For
instance, the query `Purchases.amount | sum(window = since(Login))`
sums the amount spent since the last login. In user studies, we've
heard that these make it much easier to compose queries analyzing the
entire "journey" or "funnel" for each user.
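
To make the semantics concrete, here is a rough plain-Python sketch of
what that query computes. It is only an illustration, not how Kaskada
executes it: the function name `sum_since_login`, the tuple layouts, and
the choice to apply a login before a purchase at the same timestamp are
all assumptions made for the example.

```python
from collections import defaultdict


def sum_since_login(purchases, logins):
    """Running purchase total per user, reset at each of that user's logins.

    purchases: iterable of (time, user, amount)  -- hypothetical layout
    logins:    iterable of (time, user)          -- hypothetical layout
    """
    # Merge both collections into one stream. Kind 0 (login) sorts before
    # kind 1 (purchase) at equal timestamps; that tie-break is an assumption.
    events = [(t, 0, user, None) for t, user in logins]
    events += [(t, 1, user, amount) for t, user, amount in purchases]
    events.sort(key=lambda e: (e[0], e[1]))

    running = defaultdict(float)  # grouping: one accumulator per user
    results = []
    for t, kind, user, amount in events:  # ordering: process in time order
        if kind == 0:
            running[user] = 0.0  # a login starts a new window for this user
        else:
            running[user] += amount
            results.append((t, user, running[user]))
    return results


if __name__ == "__main__":
    purchases = [(1, "a", 5.0), (2, "a", 3.0), (4, "a", 2.0), (3, "b", 7.0)]
    logins = [(3, "a")]
    for row in sum_since_login(purchases, logins):
        print(row)
```

The merge, the per-user accumulator, and the reset on login are all
written out by hand here; in the Kaskada query they follow from the
ordering and grouping assumptions.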


There are also cases where the ordering assumption *isn't* a good fit
-- queries that aren't as sensitive to time. Having both options
(Beam SQL and Kaskada) readily available would let users choose
whichever is most natural for them and their use case.

-- Ben

On Mon, Jun 12, 2023 at 12:14 PM Daniel Collins <[email protected]> wrote:
>
> How does this mechanism differ from beam SQL which already offers windowing 
> via SQL over PCollections?
>
> https://beam.apache.org/documentation/dsls/sql/extensions/windowing-and-triggering/
>
> -Daniel
>
> On Mon, Jun 12, 2023 at 3:11 PM Ryan Michael <[email protected]> wrote:
>>
>> Hello, Beam (also)!
>>
>> Just introducing myself - I'm Ryan and I've been working with Ben on the 
>> Kaskada project for the past few years. As Ben mentioned, I think there's a 
>> great opportunity to bring together some of the work we've done to make 
>> time-based computation easier to reason about with the Beam community's work 
>> on scalable streaming computation.
>>
>> I'll be at the Beam Summit in NYC starting Wednesday and presenting a short 
>> overview of how we see Kaskada fitting into the Generative AI world at the 
>> "Generative AI Meetup" Wednesday afternoon - if the doc Ben linked to (or 
>> GenAI) is interesting to you and you'll be at the conference I'd love to 
>> touch base in person!
>>
>> -Ryan
>>
>> On Mon, Jun 12, 2023 at 2:51 PM Ben Chambers <[email protected]> wrote:
>>>
>>> Hello Beam!
>>>
>>> Kaskada has created a query language for expressing temporal queries,
>>> making it easy to work with multiple streams and perform temporally
>>> correct joins. We’re looking at taking our native, columnar execution
>>> engine and making it available as a PTransform and FnHarness for use
>>> with Apache Beam.
>>>
>>> We’ve drafted a [short document][proposal] outlining our planned
>>> approach and the potential benefits to Kaskada and Beam users. It
>>> would be super helpful to get some feedback on this approach and any
>>> ways that it could be improved / better integrated with Beam to
>>> provide more value!
>>>
>>> Could you see yourself using (or contributing) to this work? Let us know!
>>>
>>> Thanks!
>>>
>>> Ben
>>>
>>> [proposal]: 
>>> https://docs.google.com/document/d/1w6DYpYCi1c521AOh83JN3CB3C9pZwBPruUbawqH-NsA/edit
>>
>>
>>
>> --
>> Ryan Michael
>> [email protected] | 512.466.3662 | github | linkedin
