The plan is to fully integrate with the new Structured Streaming API and
implementation in an upcoming release, but we will continue offering
several extensions. A few are noted below:

- The store (streaming sink) will offer many more capabilities, such as
transactions, replicated tables, and partitioned row- and column-oriented
tables, to suit different types of workloads.
- While the streaming API (Scala) in SnappyData itself will change a bit to
become fully compatible with Structured Streaming (SchemaDStream will go
away), we will continue to offer SQL support for streams: they can be
managed from external clients (JDBC, ODBC), their partitions can share the
same partitioning strategy as the underlying table where the stream might
be stored, and continuous queries can even be registered from remote
clients (see the sketch after this list).
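
To make that last point concrete, here is a minimal sketch of a continuous
query registered over a stream table from any SQL client. The window clause
follows the current SnappyData docs, but the exact syntax is an assumption
and may change with the structured streaming integration:

    -- hedged sketch: window syntax may change with structured streaming
    select hashtag, count(*) as mentions
    from tweetStreamTable
    window (duration 10 seconds, slide 10 seconds)
    group by hashtag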

While building streaming apps using the Spark API offers tremendous
flexibility, we also want to make it simple for apps to work with streams
using just SQL. For instance, you should be able to declaratively specify a
table as a sink to a stream (i.e. using SQL). For example, you can specify
a "TopK table" (a built-in special table for top-K analytics using
probabilistic data structures) as a sink for a high-velocity time-series
stream like this:

    create topK table MostPopularTweets on tweetStreamTable
    options (key 'hashtag', frequencyCol 'retweets',
             timeSeriesColumn 'tweetTime')

where 'tweetStreamTable' is created using the 'create stream table ...' SQL
syntax.
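
For completeness, the stream table itself might be declared along these
lines. This is a rough sketch based on the documentation examples; the
'twitter_stream' source and the rowConverter class name are illustrative
assumptions, not final syntax:

    -- hedged sketch: source name and options are illustrative
    create stream table tweetStreamTable (
      hashtag varchar(64),
      retweets int,
      tweetTime timestamp
    ) using twitter_stream
    options (rowConverter 'org.apache.spark.sql.streaming.TweetToRowsConverter')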


-----
Jags
SnappyData blog <http://www.snappydata.io/blog>
Download binary, source <https://github.com/SnappyDataInc/snappydata>


On Wed, Jul 6, 2016 at 8:02 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Jags,
>
> I should have been more specific. I am referring to what I read at
> http://snappydatainc.github.io/snappydata/streamingWithSQL/, especially
> the Streaming Tables part. It roughly coincides with the Streaming
> DataFrames outlined here
> https://docs.google.com/document/d/1NHKdRSNCbCmJbinLmZuqNA1Pt6CGpFnLVRbzuDUcZVM/edit#heading=h.ff0opfdo6q1h.
> I don’t know if I’m wrong, but they both sound very similar. That’s why I
> posed this question.
>
> Thanks,
> Ben
>
> On Jul 6, 2016, at 7:03 AM, Jags Ramnarayan <jramnara...@snappydata.io>
> wrote:
>
> Ben,
>    Note that SnappyData's primary objective is to be a distributed
> in-memory DB for mixed workloads (i.e. streaming with transactions and
> analytic queries). On the other hand, Spark, to date, is primarily
> designed as a processing engine over myriad storage engines (SnappyData
> being one). So, the marriage is quite complementary. The difference
> compared to other stores is that SnappyData realizes its solution by
> deeply integrating and collocating with Spark (i.e. sharing Spark executor
> memory/resources with the store), avoiding serialization and shuffles in
> many situations.
>
> On your specific thought about being similar to Structured Streaming, a
> better comparison would be to the recently introduced State Store
> <https://docs.google.com/document/d/1-ncawFx8JS5Zyfq1HAEGBx56RDet9wfVp_hDM8ZL254/edit#heading=h.2h7zw4ru3nw7>
> (perhaps this is what you meant). It proposes a KV store for streaming
> aggregations with support for updates. The proposed API will, at some
> point, be pluggable so vendors can easily support alternate storage
> implementations, not just HDFS (the default store in the proposed State
> Store).
>
>
> -----
> Jags
> SnappyData blog <http://www.snappydata.io/blog>
> Download binary, source <https://github.com/SnappyDataInc/snappydata>
>
>
> On Wed, Jul 6, 2016 at 12:49 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> I recently got a sales email from SnappyData, and after reading the
>> documentation about what they offer, it sounds very similar to what
>> Structured Streaming will offer, minus the underlying in-memory,
>> spill-to-disk, CRUD-compliant data storage that SnappyData provides. I
>> was wondering if Structured Streaming is trying to achieve the same on
>> its own, or is SnappyData contributing the streaming extensions they
>> built to the Spark project? Lastly, what does the Spark community think
>> of this so-called “Spark Data Store”?
>>
>> Thanks,
>> Ben
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
