Spark n00b here.

Working with online retailers, I start with a list of their products in
Cassandra (with prices, stock levels, descriptions, etc) and then receive an
HTTP request every time one of them changes. For each change, I update the
product in Cassandra and store the change with the old and new values.

What I'd like to do is provide a dashboard with various metrics. Some of
them are trivial, such as "last n changes". Others, like the number of
in-stock/out-of-stock products, would be more complex to retrieve from
Cassandra because they're aggregates over the whole product set.

I'm thinking about streaming the changes into Spark (via RabbitMQ) to
generate the data needed for the aggregate metrics, and either storing the
results in Cassandra or publishing them back to RabbitMQ (depending on
whether I have the dashboard poll or use a WebSocket).
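
To make the idea concrete, here's a rough sketch of the kind of incremental
aggregate I have in mind. It's plain Python, not the Spark API, and the field
names (`old_stock`, `new_stock`) and the seed numbers are made up for
illustration:

```python
from collections import Counter

def apply_change(counts, change):
    """Update in-stock/out-of-stock counts for one change event.

    `change` is a hypothetical dict holding the old and new stock levels.
    """
    old_in = change["old_stock"] > 0
    new_in = change["new_stock"] > 0
    if old_in != new_in:
        # The product crossed the in-stock/out-of-stock boundary.
        counts["in_stock"] += 1 if new_in else -1
        counts["out_of_stock"] += -1 if new_in else 1
    return counts

# Seed the counts from the existing product set (made-up numbers).
counts = Counter(in_stock=2, out_of_stock=1)

changes = [
    {"product_id": "a", "old_stock": 5, "new_stock": 0},  # goes out of stock
    {"product_id": "b", "old_stock": 0, "new_stock": 3},  # back in stock
    {"product_id": "c", "old_stock": 4, "new_stock": 2},  # still in stock
    {"product_id": "c", "old_stock": 2, "new_stock": 0},  # goes out of stock
]
for change in changes:
    apply_change(counts, change)
```

The point is that each change only adjusts the running totals, so the
dashboard never has to scan the whole product set.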

I have a few questions:

1) Does this seem like a good use case for Spark?

2) How much work is it appropriate for a transformation to do? For example,
my API service currently checks the update against the current data and only
publishes a change if they differ. That sounds to me like it could be a
filter operation on a stream of all the updates, but it would require
accessing data from Cassandra inside the filter transformation. Is that
okay, or something to be avoided? The changes that make it through the
filter would also have to be logged in Cassandra. Is that crossing concerns
too much?

3) If I'm starting out with existing data, how do I take that into account
when starting to do stream processing? Would I write something to take my
logged changes from Cassandra and publish them to RabbitMQ before I start my
real streaming? Seems like the switch-over might be tricky. (Note: I don't
necessarily need to do this, depending on how things go.)

4) Is it a good idea to start with Spark 2.0 now? I see there's an AMQP
module with 2.0 support, and the Cassandra one supports 2.0 with a little
work.
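
To make question 2 concrete, here's roughly the filter I have in mind,
sketched in plain Python rather than the Spark API. The dict `store` stands
in for Cassandra, and `changed_fields`, `process` and the field names are
made up for illustration:

```python
def changed_fields(current, update):
    """Return {field: (old, new)} for every field whose value differs."""
    return {
        field: (current.get(field), value)
        for field, value in update.items()
        if current.get(field) != value
    }

def process(updates, store, change_log):
    """Yield only real changes, updating the store and the log as we go."""
    for product_id, update in updates:
        current = store.get(product_id, {})
        diff = changed_fields(current, update)
        if not diff:
            continue  # no-op update, filtered out
        store[product_id] = {**current, **update}  # "update the product"
        change_log.append((product_id, diff))      # "log the change"
        yield product_id, diff

# `store` stands in for the products table, `change_log` for the change log.
store = {"a": {"price": 10, "stock": 5}}
change_log = []
updates = [
    ("a", {"price": 10, "stock": 5}),  # identical to current data
    ("a", {"price": 12, "stock": 5}),  # price changed
]
out = list(process(updates, store, change_log))
```

Here the lookup, the write and the logging all happen inside the one step,
which is exactly the mixing of concerns I'm unsure about.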

Thanks for any feedback.



