Cool use case! You should definitely be able to model it with Spark.

For the first question it's pretty easy - you probably need to keep the user 
portfolios as state using updateStateByKey.
You need to consume 2 event sources - user trades and stock changes. You 
probably want to Cogroup the stock changes with users that have that stock in 
their portfolio, then union the 2 message streams.

As messages come in, you consume the union of these 2 streams and you update 
the state. Messages modeled as case classes and a pattern match should do the 
trick (assuming scala). After the update, you need to emit a tuple with 
(newPortfolio, gainOrLost) so you can also push the deltas somewhere.

For the Sql part, you need to create a Dataframe out of the user portfolio 
DStream, using foreachrdd. Look around for examples of Sql + spark streaming, I 
think databricks had a sample app / tutorial. 
You can then query the resulting DataFrame using SQL.
If instead of one update you want to provide a graph then you need to use a 
window over the gainOrLose.

That being said, there are a lot of interesting questions you'll need to answer 
about state keeping, event sourcing, persistance, durability - especially 
around outputting data out of spark, where you need to do more work to achieve 
exactly once semmantics. I only focused on the main dataflow.

Hope this helps, that's how I'd model it, anyway :)

-adrian

Sent from my iPhone

> On 19 Sep 2015, at 05:43, Thúy Hằng Lê <thuyhang...@gmail.com> wrote:
> 
> Hi all,
> 
> I am going to build a financial application for Portfolio Manager, where each 
> portfolio contains a list of stocks, the number of shares purchased, and the 
> purchase price. 
> Another source of information is stocks price from market data. The 
> application need to calculate real-time gain or lost of each stock in each 
> portfolio ( compared to the purchase price).
> 
> I am new with Spark, i know using Spark Streaming I can aggregate portfolio 
> possitions in real-time, for example:
>             user A contains: 
>                       - 100 IBM stock with transactionValue=$15000
>                       - 500 AAPL stock with transactionValue=$11400
> 
> Now given the stock prices change in real-time too, e.g if IBM price at 151, 
> i want to update the gain or lost of it: gainOrLost(IBM) = 151*100 - 15000 = 
> $100
> 
> My questions are:
> 
>          * What is the best method to combine 2 real-time streams( 
> transaction made by user and market pricing data) in Spark.
>          * How can I use real-time Adhoc SQL again portfolio's positions, is 
> there any way i can do SQL on the output of Spark Streamming.
>          For example,
>               select sum(gainOrLost) from portfolio where user='A';
>          * What are prefered external storages for Spark in this use case.
>          * Is spark is right choice for my use case?
>          

Reply via email to