Cool use case! You should definitely be able to model it with Spark. For the first question it's pretty easy - you probably need to keep the user portfolios as state using updateStateByKey. You need to consume 2 event sources - user trades and stock changes. You probably want to Cogroup the stock changes with users that have that stock in their portfolio, then union the 2 message streams.
As messages come in, you consume the union of these 2 streams and you update the state. Messages modeled as case classes and a pattern match should do the trick (assuming scala). After the update, you need to emit a tuple with (newPortfolio, gainOrLost) so you can also push the deltas somewhere. For the Sql part, you need to create a Dataframe out of the user portfolio DStream, using foreachrdd. Look around for examples of Sql + spark streaming, I think databricks had a sample app / tutorial. You can then query the resulting DataFrame using SQL. If instead of one update you want to provide a graph then you need to use a window over the gainOrLose. That being said, there are a lot of interesting questions you'll need to answer about state keeping, event sourcing, persistance, durability - especially around outputting data out of spark, where you need to do more work to achieve exactly once semmantics. I only focused on the main dataflow. Hope this helps, that's how I'd model it, anyway :) -adrian Sent from my iPhone > On 19 Sep 2015, at 05:43, Thúy Hằng Lê <thuyhang...@gmail.com> wrote: > > Hi all, > > I am going to build a financial application for Portfolio Manager, where each > portfolio contains a list of stocks, the number of shares purchased, and the > purchase price. > Another source of information is stocks price from market data. The > application need to calculate real-time gain or lost of each stock in each > portfolio ( compared to the purchase price). > > I am new with Spark, i know using Spark Streaming I can aggregate portfolio > possitions in real-time, for example: > user A contains: > - 100 IBM stock with transactionValue=$15000 > - 500 AAPL stock with transactionValue=$11400 > > Now given the stock prices change in real-time too, e.g if IBM price at 151, > i want to update the gain or lost of it: gainOrLost(IBM) = 151*100 - 15000 = > $100 > > My questions are: > > * What is the best method to combine 2 real-time streams( > transaction made by user and market pricing data) in Spark. > * How can I use real-time Adhoc SQL again portfolio's positions, is > there any way i can do SQL on the output of Spark Streamming. > For example, > select sum(gainOrLost) from portfolio where user='A'; > * What are prefered external storages for Spark in this use case. > * Is spark is right choice for my use case? >