Thanks Adrian and Jorn for the answers. Yes, you're right, there are a lot of things I need to consider if I want to use Spark for my app.
I still have a few concerns/questions based on your information:

1/ I need to combine the trading stream with the tick stream, and I am planning to use Kafka for that. If I use approach #2 (Direct Approach) in this tutorial:
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
will I get exactly-once semantics, or do I have to add some logic in my code to achieve that? Given your suggestion of using delta updates, exactly-once semantics are required for this application.

2/ For ad-hoc queries, I must write the output of Spark to external storage and query that, right? Is there any way to run ad-hoc queries on Spark itself? My application could have 50k updates per second at peak time, and persisting to external storage would add high latency to my app.

3/ How can I get real-time statistics from Spark? In most of the Spark Streaming examples, the statistics are echoed to stdout. However, I want to display those statistics in a GUI. Is there any way to retrieve data from Spark directly without using external storage?

2015-09-19 16:23 GMT+07:00 Jörn Franke <jornfra...@gmail.com>:

> If you want to be able to let your users query their portfolio then you
> may want to think about storing the current state of the portfolios in
> hbase/phoenix or alternatively a cluster of relational databases can make
> sense. For the rest you may use Spark.
>
> On Sat, 19 Sep 2015 at 4:43, Thúy Hằng Lê <thuyhang...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am going to build a financial application for Portfolio Manager, where
>> each portfolio contains a list of stocks, the number of shares purchased,
>> and the purchase price.
>> Another source of information is stock prices from market data. The
>> application needs to calculate the real-time gain or loss of each stock in each
>> portfolio (compared to the purchase price).
>>
>> I am new to Spark. I know that using Spark Streaming I can aggregate
>> portfolio positions in real time, for example:
>> user A contains:
>> - 100 IBM stock with transactionValue=$15000
>> - 500 AAPL stock with transactionValue=$11400
>>
>> Now given that the stock prices change in real time too, e.g. if the IBM price is
>> 151, I want to update its gain or loss: gainOrLost(IBM) = 151*100 -
>> 15000 = $100
>>
>> My questions are:
>>
>> * What is the best method to combine 2 real-time streams
>> (transactions made by users and market pricing data) in Spark?
>> * How can I run real-time ad-hoc SQL against the portfolio's positions?
>> Is there any way I can do SQL on the output of Spark Streaming?
>> For example,
>> select sum(gainOrLost) from portfolio where user='A';
>> * What are the preferred external storages for Spark in this use case?
>> * Is Spark the right choice for my use case?
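
To make the calculation in the quoted message concrete, here is a rough sketch in plain Python (not Spark code) of how the two streams could be combined and how replayed trade messages could be ignored so that delta updates are applied only once. The class and method names (PositionBook, apply_trade, apply_tick) are made up for illustration, and the single increasing offset assumes one ordered source of trades, such as a single Kafka partition. It is only meant to show the logic, not how Spark or Kafka actually implement it.

```python
from dataclasses import dataclass


@dataclass
class Position:
    shares: int = 0
    cost: float = 0.0  # total purchase value (transactionValue in the example)


class PositionBook:
    """Hypothetical in-memory state: positions per (user, symbol) plus latest prices."""

    def __init__(self):
        self.positions = {}    # (user, symbol) -> Position
        self.prices = {}       # symbol -> latest market price
        self.last_offset = -1  # highest trade offset applied so far (assumes one ordered source)

    def apply_trade(self, offset, user, symbol, shares, value):
        """Apply a trade as a delta update; skip it if this offset was already applied."""
        if offset <= self.last_offset:
            return False  # replayed message, ignoring it keeps the update exactly-once
        pos = self.positions.setdefault((user, symbol), Position())
        pos.shares += shares
        pos.cost += value
        self.last_offset = offset
        return True

    def apply_tick(self, symbol, price):
        """Record the latest price from the market data stream."""
        self.prices[symbol] = price

    def gain_or_loss(self, user, symbol):
        """price * shares - purchase value, as in gainOrLost(IBM) in the quoted message."""
        pos = self.positions[(user, symbol)]
        return self.prices[symbol] * pos.shares - pos.cost


book = PositionBook()
book.apply_trade(0, "A", "IBM", 100, 15000.0)
book.apply_trade(1, "A", "AAPL", 500, 11400.0)
book.apply_trade(1, "A", "AAPL", 500, 11400.0)  # duplicate delivery, ignored
book.apply_tick("IBM", 151.0)
print(book.gain_or_loss("A", "IBM"))  # 100.0
```

In a real job the offset and the position state would have to be stored together atomically; that part is what I am unsure how to do in Spark.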