Thanks Adrian and Jorn for the answers.

Yes, you're right: there are a lot of things I need to consider if I want to
use Spark for my app.

I still have a few concerns/questions about the information you gave:

1/ I need to combine the trading stream with the tick stream, and I am
planning to use Kafka for that.
If I use approach #2 (Direct Approach) from this tutorial
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
will I get exactly-once semantics, or do I have to add some logic in my
code to achieve that?
Given your suggestion of using delta updates, exactly-once semantics are
required for this application.
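
To make the concern concrete: a delta update is not idempotent, so if a
batch is replayed after a failure, the same trade would be counted twice.
Below is a minimal sketch of the kind of extra logic I imagine I might
need. It is plain Python, not Spark code. PositionStore and its fields are
names I made up for illustration, and it assumes that every record carries
its Kafka partition and offset, that offsets arrive in order within a
partition, and that the delta and the offset can be saved together
atomically.

```python
from collections import defaultdict


class PositionStore:
    """Hypothetical in-memory stand-in for the store receiving delta updates."""

    def __init__(self):
        self.shares = defaultdict(int)  # (user, symbol) -> net shares held
        self.last_offset = {}           # partition -> highest offset applied

    def apply(self, partition, offset, user, symbol, delta):
        """Apply one trade delta unless this offset was already applied."""
        if offset <= self.last_offset.get(partition, -1):
            return False                # replayed record: skip it
        # In a real store these two writes would need to be one transaction.
        self.shares[(user, symbol)] += delta
        self.last_offset[partition] = offset
        return True


store = PositionStore()
# Records are (partition, offset, user, symbol, share delta).
batch = [(0, 0, "A", "IBM", 100), (0, 1, "A", "AAPL", 500)]
for record in batch:
    store.apply(*record)

# Simulate the same batch being delivered again after a failure.
replayed = [store.apply(*record) for record in batch]
```

With this check the replayed batch leaves the positions unchanged, which
is the behaviour I would need from the real pipeline.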

2/ For ad-hoc queries, I must write the output of Spark to external storage
and query that, right?
Is there any way to run ad-hoc queries on Spark itself? My application could
have 50k updates per second at peak time.
Persisting to external storage would lead to high latency in my app.
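
To illustrate what I mean by an ad-hoc query on in-process state, here is
a small sketch. It uses Python's sqlite3 in-memory database purely as a
stand-in, not Spark, and the on_tick function and the AAPL price of 23 are
hypothetical. The table and figures follow the example from my first mail.

```python
import sqlite3

# In-memory table standing in for the live portfolio state.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE portfolio ("
    " user TEXT, symbol TEXT, shares INTEGER,"
    " transactionValue REAL, gainOrLost REAL,"
    " PRIMARY KEY (user, symbol))"
)
conn.executemany(
    "INSERT INTO portfolio VALUES (?, ?, ?, ?, 0)",
    [("A", "IBM", 100, 15000.0), ("A", "AAPL", 500, 11400.0)],
)


def on_tick(symbol, price):
    """Recompute gain/loss for every position in `symbol` at the new price."""
    conn.execute(
        "UPDATE portfolio SET gainOrLost = ? * shares - transactionValue"
        " WHERE symbol = ?",
        (price, symbol),
    )


on_tick("IBM", 151.0)   # price from the example in my first mail
on_tick("AAPL", 23.0)   # hypothetical price, for illustration only

# The ad-hoc query I would like to run against live state.
total = conn.execute(
    "SELECT sum(gainOrLost) FROM portfolio WHERE user = 'A'"
).fetchone()[0]
```

What I am unsure about is whether something equivalent can be done inside
Spark at 50k updates per second.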

3/ How can I get real-time statistics from Spark?
In most of the Spark Streaming examples, the statistics are echoed to
stdout.
However, I want to display those statistics in a GUI. Is there any way to
retrieve data from Spark directly without using external storage?


2015-09-19 16:23 GMT+07:00 Jörn Franke <jornfra...@gmail.com>:

> If you want to let your users query their portfolios, then you
> may want to think about storing the current state of the portfolios in
> HBase/Phoenix; alternatively, a cluster of relational databases can make
> sense. For the rest you may use Spark.
>
> On Sat, 19 Sept 2015 at 4:43, Thúy Hằng Lê <thuyhang...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am going to build a financial application for Portfolio Manager, where
>> each portfolio contains a list of stocks, the number of shares purchased,
>> and the purchase price.
>> Another source of information is stock prices from market data. The
>> application needs to calculate the real-time gain or loss of each stock in
>> each portfolio (compared to the purchase price).
>>
>> I am new to Spark. I know that using Spark Streaming I can aggregate
>> portfolio positions in real time, for example:
>>             user A contains:
>>                       - 100 IBM stock with transactionValue=$15000
>>                       - 500 AAPL stock with transactionValue=$11400
>>
>> Now, given that stock prices change in real time too, e.g. if the IBM price
>> is 151, I want to update its gain or loss: gainOrLost(IBM) = 151*100 -
>> 15000 = $100
>>
>> My questions are:
>>
>>          * What is the best method to combine 2 real-time streams
>> (transactions made by users and market pricing data) in Spark?
>>          * How can I run real-time ad-hoc SQL against the portfolios'
>> positions? Is there any way I can do SQL on the output of Spark Streaming?
>>          For example,
>>               select sum(gainOrLost) from portfolio where user='A';
>>          * What are the preferred external storages for Spark in this use case?
>>          * Is Spark the right choice for my use case?
>>
>>
>
