I think generally the way forward would be to put the aggregate statistics in
external storage (e.g. HBase) - it should not have that much influence on
latency. You will probably need it anyway if you want to store historical
information. With respect to deltas - always a tricky topic. You may want to
work with absolute values and have the application calculate the deltas when
it queries the external datastore. Once this works you can decide whether you
still need the delta approach or not.
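
For illustration, a minimal sketch of the query-side delta calculation,
assuming a hypothetical fetchSnapshot(user, ts) that reads the absolute
values back from the external store:

    // Hypothetical reader against the external store (e.g. an HBase table);
    // the streaming job only ever writes absolute values per user and timestamp.
    def fetchSnapshot(user: String, ts: Long): Map[String, Double] = ???

    // Deltas are derived at query time instead of being maintained in the stream.
    def deltas(user: String, from: Long, to: Long): Map[String, Double] = {
      val before = fetchSnapshot(user, from)
      val after  = fetchSnapshot(user, to)
      after.map { case (symbol, value) =>
        symbol -> (value - before.getOrElse(symbol, 0.0))
      }
    }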

On Sun, Sep 20, 2015 at 6:26 AM, Thúy Hằng Lê <thuyhang...@gmail.com> wrote:

> Thanks Adrian and Jörn for the answers.
>
> Yes, you're right, there are a lot of things I need to consider if I want
> to use Spark for my app.
>
> I still have a few concerns/questions based on your answers:
>
> 1/ I need to combine the trading stream with the tick stream, and I am
> planning to use Kafka for that.
> If I am using approach #2 (Direct Approach) in this tutorial
> https://spark.apache.org/docs/latest/streaming-kafka-integration.html
> will I get exactly-once semantics? Or do I have to add some logic in my
> code to achieve that?
> Given your suggestion of delta updates, exactly-once semantics are
> required for this application.
>
> 2/ For ad-hoc queries, I must write the output of Spark to external
> storage and query that, right?
> Is there any way to do ad-hoc queries on Spark directly? My application
> could have 50k updates per second at peak time.
> Persisting to external storage leads to high latency in my app.
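>
> If it matters, the write pattern I would try first to keep latency down is
> batching mutations per partition rather than writing record by record (the
> ExternalStore connector here is hypothetical):
>
>     // aggregates: a DStream of computed statistics to be persisted.
>     aggregates.foreachRDD { rdd =>
>       rdd.foreachPartition { records =>
>         val writer = ExternalStore.open()   // hypothetical connection, e.g. an HBase table
>         records.foreach(writer.buffer)      // buffer mutations instead of single writes
>         writer.flush()                      // one round trip per partition
>         writer.close()
>       }
>     }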
>
> 3/ How can I get real-time statistics out of Spark?
> In most of the Spark Streaming examples, the statistics are echoed to
> stdout.
> However, I want to display those statistics on a GUI; is there any way to
> retrieve data from Spark directly without using external storage?
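>
> Something like the following is what I am hoping is possible; pushToGui is
> just a placeholder for whatever transport the GUI would use (websocket,
> REST, ...):
>
>     // stats: DStream[(String, Double)] of (user, gainOrLoss) aggregates.
>     stats.foreachRDD { rdd =>
>       // Safe to collect only because the per-batch aggregates are small.
>       rdd.collect().foreach { case (user, gainOrLoss) =>
>         pushToGui(user, gainOrLoss)   // hypothetical GUI callback
>       }
>     }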
>
>
> 2015-09-19 16:23 GMT+07:00 Jörn Franke <jornfra...@gmail.com>:
>
>> If you want to be able to let your users query their portfolios, then you
>> may want to think about storing the current state of the portfolios in
>> HBase/Phoenix; alternatively, a cluster of relational databases can make
>> sense. For the rest you may use Spark.
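>>
>> For example, a sketch of upserting the current state through the Phoenix
>> JDBC driver (the table, columns and ZooKeeper host are illustrative):
>>
>>     // positions: DStream[(String, String, Int, Double)] = (user, symbol, shares, cost).
>>     positions.foreachRDD { rdd =>
>>       rdd.foreachPartition { rows =>
>>         val conn = java.sql.DriverManager.getConnection("jdbc:phoenix:zk-host")
>>         val stmt = conn.prepareStatement(
>>           "UPSERT INTO portfolio_positions (userid, symbol, shares, cost) VALUES (?, ?, ?, ?)")
>>         rows.foreach { case (user, symbol, shares, cost) =>
>>           stmt.setString(1, user); stmt.setString(2, symbol)
>>           stmt.setInt(3, shares);  stmt.setDouble(4, cost)
>>           stmt.executeUpdate()
>>         }
>>         conn.commit()
>>         conn.close()
>>       }
>>     }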
>>
>> On Sat, Sep 19, 2015 at 4:43 AM, Thúy Hằng Lê <thuyhang...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am going to build a financial application for portfolio managers, where
>>> each portfolio contains a list of stocks, the number of shares purchased,
>>> and the purchase price.
>>> Another source of information is stock prices from market data. The
>>> application needs to calculate the real-time gain or loss of each stock
>>> in each portfolio (compared to the purchase price).
>>>
>>> I am new to Spark. I know that using Spark Streaming I can aggregate
>>> portfolio positions in real-time, for example:
>>>             user A contains:
>>>                       - 100 IBM shares with transactionValue=$15000
>>>                       - 500 AAPL shares with transactionValue=$11400
>>>
>>> Now, given that stock prices change in real-time too, e.g. if IBM's
>>> price is 151, I want to update its gain or loss: gainOrLoss(IBM) =
>>> 151*100 - 15000 = $100
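>>>
>>> As a function, the calculation I have in mind is simply:
>>>
>>>     // Current market value minus the amount originally paid.
>>>     def gainOrLoss(shares: Int, currentPrice: Double, transactionValue: Double): Double =
>>>       shares * currentPrice - transactionValue
>>>
>>>     gainOrLoss(100, 151.0, 15000.0)   // = 100.0, the IBM example above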
>>>
>>> My questions are:
>>>
>>>          * What is the best method to combine two real-time streams
>>> (transactions made by users and market pricing data) in Spark? (See the
>>> sketch after this list.)
>>>          * How can I run real-time ad-hoc SQL against the portfolio
>>> positions? Is there any way I can do SQL on the output of Spark
>>> Streaming (also covered by the sketch after this list)?
>>>          For example,
>>>               select sum(gainOrLoss) from portfolio where user='A';
>>>          * What are the preferred external storages for Spark in this
>>> use case?
>>>          * Is Spark the right choice for my use case?
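>>>
>>> As referenced above, here is a sketch of what I imagine for the first
>>> two questions: joining the two streams keyed by symbol, then registering
>>> each micro-batch output for SQL (names and shapes are illustrative; I
>>> assume the position stream would first need to be turned into state,
>>> e.g. with updateStateByKey):
>>>
>>>     import org.apache.spark.sql.SQLContext
>>>
>>>     // positions: DStream[(String, (String, Int, Double))] = (symbol, (user, shares, cost))
>>>     // prices:    DStream[(String, Double)]                = (symbol, price)
>>>     val perBatchGain = positions.join(prices).map {
>>>       case (symbol, ((user, shares, cost), price)) =>
>>>         (user, symbol, shares * price - cost)
>>>     }
>>>
>>>     // Ad-hoc SQL per micro-batch: register the output as a temporary table.
>>>     perBatchGain.foreachRDD { rdd =>
>>>       val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
>>>       import sqlContext.implicits._
>>>       rdd.toDF("user", "symbol", "gainOrLoss").registerTempTable("portfolio")
>>>       sqlContext.sql("select sum(gainOrLoss) from portfolio where user = 'A'").show()
>>>     }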
>>>
>>>
>>
>
