Thanks all,

Using external storage seems to be the best solution for now.

Btw, has anyone heard about the following Spark streaming module from Intel?
https://github.com/Intel-bigdata/spark-streamingsql
It seems to allow querying a Spark stream on the fly; however, it hasn't been
updated for 9 months, so I'm not sure it's still good to use.




2015-09-20 13:17 GMT+07:00 Jörn Franke <jornfra...@gmail.com>:

> I think generally the way forward would be to put aggregate statistics into
> an external storage (e.g. HBase) - it should not have that much influence on
> latency. You will probably need it anyway if you need to store historical
> information. With regard to deltas - always a tricky topic. You may want to
> work with absolute values, and when the application queries the external
> datastore it calculates the deltas. Once this works you can decide whether
> you still need the delta approach or not.
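The "store absolute values, derive deltas at query time" idea above can be sketched in a few lines. This is plain Python, not Spark, and all names (`store`, `last_seen`, the key format) are illustrative stand-ins for an external store such as an HBase table:

```python
# Sketch: stream side upserts only absolute values (idempotent);
# query side derives a delta against the snapshot it last saw.
# Plain Python; "store" stands in for an external datastore like HBase.

store = {}       # latest absolute value per key
last_seen = {}   # per-reader snapshot used to derive deltas

def upsert_absolute(key, value):
    """Stream side: idempotent write of the current absolute value."""
    store[key] = value

def query_delta(key):
    """Query side: delta = current absolute - previously seen absolute."""
    current = store.get(key, 0)
    delta = current - last_seen.get(key, 0)
    last_seen[key] = current
    return delta

upsert_absolute("A/IBM", 15000)
print(query_delta("A/IBM"))   # 15000 (first read, delta from 0)
upsert_absolute("A/IBM", 15100)
print(query_delta("A/IBM"))   # 100 (change since last read)
```

Because the writer only ever overwrites absolutes, replayed batches do no harm, which is what makes this simpler than shipping deltas through the pipeline.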
>
> On Sun, Sep 20, 2015 at 6:26 AM, Thúy Hằng Lê <thuyhang...@gmail.com>
> wrote:
>
>> Thanks Adrian and Jorn for the answers.
>>
>> Yes, you're right that there are a lot of things I need to consider if I
>> want to use Spark for my app.
>>
>> I still have a few concerns/questions about your information:
>>
>> 1/ I need to combine the trading stream with the tick stream; I am
>> planning to use Kafka for that.
>> If I am using approach #2 (Direct Approach) in this tutorial
>> https://spark.apache.org/docs/latest/streaming-kafka-integration.html
>> will I get exactly-once semantics, or do I have to add some logic in my
>> code to achieve that?
>> Given your suggestion of using delta updates, exactly-once semantics are
>> required for this application.
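For context on question 1/: the direct approach ensures each Kafka record is received by Spark exactly once, but end-to-end exactly-once still requires the output operation to be idempotent or transactional. One common pattern is to key writes on the Kafka (partition, offset) pair so that replays after a failure become no-ops. A minimal sketch in plain Python, where `sink` is an illustrative stand-in for the external store:

```python
# Sketch: turning at-least-once output into effectively exactly-once
# by keying each write on its Kafka (partition, offset), so a replayed
# record is detected and skipped. "sink" stands in for an external store.

sink = {}

def write_record(partition, offset, value):
    key = (partition, offset)
    if key in sink:          # record replayed after a failure -> no-op
        return False
    sink[key] = value
    return True

print(write_record(0, 42, {"symbol": "IBM", "qty": 100}))   # True
print(write_record(0, 42, {"symbol": "IBM", "qty": 100}))   # False (replay)
```

In a real store the offset check and the write would need to happen atomically (e.g. a conditional put), but the dedup-by-offset idea is the same.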
>>
>> 2/ For ad-hoc queries, I must write the output of Spark to external
>> storage and query on that, right?
>> Is there any way to do ad-hoc queries on Spark itself? My application
>> could have 50k updates per second at peak time.
>> Persisting to external storage leads to high latency in my app.
>>
>> 3/ How can I get real-time statistics from Spark?
>> In most of the Spark Streaming examples, the statistics are echoed to
>> stdout.
>> However, I want to display those statistics in a GUI. Is there any way to
>> retrieve data from Spark directly without using external storage?
>>
>>
>> 2015-09-19 16:23 GMT+07:00 Jörn Franke <jornfra...@gmail.com>:
>>
>>> If you want to let your users query their portfolios, then you may want
>>> to think about storing the current state of the portfolios in
>>> HBase/Phoenix; alternatively, a cluster of relational databases can make
>>> sense. For the rest you may use Spark.
>>>
>>> On Sat, Sep 19, 2015 at 4:43 AM, Thúy Hằng Lê <thuyhang...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am going to build a financial application for a Portfolio Manager,
>>>> where each portfolio contains a list of stocks, the number of shares
>>>> purchased, and the purchase price.
>>>> Another source of information is stock prices from market data. The
>>>> application needs to calculate the real-time gain or loss of each stock
>>>> in each portfolio (compared to the purchase price).
>>>>
>>>> I am new to Spark; I know that using Spark Streaming I can aggregate
>>>> portfolio positions in real time, for example:
>>>>             user A contains:
>>>>                       - 100 IBM stock with transactionValue=$15000
>>>>                       - 500 AAPL stock with transactionValue=$11400
>>>>
>>>> Now, given that stock prices change in real time too, e.g. if IBM is
>>>> priced at 151, I want to update its gain or loss: gainOrLost(IBM) =
>>>> 151*100 - 15000 = $100
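The calculation above is essentially a join of positions with the latest tick per symbol. A minimal sketch in plain Python (not Spark); the field names and the AAPL price are illustrative:

```python
# Illustrative gain/loss calculation: join each position with the latest
# market price for its symbol. Plain Python, not Spark; in the streaming
# app, ticks would keep overwriting the entries of latest_price.

positions = {  # per (user, symbol): shares held and cost basis
    ("A", "IBM"):  {"shares": 100, "transactionValue": 15000},
    ("A", "AAPL"): {"shares": 500, "transactionValue": 11400},
}
latest_price = {"IBM": 151, "AAPL": 23}   # updated by the tick stream

def gain_or_loss(user, symbol):
    p = positions[(user, symbol)]
    return latest_price[symbol] * p["shares"] - p["transactionValue"]

print(gain_or_loss("A", "IBM"))   # 151*100 - 15000 = 100
```

In Spark this maps naturally onto keying both streams by symbol and joining them, with the positions side held as updatable state.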
>>>>
>>>> My questions are:
>>>>
>>>>          * What is the best method to combine two real-time streams
>>>> (transactions made by users and market pricing data) in Spark?
>>>>          * How can I run real-time ad-hoc SQL against the
>>>> portfolios' positions? Is there any way I can do SQL on the output of
>>>> Spark Streaming?
>>>>          For example:
>>>>               select sum(gainOrLost) from portfolio where user='A';
>>>>          * What are the preferred external storages for Spark in this
>>>> use case?
>>>>          * Is Spark the right choice for my use case?
>>>>
>>>>
>>>
>>
