That's a great answer, Adrian.
I found a lot of useful information here. I have a direction for the
application now, I will try your suggestions :)

On Tuesday, September 22, 2015, Adrian Tanase <atan...@adobe.com> wrote:

>
>    1. Reading from Kafka has exactly-once guarantees - we are using it in
>       production today (with the direct receiver).
>       1. You will probably have 2 topics; loading both into Spark and
>          joining / unioning as needed is not an issue.
>       2. There are tons of optimizations you can do there, assuming
>          everything else works.
>    2. For ad-hoc queries I would say you absolutely need to look at
>       external storage.
>       1. Querying the DStream or Spark's RDDs directly should be done
>          mostly for aggregates/metrics, not by users.
>       2. If you look at HBase or Cassandra for storage then 50k writes/sec
>          are not a problem at all, especially combined with a smart client
>          that does batch puts (like asynchbase
>          <https://github.com/OpenTSDB/asynchbase>).
>       3. You could also consider writing the updates to another Kafka
>          topic and having a different component update the DB, if you
>          think of other optimisations there.
>    3. By stats I assume you mean metrics (operational or business).
>       1. There are multiple ways to do this; however, I would not
>          encourage you to query Spark directly, especially if you need an
>          archive/history of your datapoints.
>       2. We are using OpenTSDB (we already have an HBase cluster) +
>          Grafana for dashboarding.
>       3. Collecting the metrics is a bit hairy in a streaming app - we
>          have experimented with both accumulators and RDDs specific for
>          metrics, and chose the RDDs that write to OpenTSDB using
>          foreachRDD.
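To illustrate the batch-put suggestion above: a toy buffering writer (plain Python, just standing in for a real asynchbase or Cassandra client; the `BatchingWriter` name is mine) that groups many small writes into far fewer flushes:

```python
# Toy batching writer: buffers puts and flushes them in groups, the way an
# async HBase client amortizes tens of thousands of writes/sec into far
# fewer RPCs.
class BatchingWriter:
    def __init__(self, batch_size, flush):
        self.batch_size = batch_size
        self.flush = flush          # callback receiving a list of (key, value)
        self.buffer = []

    def put(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush_now()

    def flush_now(self):
        if self.buffer:
            self.flush(list(self.buffer))
            self.buffer.clear()

batches = []
writer = BatchingWriter(100, batches.append)
for i in range(250):
    writer.put(f"k{i}", float(i))
writer.flush_now()
print(len(batches))  # 3: two full batches of 100, one final batch of 50
```

A real client would also flush on a timer so a half-full buffer does not sit around at low traffic.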
>
> -adrian
>
> ------------------------------
> *From:* Thúy Hằng Lê <thuyhang...@gmail.com>
> *Sent:* Sunday, September 20, 2015 7:26 AM
> *To:* Jörn Franke
> *Cc:* user@spark.apache.org
> *Subject:* Re: Using Spark for portfolio manager app
>
> Thanks Adrian and Jorn for the answers.
>
> Yes, you're right, there are a lot of things I need to consider if I want
> to use Spark for my app.
>
> I still have a few concerns/questions about your information:
>
> 1/ I need to combine the trading stream with the tick stream; I am
> planning to use Kafka for that.
> If I am using approach #2 (Direct Approach) in this tutorial
> https://spark.apache.org/docs/latest/streaming-kafka-integration.html
>
> Will I get exactly-once semantics? Or do I have to add some logic in my
> code to achieve that?
> Given your suggestion of using delta updates, exactly-once semantics are
> required for this application.
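One common way to make delta updates safe under replays, sketched here with plain Python dicts (the `apply_delta` helper is my own illustration, not a Spark or Kafka API), is to record the last applied offset per key and skip anything already seen:

```python
# Idempotent delta application: remember the highest Kafka-style offset
# already applied per key and ignore replayed (older or equal) updates,
# so reprocessing a batch does not double-count a delta.
last_applied = {}   # key -> last applied offset
balances = {}       # key -> running balance

def apply_delta(offset, key, amount):
    if offset > last_applied.get(key, -1):
        balances[key] = balances.get(key, 0.0) + amount
        last_applied[key] = offset

for offset, key, amount in [(0, "A", 10.0), (1, "A", 5.0), (1, "A", 5.0)]:
    apply_delta(offset, key, amount)  # the replayed (1, "A", 5.0) is a no-op

print(balances["A"])  # 15.0
```

With this pattern, at-least-once delivery from the direct Kafka stream becomes effectively exactly-once for the state you keep.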
>
> 2/ For ad-hoc queries, I must write the output of Spark to external
> storage and query on that, right?
> Is there any way to do ad-hoc queries on Spark itself? My application
> could have 50k updates per second at peak time.
> Persisting to external storage would add latency to my app.
>
> 3/ How do I get real-time statistics out of Spark?
> In most of the Spark Streaming examples, the statistics are echoed to
> stdout.
> However, I want to display those statistics in a GUI; is there any way to
> retrieve data from Spark directly without using external storage?
>
>
> 2015-09-19 16:23 GMT+07:00 Jörn Franke <jornfra...@gmail.com>:
>
>> If you want to let your users query their portfolios, then you may want
>> to think about storing the current state of the portfolios in
>> HBase/Phoenix; alternatively, a cluster of relational databases can make
>> sense. For the rest you may use Spark.
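As a toy picture of this suggestion: keep the current portfolio state in an external store keyed by (user, symbol), and answer ad-hoc queries from that store rather than from Spark. A plain dict stands in for HBase/Phoenix here, and `sum_gains` is my own illustrative name:

```python
# Current portfolio state keyed by (user, symbol); an ad-hoc "sum of gains
# for user A" query reads this store instead of querying Spark itself.
state = {
    ("A", "IBM"): 100.0,   # gain-or-loss values, updated by the stream job
    ("A", "AAPL"): 100.0,
    ("B", "IBM"): -50.0,
}

def sum_gains(user):
    return sum(gain for (u, _), gain in state.items() if u == user)

print(sum_gains("A"))  # 200.0
```

In the real system the streaming job continuously upserts this state, and the query side (SQL over Phoenix, for instance) never touches the Spark cluster.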
>>
>> On Sat, Sep 19, 2015 at 4:43, Thúy Hằng Lê <thuyhang...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am going to build a financial application for portfolio managers,
>>> where each portfolio contains a list of stocks, the number of shares
>>> purchased, and the purchase price.
>>> Another source of information is stock prices from market data. The
>>> application needs to calculate the real-time gain or loss of each stock
>>> in each portfolio (compared to the purchase price).
>>>
>>> I am new to Spark. I know that using Spark Streaming I can aggregate
>>> portfolio positions in real-time, for example:
>>>             user A holds:
>>>                       - 100 IBM shares with transactionValue=$15000
>>>                       - 500 AAPL shares with transactionValue=$11400
>>>
>>> Now, given that stock prices change in real-time too, e.g. if IBM trades
>>> at 151, I want to update its gain or loss: gainOrLoss(IBM) = 151*100 -
>>> 15000 = $100
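The per-position arithmetic above is just current price times shares minus the purchase cost; a tiny standalone sketch (the `gain_or_loss` name is mine):

```python
# Gain or loss of one position: current price * shares - purchase cost.
def gain_or_loss(shares, transaction_value, current_price):
    return current_price * shares - transaction_value

print(gain_or_loss(100, 15000.0, 151.0))  # 100.0, matching the IBM example
```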
>>>
>>> My questions are:
>>>
>>>          * What is the best method to combine 2 real-time streams
>>> (transactions made by users and market pricing data) in Spark?
>>>          * How can I run real-time ad-hoc SQL against the
>>> portfolio's positions? Is there any way I can do SQL on the output of
>>> Spark Streaming?
>>>          For example:
>>>               select sum(gainOrLoss) from portfolio where user='A';
>>>          * What are the preferred external storages for Spark in this
>>> use case?
>>>          * Is Spark the right choice for my use case?
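The first question, combining the two streams, essentially amounts to a join keyed by ticker symbol. Here is what that pairing looks like with plain Python dicts (a keyed DStream/RDD join would do the equivalent per micro-batch; the variable names are mine):

```python
# Pairing the two streams by ticker symbol, the way a keyed DStream/RDD join
# would per micro-batch (plain dicts stand in for the RDDs here).
positions = {"IBM": (100, 15000.0), "AAPL": (500, 11400.0)}  # shares, cost
prices = {"IBM": 151.0, "AAPL": 23.0}                        # latest ticks

gains = {
    sym: prices[sym] * shares - cost
    for sym, (shares, cost) in positions.items()
    if sym in prices   # inner join: only symbols present in both streams
}
print(gains)  # {'IBM': 100.0, 'AAPL': 100.0}
```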
>>>
>>>
>>
>
