I am very new to Spark.
I am working on a project that involves reading stock transactions off a number
of TCP connections and
1. periodically (once every few hours) uploading the transaction records to
HBase
2. maintaining the records that have not yet been written to HBase and acting
as an HTTP query server for these records. An example of a query would be to
return all transactions in Google stock between 1 and 2 pm on the current
trading day.

I am thinking of using Kafka to receive all the transaction records, with
Spark as the consumer of the Kafka output.
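
To make this concrete, here is the rough sketch I have in mind for the
receiving side (Scala, using the spark-streaming-kafka integration; the topic
name, ZooKeeper address, consumer group, and batch interval are placeholders
I made up):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("StockTransactions")
  val ssc = new StreamingContext(conf, Seconds(10))  // batch interval is a guess

  // One receiver reading a "transactions" topic; each message value is one
  // raw transaction record.
  val records = KafkaUtils.createStream(
      ssc,
      "zkhost:2181",             // ZooKeeper quorum (placeholder)
      "transaction-consumers",   // consumer group id (placeholder)
      Map("transactions" -> 1))  // topic -> number of receiver threads
    .map(_._2)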

In particular, I need to create an RDD that works like a hash map, with a
string (the stock ticker symbol) as the key and a list (or vector) of
transaction records as the value.
This RDD needs to be "thread (or process) safe", since different threads and
processes will be reading and modifying it, and insertion, deletion, and
lookup all need to be fast.
Is this something that can be done with Spark, and is Spark the right tool in
terms of latency and throughput?
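
For the per-ticker data structure, this is roughly what I am picturing,
continuing from the `records` stream above (the "ticker,price,volume,timestamp"
message format is made up, and I don't know whether updateStateByKey is the
right primitive for holding several hours of records):

  import org.apache.spark.SparkContext._   // pair-RDD operations such as lookup
  import org.apache.spark.rdd.RDD

  case class Txn(ticker: String, price: Double, volume: Long, ts: Long)

  // Parse each message and key it by ticker symbol.
  val byTicker = records.map { line =>
    val Array(ticker, price, volume, ts) = line.split(",")
    (ticker, Txn(ticker, price.toDouble, volume.toLong, ts.toLong))
  }

  // Accumulate the records that have not yet been flushed to HBase; stateful
  // operations require a checkpoint directory.
  ssc.checkpoint("hdfs:///tmp/stock-checkpoints")   // placeholder path
  val pending = byTicker.updateStateByKey[Seq[Txn]] {
    (newTxns: Seq[Txn], current: Option[Seq[Txn]]) =>
      Some(current.getOrElse(Seq.empty) ++ newTxns)
  }

  // The example query: all transactions for one ticker in a time window
  // (start/end are epoch millis supplied by the HTTP layer, which would run
  // this against the latest state RDD, e.g. one captured in a foreachRDD).
  def query(state: RDD[(String, Seq[Txn])],
            ticker: String, start: Long, end: Long): Seq[Txn] =
    state.lookup(ticker).flatten.filter(t => t.ts >= start && t.ts < end)

  ssc.start()
  ssc.awaitTermination()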

Pardon me if I don't know what I am talking about; all of this is very new
to me.
Thanks!



