Hi!
Is there any way to use Spark and Spark Streaming together to create a real-time
architecture?
How can I merge the Spark batch result and the Spark Streaming result in real time
(and drop the streaming result once the batch result is generated)?
Thanks
I think that you have two options:
- To run your code locally, you can use local mode by setting the 'local'
master like so: new SparkConf().setMaster("local[4]"), where 4 is the number
of cores assigned to local mode (see the sketch after this list).
- To run your code remotely, you need to build the jar with dependencies and
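For the first option, a minimal sketch (the app name and the toy job are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    // "local[4]" runs Spark in-process with 4 worker threads
    val conf = new SparkConf()
      .setAppName("local-mode-example")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    // Toy job just to prove the context works
    println(sc.parallelize(1 to 100).filter(_ % 2 == 0).count())
    sc.stop()
  }
}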
Hi,
The scheduling related code can be found at:
https://github.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/scheduler
The DAG (Directed Acyclic Graph) scheduler is a good starting point:
Aaron, thank you for your response and for clarifying things.
-Vibhor
On Sun, Jun 1, 2014 at 11:40 AM, Aaron Davidson ilike...@gmail.com wrote:
There is no fundamental issue if you're running on data that is larger
than the cluster's memory size. Many operations can stream data through, and thus
I would make sure that your workers are running. It is very difficult to tell
from the console dribble whether you just have no data or the workers have
disassociated from the master.
Gino B.
On Jun 6, 2014, at 11:32 PM, Jeremy Lee unorthodox.engine...@gmail.com
wrote:
Yup, when it's running,
So you can run a Spark job to get the data to disk/HDFS, then run a
DStream from an HDFS folder. As you move your files in, the DStream will kick
in.
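Roughly, as a minimal sketch (the directory path and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Watch an HDFS directory; files moved into it show up as new batches.
val conf = new SparkConf().setAppName("hdfs-dstream-example")
val ssc = new StreamingContext(conf, Seconds(10))
val lines = ssc.textFileStream("hdfs:///user/me/incoming") // placeholder path
lines.count().print()
ssc.start()
ssc.awaitTermination()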
Regards
Mayur
On 6 Jun 2014 21:13, Gianluca Privitera
gianluca.privite...@studio.unibo.it wrote:
Where are the APIs for QueueStream and RddQueue?
Hi ,
I am interested in deploying Spark 1.0.0 on EC2 and wanted to know
which regions are supported. I was able to deploy the previous version
in the east region, but I had a hard time launching the cluster: due to a
bad connection, the provided script would fail to SSH into a
node after a couple of
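For context, a spark-ec2 launch invocation looks roughly like this (the key pair
name, identity file, cluster name, and region are placeholders):

./spark-ec2 -k my-keypair -i ~/.ssh/my-keypair.pem --region=us-east-1 launch my-cluster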
Thanks all - I still don't know what the underlying problem is, but I KIND
OF got it working by dumping my random-words stuff to a file and pointing
Spark Streaming at that. So it's not really streaming, as such, but I got
output.
More investigation to follow =)
On Sat, Jun 7, 2014 at 8:22 AM, Gino
Hi All,
As we know, in MLlib the SVM is used for binary classification. I wonder
how to train an SVM model for multiclass classification in MLlib. In addition,
how can I apply a machine learning algorithm in Spark if the algorithm isn't
included in MLlib? Thank you.
Ah, looking at that InputFormat, it should just work out of the box using
sc.newAPIHadoopFile ...
Would be interested to hear if it works as expected for you (in Python you'll
end up with bytearray values).
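Roughly, in Scala, a sketch (the InputFormat and key/value classes below are
placeholders; substitute whatever format you're actually reading):

import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// Read via the new Hadoop API; key/value types must match the InputFormat.
val rdd = sc.newAPIHadoopFile[LongWritable, BytesWritable,
  SequenceFileInputFormat[LongWritable, BytesWritable]]("hdfs:///data/input")
rdd.map { case (k, v) => (k.get(), v.getLength) }.take(5).foreach(println)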
N
—
Sent from Mailbox
On Fri, Jun 6, 2014 at 9:38 PM, Jeremy Freeman
QueueStream example is in Spark Streaming examples:
http://www.boyunjian.com/javasrc/org.spark-project/spark-examples_2.9.3/0.7.2/_/spark/streaming/examples/QueueStream.scala
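The gist of it, as a minimal sketch (the queue contents and batch interval are
placeholders):

import scala.collection.mutable.SynchronizedQueue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Push RDDs into a queue and consume them as a DStream.
val conf = new SparkConf().setMaster("local[2]").setAppName("queue-stream-example")
val ssc = new StreamingContext(conf, Seconds(1))
val rddQueue = new SynchronizedQueue[RDD[Int]]()
ssc.queueStream(rddQueue).map(x => (x % 10, 1)).reduceByKey(_ + _).print()
ssc.start()
for (_ <- 1 to 5) {
  rddQueue += ssc.sparkContext.makeRDD(1 to 1000, 10)
  Thread.sleep(1000)
}
ssc.stop()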
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
Increasing the number of partitions of the data file solved the problem.
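For anyone hitting the same thing, the change was along these lines (the path
and partition count are placeholders):

// Before: default partitioning
val data = sc.textFile("hdfs:///data/input")
// After: ask for more partitions up front, or repartition an existing RDD
val moreParts = sc.textFile("hdfs:///data/input", 64)
val repartitioned = data.repartition(64)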
On 6 June 2014 18:46, Oleg Proudnikov oleg.proudni...@gmail.com wrote:
Additional observation - the map and mapValues are pipelined and executed
- as expected - in pairs. This means that there is a simple sequence of
steps -
Not a stupid question! I would like to be able to do this. For now, you
might try writing the data to Tachyon (http://tachyon-project.org/) instead
of HDFS. This is untested, though; please report any issues you run into.
Michael
On Fri, Jun 6, 2014 at 8:13 PM, Xu (Simon) Chen xche...@gmail.com
I was also thinking of using tachyon to store parquet files - maybe
tomorrow I will give a try as well.
For debugging, I run locally inside Eclipse without Maven.
I just add the Spark assembly jar to my Eclipse project build path and click
'Run As... Scala Application'.
I have done the same with Java and ScalaTest; it's quick and easy.
I didn't see any third-party jar dependencies in your code, so
Is there a way to start Tachyon on top of a YARN cluster?
Hi All,
I am running Spark applications in yarn-cluster mode and need to read the Spark
application metrics even after the application is over. I was planning to use
the CSV sink, but it seems that Codahale's CsvReporter only supports dumping
metrics to the local filesystem.
Any suggestions to
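For reference, the setup I mean is the CSV sink in conf/metrics.properties,
roughly like this (the period and directory values are placeholders):

*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=10
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics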
Hi Aslan,
You can check out the unit test code of GradientDescent.runMiniBatchSGD:
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala
Sincerely,
DB Tsai
---
My Blog:
At this time, you need to do one-vs-all manually for multiclass
training (see the sketch at the end of this message). For your second question,
if the algorithm is implemented in Java/Scala/Python and designed for a single
machine, you can broadcast the dataset to each worker and train models on the
workers. If the algorithm is implemented in a
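A minimal sketch of manual one-vs-rest on top of MLlib's binary SVM (the helper
names and class count are placeholders; labels are assumed to be 0.0 through
numClasses - 1):

import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train one binary SVM per class: class k vs. the rest.
def trainOneVsRest(data: RDD[LabeledPoint], numClasses: Int,
                   numIterations: Int = 100): Array[SVMModel] =
  (0 until numClasses).map { k =>
    val binary = data.map(p =>
      LabeledPoint(if (p.label == k) 1.0 else 0.0, p.features)).cache()
    val model = SVMWithSGD.train(binary, numIterations)
    binary.unpersist()
    model.clearThreshold() // predict() then returns the raw margin, not 0/1
  }.toArray

// Predict the class whose model gives the largest margin.
def predictOneVsRest(models: Array[SVMModel], features: Vector): Int =
  models.zipWithIndex.maxBy { case (m, _) => m.predict(features) }._2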