On 7 Jul 2017, at 08:37, Esa Heikkinen
<esa.heikki...@student.tut.fi> wrote:
I only want to simulate a very huge "network" with even millions of parallel,
time-synchronized actors (state machines). There is also communication between
actors via some (key-value pair) database. I also want th…
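The scenario above — many lock-step state machines exchanging values through a shared key-value store — can be sketched in miniature. This is only an illustrative toy, not anything proposed in the thread; the names (`Actor`, `step`, `tick`) and the transition rule are assumptions. In Spark, the actor-state map could live in a PairRDD and each tick be one transformation over it.

```scala
// A toy synchronous tick over many small state machines. Every actor reads
// the shared key-value snapshot from the previous tick, then publishes its
// new state. All names and the transition rule are illustrative assumptions.
final case class Actor(id: Int, state: Int)

// One actor's transition: read a neighbour's last published value
// (0 if absent), advance, and emit a key-value write.
def step(a: Actor, kv: Map[Int, Int], nActors: Int): (Actor, (Int, Int)) = {
  val read = kv.getOrElse((a.id + 1) % nActors, 0)
  val next = a.copy(state = a.state + read + 1)
  (next, a.id -> next.state)
}

// All actors step against the same snapshot, so the tick is time-synchronized.
def tick(actors: Seq[Actor], kv: Map[Int, Int]): (Seq[Actor], Map[Int, Int]) = {
  val results = actors.map(step(_, kv, actors.size))
  (results.map(_._1), kv ++ results.map(_._2))
}

val (actors1, kv1) = tick(Seq(Actor(0, 0), Actor(1, 0), Actor(2, 0)), Map.empty)
// after one tick from an empty store, every actor has state 1
```

Because every actor in a tick sees the same snapshot, the update order within a tick does not matter — which is what makes a parallel, partitioned implementation possible.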
To: Esa Heikkinen
Cc: Mahesh Sawaiker; user@spark.apache.org
Subject: Re: RE: Using Spark as a simulator
Spark dropped Akka some time ago...
I think the main issue he will face is a library for simulating the state
machines (randomly), and storing a huge amount of files (HDFS is probably…
Spark was originally built on it (Akka).
Esa
From: Mahesh Sawaiker
Sent: 21 June 2017 14:45
To: Esa Heikkinen; Jörn Franke
Cc: user@spark.apache.org
Subject: RE: Using Spark as a simulator
Spark can help you to create one large file if needed, but HDFS itself will
provide abstraction over such things, so it's a trivial…
…object. This way you will get an RDD of Scala objects, which you can then
process with functional/set operators.
You would want to read about PairRDDs.
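A minimal sketch of the idea above. The line format (`"timestamp,actor,value"`) and all names are assumptions for illustration. With Spark this would be `sc.textFile(...).map(parse)`, yielding an `RDD[Event]`; keying it by a field gives a PairRDD. Here the same operators are shown on plain Scala collections, which the RDD API mirrors.

```scala
// Hypothetical line format "timestamp,actor,value" — an assumption,
// not something specified in the thread.
case class Event(ts: Long, actor: String, value: Double)

def parse(line: String): Event = {
  val parts = line.split(",")
  Event(parts(0).toLong, parts(1), parts(2).toDouble)
}

// With Spark: sc.textFile("hdfs://...").map(parse) gives an RDD[Event]
val lines  = Seq("1,a,2.0", "2,b,3.5", "3,a,4.0")
val events = lines.map(parse)

// the collection analogue of events.keyBy(_.actor).reduceByKey(_ + _)
val totalPerActor: Map[String, Double] =
  events.groupBy(_.actor).view.mapValues(_.map(_.value).sum).toMap
// totalPerActor: Map("a" -> 6.0, "b" -> 3.5)
```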
From: Esa Heikkinen [mailto:esa.heikki...@student.tut.fi]
Sent: Wednesday, June 21, 2017 1:12 PM
To: Jörn Franke
Cc: user@spark.apache.org
Subject: Re: Using Spark as a simulator
To: Esa Heikkinen
Cc: user@spark.apache.org
Subject: Re: Using Spark as a simulator
It is fine, but you have to design it so that generated rows are written in
large blocks for optimal performance.
The most tricky part with data generation is the conceptual part, such as the
probabilistic distribution etc.
You have to check as well that you use a good random generator; for some cases
…nt in the tables from 1G upwards.
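One concrete instance of the random-generator pitfall hinted at above: if parallel tasks all seed their generator identically, every partition emits the same "random" rows. A common remedy is to fold the partition index into the seed, which keeps a run reproducible while keeping partitions distinct. This is a hedged sketch — the function name, the base seed, and the choice of a Gaussian distribution are all assumptions, not from the thread.

```scala
import scala.util.Random

// Seed each partition's generator differently, but deterministically:
// same (baseSeed, partition) always reproduces the same rows, while two
// partitions never share a sequence.
def rowsForPartition(partition: Int, n: Int, baseSeed: Long = 42L): Seq[Double] = {
  val rng = new Random(baseSeed + partition)
  Seq.fill(n)(rng.nextGaussian())  // e.g. draws from a normal distribution
}

val p0 = rowsForPartition(0, 5)
val p1 = rowsForPartition(1, 5)
// p0 != p1, and rerunning rowsForPartition(0, 5) reproduces p0 exactly
```

In Spark the same idea would apply inside `mapPartitionsWithIndex`, where the partition index is available to derive the seed.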
From: Esa Heikkinen [mailto:esa.heikki...@student.tut.fi]
Sent: Tuesday, June 20, 2017 7:34 PM
To: user@spark.apache.org
Subject: Using Spark as a simulator
Hi
Spark is a data analyzer, but would it be possible to use Spark as a data
generator or simulator?
My simulation can be very huge, and I think a parallelized simulation using
Spark (in the cloud) could work.
Is that a good or bad idea?
Regards
Esa Heikkinen