You can run multiple applications in parallel in Standalone mode; you just
need to configure Spark to allocate resources between your jobs the way you
want (by default it assigns all resources to the first application you run,
so they won't be freed up until that application has finished).
You can use Spark's
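The relevant knob in Standalone mode is `spark.cores.max`. A minimal sketch of capping one application's share so others can run alongside it (the application name and values below are illustrative, not from the original thread):

```python
# Capping one application's core usage in Standalone mode.
# Without spark.cores.max, the first application grabs every available core,
# and later applications queue until it finishes.
from pyspark import SparkConf

conf = (SparkConf()
        .setAppName("individuals-export")   # illustrative name
        .set("spark.cores.max", "8")        # leave the remaining cores free
        .set("spark.executor.memory", "4g"))
```

Pass this `SparkConf` when creating the `SparkContext`/`SparkSession` for each application, and size `spark.cores.max` so the cluster has headroom for the other jobs you want to run concurrently.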
Well, I have been working with Spark for a while, and the biggest hurdle is
that Spark does not allow me to run multiple jobs in parallel,
i.e. from the point of starting the job that takes the table of "Individuals",
I have to wait until all that processing is done before I can start an
additional one
You could probably save yourself a lot of hassle by just writing a Spark
job that scans through the entire table, converts each row to JSON and
dumps the output into a Kafka topic. It should be fairly straightforward to
implement.
Spark will manage the partitioning of "Producer" processes for you
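A minimal sketch of that pattern, assuming a kafka-python-style producer on the executors. The topic name and function names are illustrative, and a stub producer stands in for a real broker so the snippet runs standalone:

```python
import json

class _CollectingProducer:
    """Stand-in for a real Kafka producer so this sketch runs without a broker."""
    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))

    def flush(self):
        pass

def export_partition(rows, producer, topic="individuals"):
    """Turn each row (a dict) into one JSON message on `topic`.

    In the real job this function would be handed to
    df.rdd.foreachPartition(...), with one Kafka producer created per
    partition (convert Spark Row objects with row.asDict() first).
    The topic name here is hypothetical.
    """
    for row in rows:
        producer.send(topic, json.dumps(row, sort_keys=True).encode("utf-8"))
    producer.flush()

producer = _CollectingProducer()
export_partition([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}], producer)
```

Because `foreachPartition` gives each Spark partition its own producer, scaling the number of partitions scales the number of concurrent "producer" processes, which is the partition management referred to above.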
Hi
I would like to make a dump of the database, in JSON format, to Kafka
The database contains lots of data: millions, and in some cases billions, of
"rows"
I will provide the customer with an export of the data, which they can read
off of a Kafka topic
My thinking was to have it scalable such
Hello @Durity
Would you mind sharing some information about your cluster? I am mainly
interested in which version of Cassandra you use, and how long the GC
pauses take.
Thank you very much
Regards
Jean Carlo
"The best way to predict the future is to invent it" Alan Kay