Re: How can I efficiently export the content of my table to KAFKA

2017-04-26 Thread Justin Cameron
You can run multiple applications in parallel in Standalone mode; you just need to configure Spark to allocate resources between your jobs the way you want. By default it assigns all resources to the first application you run, so they won't be freed up until it has finished. You can use Spark's

Re: How can I efficiently export the content of my table to KAFKA

2017-04-26 Thread Tobias Eriksson
Well, I have been working with Spark for a while, and the biggest hurdle is that Spark does not allow me to run multiple jobs in parallel, i.e. once I start the job that processes the “Individuals” table, I have to wait until all that processing is done before I can start an additional one

Re: How can I efficiently export the content of my table to KAFKA

2017-04-26 Thread Justin Cameron
You could probably save yourself a lot of hassle by just writing a Spark job that scans through the entire table, converts each row to JSON and dumps the output into a Kafka topic. It should be fairly straightforward to implement. Spark will manage the partitioning of "Producer" processes for you
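A minimal sketch of the row-to-JSON step described above, in plain Python so it runs without a cluster. The names `row_to_json` and `publish_partition` are hypothetical, and the `send` callback stands in for whatever Kafka producer call is used; none of this comes from the thread itself.

```python
import json
from typing import Callable, Iterable, Mapping


def row_to_json(row: Mapping[str, object]) -> bytes:
    """Serialize one table row (as a dict) to UTF-8 encoded JSON."""
    # default=str is an assumption: it turns values json cannot encode
    # natively (dates, UUIDs, decimals) into their string form.
    return json.dumps(row, sort_keys=True, default=str).encode("utf-8")


def publish_partition(
    rows: Iterable[Mapping[str, object]],
    send: Callable[[bytes], None],
) -> int:
    """Convert every row in one partition to JSON and hand it to `send`.

    `send` is a placeholder for the producer call, for example a function
    that wraps a Kafka client's send method for a fixed topic.
    Returns the number of rows published.
    """
    count = 0
    for row in rows:
        send(row_to_json(row))
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for a Kafka producer: collect the messages in a list.
    outbox = []
    published = publish_partition(
        [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}],
        outbox.append,
    )
    print(published, outbox)
```

In a Spark job, a function like `publish_partition` would typically be called once per partition (for example from `foreachPartition`), with one producer created inside each call, so that Spark decides how many producers run in parallel. How rows are converted to dicts and how the producer is configured depend on the connector and client library in use.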

How can I efficiently export the content of my table to KAFKA

2017-04-26 Thread Tobias Eriksson
Hi, I would like to make a dump of the database, in JSON format, to Kafka. The database contains lots of data: millions and in some cases billions of “rows”. I will provide the customer with an export of the data, which they can read off a Kafka topic. My thinking was to have it scalable such

Last chance: ApacheCon is just three weeks away

2017-04-26 Thread Rich Bowen
ApacheCon is just three weeks away, in Miami, Florida, May 15th - 18th. http://apachecon.com/ There's still time to register and attend. ApacheCon is the best place to find out about tomorrow's software, today. ApacheCon is the official convention of The Apache Software Foundation, and includes t

Re: cassandra OOM

2017-04-26 Thread Jean Carlo
Hello @Durity, would you mind sharing information about your cluster? I am interested to know which version of Cassandra you use, and how long the GC pauses last. Thank you very much. Regards, Jean Carlo "The best way to predict the future is to invent it" Alan Kay On Tue, A