Re: Parallelizing operations using Spark

2015-11-17 Thread PhuDuc Nguyen
You should try passing your Solr writer into rdd.foreachPartition() for maximum parallelism - each partition on each executor will execute the function passed in.

HTH,
Duc

On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar wrote:
> Any input/suggestions on parallelizing below operations using Spark ove
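A minimal sketch of the pattern Duc describes, assuming Spark's Java API and SolrJ 6+ (for HttpSolrClient.Builder); the Solr URL, collection name, and batch size are placeholders, not from the thread:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.spark.api.java.JavaRDD;

public class SolrPartitionWriter {
    public static void indexPartitions(JavaRDD<SolrInputDocument> docs) {
        docs.foreachPartition((Iterator<SolrInputDocument> partition) -> {
            // One client per partition, created on the executor, so no
            // SolrClient has to be serialized and shipped from the driver.
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycollection").build()) {
                List<SolrInputDocument> batch = new ArrayList<>();
                while (partition.hasNext()) {
                    batch.add(partition.next());
                    if (batch.size() >= 1000) {   // flush in batches to bound memory
                        solr.add(batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    solr.add(batch);
                }
                solr.commit();                    // one commit per partition
            }
        });
    }
}

Creating the client inside foreachPartition gives one client and one commit per partition rather than per record, which is where the parallelism win over a driver-side loop comes from.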

Re: Parallelizing operations using Spark

2015-11-17 Thread Susheel Kumar
Any input/suggestions on parallelizing the operations below using Spark instead of Java thread pooling?

- reading ~100 thousand JSON files from the local file system
- processing each file's content and submitting it to Solr as an input document

Thanks,
Susheel

On Mon, Nov 16, 2015 at 5:44 PM, Susheel Kumar wrote:
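A sketch of the end-to-end job being asked about, under the same assumptions as the foreachPartition reply above, plus Jackson for JSON parsing; the input path, field names, and partition count are illustrative, and wholeTextFiles over a file:// path assumes the directory is visible to every executor (or a local-mode run):

import java.util.Iterator;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class JsonFilesToSolr {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("json-files-to-solr"));
        try {
            // wholeTextFiles yields (path, content) pairs, one per file;
            // minPartitions spreads the ~100k small files across executors.
            JavaRDD<SolrInputDocument> docs = sc
                .wholeTextFiles("file:///data/json-input/", 64)
                .map(pair -> {
                    ObjectMapper mapper = new ObjectMapper();
                    JsonNode json = mapper.readTree(pair._2());
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", pair._1());                       // file path as id (assumption)
                    doc.addField("title_s", json.path("title").asText());
                    doc.addField("body_t", json.path("body").asText());
                    return doc;
                });

            // Same pattern as the reply above: one SolrClient per partition.
            docs.foreachPartition((Iterator<SolrInputDocument> partition) -> {
                try (SolrClient solr = new HttpSolrClient.Builder(
                        "http://localhost:8983/solr/mycollection").build()) {
                    while (partition.hasNext()) {
                        solr.add(partition.next());
                    }
                    solr.commit();
                }
            });
        } finally {
            sc.stop();
        }
    }
}

Creating a fresh ObjectMapper per record keeps the sketch short; a mapPartitions variant would let one mapper be reused per partition.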

Parallelizing operations using Spark

2015-11-16 Thread Susheel Kumar
Hello Spark Users,

This is my first email to the Spark mailing list, and I am looking forward to it. I have been working on Solr, and in the past I have used Java thread pooling to parallelize Solr indexing with SolrJ. Now I am again working on indexing data, this time from JSON files (around 100 thousand of them), and before I tr
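For contrast, a minimal sketch of the plain-Java baseline described here (a fixed thread pool plus SolrJ, no Spark); the pool size, paths, and field handling are assumptions for illustration, and HttpSolrClient is safe to share across threads:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ThreadPoolIndexer {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(16);   // pool size is arbitrary
        try (SolrClient solr = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/mycollection").build();
             Stream<Path> files = Files.list(Paths.get("/data/json-input"))) {
            files.forEach(path -> pool.execute(() -> {
                try {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", path.toString());
                    // Real code would parse the JSON into fields; the raw text
                    // is stored here only to keep the sketch short.
                    doc.addField("raw_json_t",
                            new String(Files.readAllBytes(path)));
                    solr.add(doc);
                } catch (Exception e) {
                    System.err.println("Failed to index " + path + ": " + e);
                }
            }));
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            solr.commit();
        }
    }
}

The thread pool parallelizes across the cores of one machine only; the Spark version sketched earlier in the thread spreads the same read/parse/add work across executors, which is the trade-off being asked about.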