You should try passing your Solr writer into rdd.foreachPartition() for
maximum parallelism: the function you pass in is executed once per partition
on each executor.
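A minimal, dependency-free sketch of that pattern follows: each "partition" gets its own writer instance opened inside the per-partition function, which is what foreachPartition enables on an executor. SolrWriterStub is a hypothetical stand-in for a real SolrJ client (e.g. ConcurrentUpdateSolrClient), and the thread pool here only simulates executors running partitions in parallel.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

// Stdlib-only sketch of the foreachPartition pattern (no Spark dependency).
// In Spark, rdd.foreachPartition(iter -> { ... }) runs the function once per
// partition; creating the writer inside the function yields one writer per
// partition rather than one per record.
public class PartitionWriterDemo {

    // Hypothetical stand-in for a SolrJ client.
    static class SolrWriterStub {
        final List<String> submitted = new ArrayList<>();
        void add(String doc) { submitted.add(doc); }
        void commit() { /* a real client would flush to Solr here */ }
    }

    // Process one partition: open a writer, drain the iterator, commit once.
    static int processPartition(Iterator<String> docs) {
        SolrWriterStub writer = new SolrWriterStub(); // one writer per partition
        while (docs.hasNext()) {
            writer.add(docs.next());
        }
        writer.commit();
        return writer.submitted.size();
    }

    public static void main(String[] args) throws Exception {
        // Simulate 4 partitions of 25 documents each, processed in parallel
        // threads the way executors would process RDD partitions.
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < 4; p++) {
            final int base = p * 25;
            partitions.add(IntStream.range(base, base + 25)
                    .mapToObj(i -> "doc-" + i)
                    .collect(Collectors.toList()));
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (List<String> part : partitions) {
            results.add(pool.submit(() -> processPartition(part.iterator())));
        }
        int total = 0;
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        System.out.println("indexed " + total + " documents"); // indexed 100 documents
    }
}
```

The key point is that the writer is created inside processPartition, so connection setup happens once per partition, not once per document.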
HTH,
Duc
On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar
wrote:
Any input/suggestions on parallelizing the operations below using Spark
instead of Java thread pooling?
- reading 100 thousand JSON files from the local file system
- processing each file's content and submitting it to Solr as an input document
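For reference, the thread-pooling approach being compared against can be sketched with only the JDK: an ExecutorService reads the files in parallel and converts each one into a document. The file layout and the toInputDocument step are illustrative assumptions; a real job would build a SolrInputDocument and submit it via SolrJ.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Sketch of the Java thread-pool approach: read many JSON files in parallel
// and turn each one into an "input document".
public class ThreadPoolIndexDemo {

    // Hypothetical conversion of raw JSON text into an input document.
    static String toInputDocument(String json) {
        return "doc:" + json.trim();
    }

    static List<String> indexFiles(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        for (Path file : files) {
            // Each task reads one file and builds one document.
            futures.add(pool.submit(
                    () -> toInputDocument(new String(Files.readAllBytes(file)))));
        }
        List<String> docs = new ArrayList<>();
        for (Future<String> f : futures) docs.add(f.get());
        pool.shutdown();
        return docs;
    }

    public static void main(String[] args) throws Exception {
        // Create a handful of sample JSON files in a temp directory.
        Path dir = Files.createTempDirectory("json-demo");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Path p = dir.resolve("file-" + i + ".json");
            Files.write(p, ("{\"id\": " + i + "}").getBytes());
            files.add(p);
        }
        List<String> docs = indexFiles(files, 3);
        System.out.println("built " + docs.size() + " input documents");
    }
}
```

With Spark, the same read-and-convert step would move into a foreachPartition (or mapPartitions) function, letting the cluster handle the partitioning and scheduling that the pool does here.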
Thanks,
Susheel
On Mon, Nov 16, 2015 at 5:44 PM, Susheel Kumar
wrote:
Hello Spark Users,
This is my first email to the Spark mailing list, and I'm looking forward to
it. I have been working on Solr and in the past have used Java thread pooling
to parallelize Solr indexing with SolrJ.
Now I am again working on indexing data, this time from JSON files (around
100 thousand of them), and before I tr