Re: MongoDB Bulk Inserts

2014-11-21 Thread Soumya Simanta
bulkLoad has the connection to MongoDB?

On Fri, Nov 21, 2014 at 4:34 PM, Benny Thompson wrote:
> I tried using RDD#mapPartitions but my job completes prematurely and
> without error as if nothing gets done. What I have is fairly simple
>
> sc
>   .textFile(inputFile)

Re: MongoDB Bulk Inserts

2014-11-21 Thread Benny Thompson
I tried using RDD#mapPartitions but my job completes prematurely and without error, as if nothing gets done. What I have is fairly simple:

sc
  .textFile(inputFile)
  .map(parser.parse)
  .mapPartitions(bulkLoad)

But the Iterator[T] of mapPartitions is
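One likely cause, though the thread doesn't confirm it: mapPartitions is a lazy transformation, so if no action ever consumes the resulting RDD, or if bulkLoad returns a lazy iterator that is never drained, the inserts inside it never execute; foreachPartition is the usual idiom for per-partition side effects. The plain-Scala sketch below (no Spark, and bulkLoad here is a hypothetical stand-in) demonstrates the iterator-laziness half of the problem:

```scala
// Minimal sketch of why side effects inside a mapPartitions-style
// function can silently never run: the function returns a lazy
// iterator, and until something consumes it, no work happens.
object LazyPartitionDemo {
  def main(args: Array[String]): Unit = {
    var inserted = 0

    // Analogue of a bulkLoad(iter) that performs one "insert" per
    // element but wraps the work in a lazy Iterator, as mapPartitions
    // expects the function to do.
    def bulkLoad(iter: Iterator[Int]): Iterator[Int] =
      iter.map { x => inserted += 1; x }

    val lazyResult = bulkLoad(Iterator(1, 2, 3))
    println(inserted)           // nothing has consumed the iterator yet

    lazyResult.foreach(_ => ()) // draining the iterator runs the inserts
    println(inserted)
  }
}
```

In Spark terms, the same thing happens at a second level: `.mapPartitions(bulkLoad)` alone builds a lineage but triggers nothing until an action (e.g. `count()`) runs, which is another reason `foreachPartition` fits better for writes.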

Re: MongoDB Bulk Inserts

2014-11-20 Thread Soumya Simanta
On Thu, Nov 20, 2014 at 10:18 PM, Benny Thompson wrote:
> I'm trying to use MongoDB as a destination for an ETL I'm writing in
> Spark. It appears I'm gaining a lot of overhead in my system databases
> (and possibly in the primary documents themselves); I can only assume it's
> because I'm left

MongoDB Bulk Inserts

2014-11-20 Thread Benny Thompson
I'm trying to use MongoDB as a destination for an ETL I'm writing in Spark. It appears I'm gaining a lot of overhead in my system databases (and possibly in the primary documents themselves); I can only assume it's because I'm left to using PairRDD.saveAsNewAPIHadoopFile.

- Is there a way to bat
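A hedged sketch of the batching approach the question seems to be asking about, assuming a foreachPartition-based writer: `connect()` and `coll.insert` below are hypothetical stand-ins for whatever MongoDB driver is in use, and only the pure `batches` helper is concrete and runnable here.

```scala
// Sketch: group documents into fixed-size batches per partition so each
// driver call inserts many documents at once instead of one at a time.
object BatchInsertSketch {
  // Pure helper: split an iterator into batches of at most `size`.
  def batches[T](iter: Iterator[T], size: Int): Iterator[Seq[T]] =
    iter.grouped(size).map(_.toSeq)

  def main(args: Array[String]): Unit = {
    // Inside Spark this would run per partition (driver calls are
    // hypothetical placeholders, not a specific MongoDB API):
    // rdd.foreachPartition { part =>
    //   val coll = connect()              // one connection per partition
    //   batches(part, 1000).foreach(b => coll.insert(b))
    // }
    val sizes = batches(Iterator.range(0, 10), 4).map(_.size).toList
    println(sizes)  // List(4, 4, 2)
  }
}
```

Opening the connection inside foreachPartition (rather than on the driver) matters because the connection object is generally not serializable and must live on the executor that uses it.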