Does bulkLoad have the connection to MongoDB?
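
If it does, also note that mapPartitions is a lazy transformation: unless a later action consumes the iterator it returns, bulkLoad never executes, which would explain a job that finishes without error and without writing anything. foreachPartition is an action and forces the writes. A minimal sketch with the 2.x Java driver, assuming parser.parse yields a DBObject; the host, database, collection, and batch size are placeholders, not details from your code:

    import com.mongodb.{DBObject, MongoClient}
    import scala.collection.JavaConverters._

    // Hypothetical bulk loader: one client per partition, opened on the
    // executor, so the non-serializable connection never enters the closure.
    def bulkLoad(docs: Iterator[DBObject]): Unit = {
      val client = new MongoClient("localhost", 27017) // placeholder host/port
      try {
        val collection = client.getDB("etl").getCollection("docs") // placeholder names
        // Insert in fixed-size batches instead of one document at a time.
        docs.grouped(1000).foreach(batch => collection.insert(batch.asJava))
      } finally {
        client.close()
      }
    }

    sc.textFile(inputFile)
      .map(parser.parse)
      .foreachPartition(bulkLoad) // an action, so the write actually runs

Opening the client inside the partition also sidesteps serializing the connection, and one client plus batched inserts per partition keeps the round trips down.
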
On Fri, Nov 21, 2014 at 4:34 PM, Benny Thompson wrote:
> I tried using RDD#mapPartitions, but my job completes prematurely and
> without error, as if nothing gets done. What I have is fairly simple:
>
> sc
>   .textFile(inputFile)
>   .map(parser.parse)
>   .mapPartitions(bulkLoad)
>
> But the Iterator[T] of mapPartitions is
On Thu, Nov 20, 2014 at 10:18 PM, Benny Thompson wrote:
> I'm trying to use MongoDB as a destination for an ETL I'm writing in
> Spark. It appears I'm gaining a lot of overhead in my system databases
> (and possibly in the primary documents themselves); I can only assume it's
> because I'm left to using PairRDD.saveAsNewAPIHadoopFile.
>
> - Is there a way to batch
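
For reference, the saveAsNewAPIHadoopFile route described above normally goes through the mongo-hadoop connector's MongoOutputFormat, roughly like the sketch below; the output URI and the pairs RDD are placeholders, not details from the thread:

    import org.apache.hadoop.conf.Configuration
    import com.mongodb.hadoop.MongoOutputFormat
    import org.bson.BSONObject

    // Placeholder target; mongo-hadoop writes to the collection in this URI.
    val outputConfig = new Configuration()
    outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/etl.docs")

    // pairs: a hypothetical RDD[(Object, BSONObject)] built earlier in the ETL.
    pairs.saveAsNewAPIHadoopFile(
      "file:///this-path-is-ignored", // MongoOutputFormat does not use the path
      classOf[Object],
      classOf[BSONObject],
      classOf[MongoOutputFormat[Object, BSONObject]],
      outputConfig)

Each record still arrives at Mongo as its own write on this path, which is one guess at the per-document overhead described above; batching by hand inside foreachPartition, as in the earlier sketch, is one way around that.
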