Can you try with these key/value classes and see the performance?

inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"

Taken from the Databricks blog:
https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html

Thanks
Best Regards

On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
deepesh.maheshwar...@gmail.com> wrote:

> Hi,
>
> I am trying to read MongoDB in Spark via newAPIHadoopRDD.
>
> /**** Code *****/
>
> config.set("mongo.job.input.format",
>         "com.mongodb.hadoop.MongoInputFormat");
> config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
> config.set("mongo.input.query", "{host: 'abc.com'}");
>
> JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>
> JavaPairRDD<Object, BSONObject> mongoRDD =
>         sc.newAPIHadoopRDD(config,
>                 com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>                 BSONObject.class);
>
> long count = mongoRDD.count();
>
> There are about 1.5 million records.
> Though I am getting the data, the read operation took around 15 minutes
> to read everything.
>
> Is this API really that slow, or am I missing something?
> Please suggest an alternate approach if there is a faster way to read
> data from MongoDB.
>
> Thanks,
> Deepesh
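[Editor's note: the linked Databricks post wires those class names through PySpark rather than the Java API. A minimal sketch of that pattern, with the class names taken verbatim from the reply above; the master URL, URI, and app name are illustrative assumptions, and running it requires pyspark plus the mongo-hadoop connector jar on the classpath:]

```python
# Sketch based on the Databricks "Using MongoDB with Spark" pattern.
# Assumes pyspark is installed and the mongo-hadoop connector jar is
# available to the driver and executors.
from pyspark import SparkContext

# "local[4]" is an assumption: use several local cores rather than the
# single-threaded "local" master used in the original code.
sc = SparkContext("local[4]", "MongoOps")

config = {
    # Hypothetical URI; substitute your own database and collection.
    "mongo.input.uri": "mongodb://localhost:27017/db.collection",
    "mongo.input.query": "{host: 'abc.com'}",
}

# Class names are passed as strings on the PySpark side, matching the
# inputFormatClassName / keyClassName / valueClassName values above.
rdd = sc.newAPIHadoopRDD(
    inputFormatClass="com.mongodb.hadoop.MongoInputFormat",
    keyClass="org.apache.hadoop.io.Text",
    valueClass="org.apache.hadoop.io.MapWritable",
    conf=config,
)

print(rdd.count())
sc.stop()
```

[This only changes how records are materialized on the Spark side; if the count is still slow, the bottleneck is more likely the single-split read from MongoDB or the single-threaded local master than the key/value classes.]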