Hi Akhil,

This code snippet is from the link below:
https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
There it is reading data from the HDFS file system, but in our case I need to read from MongoDB. I tried this earlier and again just now, but it gives the error below, which is self-explanatory:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: mongodb

On Mon, Aug 31, 2015 at 1:03 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Here's a piece of code which works well for us (Spark 1.4.1):
>
>     Configuration bsonDataConfig = new Configuration();
>     bsonDataConfig.set("mongo.job.input.format",
>             "com.mongodb.hadoop.BSONFileInputFormat");
>
>     Configuration predictionsConfig = new Configuration();
>     predictionsConfig.set("mongo.output.uri", mongodbUri);
>
>     JavaPairRDD<Object, BSONObject> bsonRatingsData =
>             sc.newAPIHadoopFile(
>                     ratingsUri, BSONFileInputFormat.class, Object.class,
>                     BSONObject.class, bsonDataConfig);
>
> Thanks
> Best Regards
>
> On Mon, Aug 31, 2015 at 12:59 PM, Deepesh Maheshwari <deepesh.maheshwar...@gmail.com> wrote:
>
>> Hi, I am using <spark.version>1.3.0</spark.version>.
>>
>> I am not getting a constructor for the above values.
>>
>> [image: Inline image 1]
>>
>> So, I tried to shuffle the values in the constructor.
>>
>> [image: Inline image 2]
>>
>> But it is giving this error. Please suggest.
>>
>> [image: Inline image 3]
>>
>> Best Regards
>>
>> On Mon, Aug 31, 2015 at 12:43 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> Can you try with these key/value classes and see the performance?
>>>
>>>     inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
>>>     keyClassName = "org.apache.hadoop.io.Text"
>>>     valueClassName = "org.apache.hadoop.io.MapWritable"
>>>
>>> Taken from the Databricks blog
>>> <https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <deepesh.maheshwar...@gmail.com> wrote:
>>>
>>>> Hi, I am trying to read MongoDB in Spark via newAPIHadoopRDD.
>>>>
>>>> /**** Code *****/
>>>>
>>>>     config.set("mongo.job.input.format",
>>>>             "com.mongodb.hadoop.MongoInputFormat");
>>>>     config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
>>>>     config.set("mongo.input.query", "{host: 'abc.com'}");
>>>>
>>>>     JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>>>>
>>>>     JavaPairRDD<Object, BSONObject> mongoRDD =
>>>>             sc.newAPIHadoopRDD(config,
>>>>                     com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>>>>                     BSONObject.class);
>>>>
>>>>     long count = mongoRDD.count();
>>>>
>>>> There are about 1.5 million records.
>>>> I am getting the data, but the read operation took around 15 minutes
>>>> for the whole collection.
>>>>
>>>> Is this API really that slow, or am I missing something?
>>>> Please suggest if there is an alternate approach to read data from
>>>> Mongo faster.
>>>>
>>>> Thanks,
>>>> Deepesh
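[Editor's note] For reference, a minimal sketch of the direct-from-MongoDB read the thread converges on. The "No FileSystem for scheme: mongodb" error arises because newAPIHadoopFile resolves its path against a Hadoop FileSystem, so a mongodb:// URI must instead go through newAPIHadoopRDD with MongoInputFormat, as in Deepesh's original code. The connection URI, query, and split size below are illustrative placeholders, and tuning mongo.input.split_size is only a suggestion for the slow-scan problem, not a confirmed fix from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;

public class MongoDirectRead {
    public static void main(String[] args) {
        Configuration config = new Configuration();
        // Read directly from MongoDB: a mongodb:// URI works with
        // MongoInputFormat + newAPIHadoopRDD, whereas newAPIHadoopFile
        // expects a Hadoop FileSystem path and throws
        // "No FileSystem for scheme: mongodb".
        config.set("mongo.input.uri",
                "mongodb://localhost:27017/mydb.mycollection"); // placeholder URI
        config.set("mongo.input.query", "{host: 'abc.com'}");
        // Larger splits mean fewer partitions and fewer cursors over the
        // collection; worth experimenting with when a full scan is slow
        // (value in MB; 64 here is an arbitrary example).
        config.set("mongo.input.split_size", "64");

        JavaSparkContext sc = new JavaSparkContext("local[*]", "MongoOps");

        JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
                config,
                com.mongodb.hadoop.MongoInputFormat.class,
                Object.class,
                BSONObject.class);

        System.out.println("count = " + mongoRDD.count());
        sc.stop();
    }
}
```

This is configuration for a live Spark + MongoDB deployment and is not runnable standalone; it assumes the mongo-hadoop connector jars are on the classpath.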