Re: Slow Mongo Read from Spark

2015-09-03 Thread Deepesh Maheshwari
Because of our existing architecture, I am bound to use MongoDB. Please suggest an approach for this.

On Thu, Sep 3, 2015 at 9:10 PM, Jörn Franke wrote:
> You might think about another storage layer not being mongodb
> (hdfs+orc+compression or hdfs+parquet+compression) to improve performance
>
> On Thu, 3 Sep

Re: Slow Mongo Read from Spark

2015-09-03 Thread Jörn Franke
You might consider a storage layer other than MongoDB (HDFS + ORC + compression, or HDFS + Parquet + compression) to improve performance.

On Thu, Sep 3, 2015 at 9:15, Akhil Das wrote:
> On SSD you will get around 30-40MB/s on a single machine (on 4 cores).
>
> Thanks
> Best Regards
>
> On Mon,

Re: Slow Mongo Read from Spark

2015-09-03 Thread Akhil Das
On SSD you will get around 30-40 MB/s on a single machine (on 4 cores).

Thanks
Best Regards

On Mon, Aug 31, 2015 at 3:13 PM, Deepesh Maheshwari <deepesh.maheshwar...@gmail.com> wrote:
> Tried it; it gives the same exception as above
>
> Exception in thread "main" java.io.IOException: No FileSystem for
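As a rough illustration of what the 30-40 MB/s figure quoted above implies, here is a minimal Java sketch that turns a throughput number into an estimated read time. The class name, the method name, the 10 GB collection size, and the idea of dividing evenly across machines are all assumptions made for illustration; none of them come from the thread, and real reads will not scale perfectly linearly.

```java
public class ReadTimeEstimate {

    /**
     * Estimated seconds to read sizeMb megabytes when each machine sustains
     * mbPerSec, assuming (idealized, hypothetical) linear scaling over machines.
     */
    public static double secondsToRead(double sizeMb, double mbPerSec, int machines) {
        if (sizeMb < 0 || mbPerSec <= 0 || machines <= 0) {
            throw new IllegalArgumentException("size must be >= 0; rate and machines must be > 0");
        }
        return sizeMb / (mbPerSec * machines);
    }

    public static void main(String[] args) {
        double sizeMb = 10 * 1024; // hypothetical 10 GB collection, not from the thread
        // Lower and upper bound of the per-machine throughput mentioned in the thread.
        for (double rate : new double[] {30.0, 40.0}) {
            System.out.printf("%.0f MB/s on one machine: about %.0f s%n",
                    rate, secondsToRead(sizeMb, rate, 1));
        }
    }
}
```

Plugging in your own collection size gives a quick sanity check on whether an observed read time is far off what the hardware could plausibly deliver.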

Re: Slow Mongo Read from Spark

2015-08-31 Thread Akhil Das
FYI, newAPIHadoopFile and newAPIHadoopRDD both use the NewHadoopRDD class underneath, and that doesn't mean they will only read from HDFS. Give it a shot if you haven't tried it already (it is just the InputFormat and the reader that differ from your approach). Thanks Best Regards On Mon, Aug 3

Re: Slow Mongo Read from Spark

2015-08-31 Thread Deepesh Maheshwari
Hi Akhil, This code snippet is from the link below: https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java Here it is reading data from the HDFS file system, but in our case I need to read from MongoDB. I have tried it earlier and now again tried it b

Re: Slow Mongo Read from Spark

2015-08-31 Thread Akhil Das
Here's a piece of code which works well for us (Spark 1.4.1):

    Configuration bsonDataConfig = new Configuration();
    bsonDataConfig.set("mongo.job.input.format", "com.mongodb.hadoop.BSONFileInputFormat");
    Configuration predictionsConfig = new Configuration();
    predictio

Re: Slow Mongo Read from Spark

2015-08-31 Thread Akhil Das
Can you try with these key and value classes and see how it performs?

    inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
    keyClassName = "org.apache.hadoop.io.Text"
    valueClassName = "org.apache.hadoop.io.MapWritable"

Taken from the Databricks blog