I walked through the example in the second link you gave. The Treasury
Yield example referenced there is here:
https://github.com/mongodb/mongo-hadoop/blob/master/examples/treasury_yield/src/main/java/com/mongodb/hadoop/examples/treasury/TreasuryYieldXMLConfigV2.java

Note the InputFormat and OutputFormat used in the job configuration. These
classes specify how data is read from and written to MongoDB, and you
should be able to use the same InputFormat and OutputFormat classes in
Spark as well. To save an RDD to MongoDB, use
yourRDD.saveAsHadoopFile(... specify the output format class ...), and to
read from MongoDB, use sparkContext.hadoopFile(... specify the input
format class ...).
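
A minimal Scala sketch of what that might look like is below. This is an
untested outline, not a definitive recipe: since mongo-hadoop's
MongoInputFormat and MongoOutputFormat implement the new Hadoop
"mapreduce" API, the newAPI variants of the Spark calls are the matching
ones, and the URIs, database, and collection names are placeholders you
would replace with your own.

import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.{MongoInputFormat, MongoOutputFormat}

object MongoSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MongoSparkSketch"))

    // Hadoop configuration pointing at the source and target collections.
    // Placeholder URIs: adjust host, port, database, and collection.
    val config = new Configuration()
    config.set("mongo.input.uri", "mongodb://localhost:27017/mydb.input")
    config.set("mongo.output.uri", "mongodb://localhost:27017/mydb.output")

    // Read: keys are the document _id values, values are the documents
    // themselves as BSONObjects.
    val documents = sc.newAPIHadoopRDD(
      config,
      classOf[MongoInputFormat],
      classOf[Object],      // key class
      classOf[BSONObject])  // value class

    // ... transform `documents` here ...

    // Write back out. The path argument is unused by MongoOutputFormat
    // but required by the method signature.
    documents.saveAsNewAPIHadoopFile(
      "file:///this-is-unused",
      classOf[Object],
      classOf[BSONObject],
      classOf[MongoOutputFormat[Object, BSONObject]],
      config)

    sc.stop()
  }
}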

TD


On Thu, Jan 30, 2014 at 12:36 PM, Sampo Niskanen
<sampo.niska...@wellmo.com> wrote:

> Hi,
>
> We're starting to build an analytics framework for our wellness service.
>  While our data is not yet Big, we'd like to use a framework that will
> scale as needed, and Spark seems to be the best around.
>
> I'm new to Hadoop and Spark, and I'm having difficulty figuring out how to
> use Spark in connection with MongoDB.  Apparently, I should be able to use
> the mongo-hadoop connector (https://github.com/mongodb/mongo-hadoop) also
> with Spark, but haven't figured out how.
>
> I've run through the Spark tutorials and have been able to set up a
> single-machine Hadoop system with the MongoDB connector as instructed at
>
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
> and
> http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
>
> Could someone give some instructions or pointers on how to configure and
> use the mongo-hadoop connector with Spark?  I haven't been able to find any
> documentation about this.
>
>
> Thanks.
>
>
> Best regards,
>    Sampo N.