Thank you, but that doesn't answer my general question.

I might need to enrich my records using different data sources (or DBs).

So the general use case I need to support is some kind of Function that
has init() logic for creating a connection to the DB, queries the DB for
each record to enrich my input record with data from the DB, and has some
kind of close() logic to close the connection.

I have implemented this kind of use case using Map/Reduce, and I want to
know how I can do it with Spark.
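
From what I have read, mapPartitions might be the way to express this. A
rough, untested sketch against the 1.x Java API (the JDBC URL and the
enrich() helper are placeholders I made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    JavaRDD<String> enriched = input.mapPartitions(
        new FlatMapFunction<Iterator<String>, String>() {
          public Iterable<String> call(Iterator<String> records) throws Exception {
            // "setup()": runs once per partition, on the worker
            Connection conn = DriverManager.getConnection("jdbc:...");
            List<String> out = new ArrayList<String>();
            try {
              while (records.hasNext()) {
                // one lookup per record, reusing the same connection
                out.add(enrich(conn, records.next()));
              }
            } finally {
              // "cleanup()": runs once per partition
              conn.close();
            }
            return out;
          }
        });

Is something like this the idiomatic way, or is there a better hook for
per-task setup/cleanup?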

Thanks


On Fri, Jul 25, 2014 at 6:24 AM, Yanbo Liang <yanboha...@gmail.com> wrote:

> You can refer to this topic:
> http://www.mapr.com/developercentral/code/loading-hbase-tables-spark
>
>
> 2014-07-24 22:32 GMT+08:00 Yosi Botzer <yosi.bot...@gmail.com>:
>
>> In my case I want to reach HBase. For every record with a userId I want
>> to get some extra information about the user and add it to the result
>> record for further processing
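>>
>> Roughly, for each record I need a lookup like the sketch below (untested;
>> the table, column family and column names are made up):
>>
>>     import org.apache.hadoop.hbase.HBaseConfiguration;
>>     import org.apache.hadoop.hbase.client.Get;
>>     import org.apache.hadoop.hbase.client.HTable;
>>     import org.apache.hadoop.hbase.client.Result;
>>     import org.apache.hadoop.hbase.util.Bytes;
>>
>>     // open once, reuse for all records, close once
>>     HTable table = new HTable(HBaseConfiguration.create(), "users");
>>     Result r = table.get(new Get(Bytes.toBytes(userId)));
>>     byte[] city = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
>>     // merge the looked-up value into the input record, then pass it on
>>     table.close();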
>>
>>
>> On Thu, Jul 24, 2014 at 9:11 AM, Yanbo Liang <yanboha...@gmail.com>
>> wrote:
>>
>>> If you want to connect to a DB in your program, you can use JdbcRDD:
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
>>>
>>>
>>> 2014-07-24 18:32 GMT+08:00 Yosi Botzer <yosi.bot...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> I am using the Java API of Spark.
>>>>
>>>> I wanted to know if there is a way to run some code in a manner similar
>>>> to the setup() and cleanup() methods of Hadoop Map/Reduce.
>>>>
>>>> The reason I need it is that I want to read something from the DB for
>>>> each record I scan in my Function, and I would like to open the DB
>>>> connection only once (and close it only once).
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>