Thank you, but that doesn't answer my general question.
I might need to enrich my records using different data sources (or DBs).
So the general use case I need to support is some kind of Function
that has init() logic for creating a connection to the DB, then queries the DB for each
record and enriches it
Look at mapPartitions. Whereas map turns one value V1 into one value
V2, mapPartitions lets you turn one entire Iterator[V1] into one whole
Iterator[V2]. The function that does so can perform some
initialization at its start, then process all of the values, and
clean up at its end. This is how
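The setup / process / cleanup pattern described above can be sketched in plain Java. This version has no Spark dependency so it runs on its own; `FakeConnection`, `enrichPartition` and the in-memory map are hypothetical stand-ins for a real DB client. In a Spark job, the body of `enrichPartition` would go inside the `FlatMapFunction<Iterator<T>, R>` passed to `JavaRDD.mapPartitions` (its `call` returns an `Iterable<R>` in Spark 1.x and an `Iterator<R>` in Spark 2.x, so check your version).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PartitionSetupCleanup {

    // Hypothetical stand-in for a DB connection; NOT a real client API.
    static class FakeConnection implements AutoCloseable {
        static int opened = 0; // counts how many connections were created
        private final Map<String, String> table;
        boolean closed = false;

        FakeConnection(Map<String, String> table) {
            this.table = table;
            opened++;
        }

        String lookup(String key) {
            if (closed) throw new IllegalStateException("connection already closed");
            return table.getOrDefault(key, "unknown");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // The body you would put inside the function passed to mapPartitions:
    // a whole Iterator of input records in, all enriched records out.
    static List<String> enrichPartition(Iterator<String> userIds, Map<String, String> table) {
        List<String> out = new ArrayList<>();
        // setup(): runs once per partition, not once per record
        try (FakeConnection conn = new FakeConnection(table)) {
            while (userIds.hasNext()) {
                String id = userIds.next();
                out.add(id + "," + conn.lookup(id));
            }
        } // cleanup(): try-with-resources calls conn.close() here
        // Results were buffered in 'out', so they stay valid after the connection closes.
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "Alice", "u2", "Bob");
        List<String> result = enrichPartition(List.of("u1", "u2", "u3").iterator(), users);
        System.out.println(result);                // [u1,Alice, u2,Bob, u3,unknown]
        System.out.println(FakeConnection.opened); // 1
    }
}
```

Buffering the output into a list keeps the close-at-the-end logic simple, at the cost of holding one partition's results in memory. Returning a lazy iterator from inside the try block would not work here: the connection would already be closed by the time Spark pulls the records.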
Hi,
I am using the Java API of Spark.
I wanted to know if there is a way to run some code in a manner
like the setup() and cleanup() methods of Hadoop Map/Reduce.
The reason I need it is that I want to read something from the DB
according to each record I scan in my Function, and I
If you want to connect to a DB in your program, you can use JdbcRDD
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala).
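JdbcRDD takes a SQL query containing two `?` placeholders plus a lower bound, an upper bound and a partition count, and each partition runs the query with its own slice of that range bound to the placeholders. The sketch below is my reading of how the range is split in the linked JdbcRDD.scala, redone in plain Java so it runs without Spark; `JdbcRangeSplit`, `split` and the `users` query are hypothetical. Note that this reads a table in bulk by key range, which is different from the per-record lookups asked about above.

```java
import java.util.ArrayList;
import java.util.List;

public class JdbcRangeSplit {

    // Splits the inclusive id range [lowerBound, upperBound] into numPartitions
    // inclusive sub-ranges, one per partition (modeled on JdbcRDD's partitioning).
    static List<long[]> split(long lowerBound, long upperBound, int numPartitions) {
        // Bounds are inclusive, hence the +1 here and the -1 on each end.
        // Plain long arithmetic can overflow for very large ranges.
        long length = 1 + upperBound - lowerBound;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = lowerBound + (i * length) / numPartitions;
            long end = lowerBound + ((i + 1) * length) / numPartitions - 1;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical query: the two '?' are bound to each partition's start and end.
        String sql = "SELECT id, name FROM users WHERE ? <= id AND id <= ?";
        List<long[]> ranges = split(1, 100, 3);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.printf("partition %d: %s with ? = %d, %d%n",
                    i, sql, ranges.get(i)[0], ranges.get(i)[1]);
        }
    }
}
```

For ids 1 to 100 and three partitions this gives the ranges 1 to 33, 34 to 66 and 67 to 100.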
2014-07-24 18:32 GMT+08:00 Yosi Botzer yosi.bot...@gmail.com:
In my case I want to reach HBase. For every record with a userId I want to
get some extra information about the user and add it to the result record for
further processing.
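For the HBase case, the per-partition setup can also be done lazily, so a large partition is never buffered in memory. The sketch wraps the partition's input iterator, enriches one record at a time, and closes the lookup once the input runs out. `UserLookup` and `enrichLazily` are hypothetical; a real version would wrap an HBase connection and table, whose API is not shown here. In a Spark job the `UserLookup` would be created inside the function passed to `mapPartitions`, once per partition.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class LazyEnrich {

    // Hypothetical stand-in for an HBase/DB client; a real one would hold a network connection.
    static class UserLookup implements AutoCloseable {
        private final Map<String, String> profiles;
        boolean closed = false;

        UserLookup(Map<String, String> profiles) {
            this.profiles = profiles;
        }

        String profileOf(String userId) {
            if (closed) throw new IllegalStateException("lookup used after close");
            return profiles.getOrDefault(userId, "n/a");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Enriches records one at a time as the caller pulls them, and closes the
    // lookup exactly once, as soon as the input iterator is exhausted.
    static Iterator<String> enrichLazily(Iterator<String> userIds, UserLookup lookup) {
        return new Iterator<String>() {
            private boolean done = false;

            @Override
            public boolean hasNext() {
                boolean more = userIds.hasNext();
                if (!more && !done) {
                    done = true;
                    lookup.close(); // cleanup() after the last record was consumed
                }
                return more;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String id = userIds.next();
                return id + "," + lookup.profileOf(id);
            }
        };
    }

    public static void main(String[] args) {
        UserLookup lookup = new UserLookup(Map.of("u1", "premium"));
        Iterator<String> it = enrichLazily(List.of("u1", "u2").iterator(), lookup);
        while (it.hasNext()) System.out.println(it.next()); // u1,premium then u2,n/a
        System.out.println("closed = " + lookup.closed);    // closed = true
    }
}
```

One caveat of this design: if a downstream operation stops consuming early (for example `take`), `hasNext()` never returns false and the lookup is never closed. Registering the cleanup as a task-completion listener on Spark's `TaskContext` is one way to cover that case; check the exact signature for your Spark version.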
On Thu, Jul 24, 2014 at 9:11 AM, Yanbo Liang yanboha...@gmail.com wrote:
You can refer to this topic:
http://www.mapr.com/developercentral/code/loading-hbase-tables-spark
2014-07-24 22:32 GMT+08:00 Yosi Botzer yosi.bot...@gmail.com: