Re: Spark Function setup and cleanup

2014-07-26 Thread Sean Owen
Look at mapPartitions. Whereas map turns one value V1 into one value V2, mapPartitions lets you turn one entire Iterator[V1] into one whole Iterator[V2]. The function that does so can perform some initialization at its start, then process all of the values, and clean up at its end. This is how
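
A minimal sketch of the pattern described above, written in plain Java with no Spark dependency so it can be run on its own. FakeUserStore, PartitionEnricher, enrichPartition and lookup are hypothetical names invented for illustration; they are not part of Spark or any DB client. The method takes a whole iterator of records, opens the resource once, processes every record, and closes the resource once. Results are collected eagerly here on the assumption that the resource must stay open until every record has been read; returning a lazy iterator would risk closing it too early.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a real DB or HBase client (illustration only).
class FakeUserStore implements AutoCloseable {
    static int openCount = 0;   // how many times "setup" ran
    static int closeCount = 0;  // how many times "cleanup" ran

    FakeUserStore() {
        openCount++;            // setup: happens once per partition
    }

    String lookup(String userId) {
        return "info-for-" + userId;  // pretend DB query
    }

    @Override
    public void close() {
        closeCount++;           // cleanup: happens once per partition
    }
}

class PartitionEnricher {
    // Same shape as a per-partition function: a whole iterator in, a whole collection out.
    static List<String> enrichPartition(Iterator<String> userIds) {
        List<String> enriched = new ArrayList<>();
        // try-with-resources closes the store even if a lookup throws.
        try (FakeUserStore store = new FakeUserStore()) {
            while (userIds.hasNext()) {
                String id = userIds.next();
                enriched.add(id + ":" + store.lookup(id));
            }
        }
        return enriched;
    }
}
```

In Spark, a method of this shape would form the body of the function passed to mapPartitions; the exact Java functional interface and return type depend on the Spark version, so treat the signature above as an assumption.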

Re: Spark Function setup and cleanup

2014-07-26 Thread Yosi Botzer
Thank you, but that doesn't answer my general question. I might need to enrich my records using different data sources (or DBs). So the general use case I need to support is to have some kind of Function that has init() logic for creating a connection to the DB, queries the DB for each record and enrich

Re: Spark Function setup and cleanup

2014-07-24 Thread Yanbo Liang
You can refer to this topic: http://www.mapr.com/developercentral/code/loading-hbase-tables-spark

2014-07-24 22:32 GMT+08:00 Yosi Botzer:
> In my case I want to reach HBase. For every record with userId I want to
> get some extra information about the user and add it to result record for
> further

Re: Spark Function setup and cleanup

2014-07-24 Thread Yosi Botzer
In my case I want to reach HBase. For every record with a userId I want to get some extra information about the user and add it to the result record for further processing.

On Thu, Jul 24, 2014 at 9:11 AM, Yanbo Liang wrote:
> If you want to connect to DB in program, you can use JdbcRDD (
> https://git

Re: Spark Function setup and cleanup

2014-07-24 Thread Yanbo Liang
If you want to connect to a DB in your program, you can use JdbcRDD ( https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ).

2014-07-24 18:32 GMT+08:00 Yosi Botzer:
> Hi,
>
> I am using the Java api of Spark.
>
> I wanted to know if there is a way to run s

Spark Function setup and cleanup

2014-07-24 Thread Yosi Botzer
Hi, I am using the Java API of Spark. I wanted to know if there is a way to run some code in a manner that is like the setup() and cleanup() methods of Hadoop Map/Reduce. I need it because I want to read something from the DB according to each record I scan in my Function, and I wou