Pipe-separated values. I know the broadcast-and-join approach works, but I'd like to know whether MapDB works for this or not.
*Thanks*,
<https://in.linkedin.com/in/ramkumarcs31>

On Tue, Dec 8, 2015 at 2:22 PM, Fengdong Yu <fengdo...@everstring.com> wrote:

> What's your data format? ORC or CSV or something else?
>
> val keys = sqlContext.read.orc("your previous batch data path").select($"uniq_key").collect
> val broadcast = sc.broadcast(keys.toSet)
>
> val rdd = your_current_batch_data
> rdd.filter(line => !broadcast.value.contains(line.key))
>
> On Dec 8, 2015, at 4:44 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>
> I'm running a Spark batch job in cluster mode every hour, and it runs for 15 minutes. I have certain unique keys in the dataset. I don't want to process those keys again during the next hour's batch.
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
> On Tue, Dec 8, 2015 at 1:42 PM, Fengdong Yu <fengdo...@everstring.com> wrote:
>
>> Can you detail your question? What do your previous batch and the current batch look like?
>>
>> On Dec 8, 2015, at 3:52 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm running Java over Spark in cluster mode. I want to apply a filter on a JavaRDD based on some values from a previous batch. If I store those values in MapDB, is it possible to apply the filter during the current batch?
>>
>> *Thanks*,
>> <https://in.linkedin.com/in/ramkumarcs31>
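Since the job here is Java over Spark, the broadcast-filter idea above can be sketched in plain Java. This is only a minimal illustration of the logic (a `HashSet` stands in for the broadcast variable's value, and the key is assumed to be the first field of each pipe-separated record); the names and data are made up for the example, and in a real job the set would come from `sc.broadcast(...)` and the filter would run on the JavaRDD.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class KeyFilterSketch {
    public static void main(String[] args) {
        // Keys already processed in the previous batch. In Spark these would be
        // collected on the driver and shared with executors via sc.broadcast(keys).
        Set<String> previousKeys = new HashSet<>(Arrays.asList("k1", "k3"));

        // Current batch: pipe-separated records, key assumed to be the first field.
        List<String> currentBatch = Arrays.asList(
                "k1|old record", "k2|new record", "k3|old record", "k4|new record");

        // Same shape as rdd.filter(line -> !broadcast.value().contains(keyOf(line))):
        // keep only records whose key was NOT seen in the previous batch.
        List<String> toProcess = currentBatch.stream()
                .filter(line -> !previousKeys.contains(line.split("\\|")[0]))
                .collect(Collectors.toList());

        System.out.println(toProcess); // [k2|new record, k4|new record]
    }
}
```

Set lookups are O(1), so as long as the previous batch's key set fits in executor memory, broadcasting it and filtering is cheap; MapDB would only become interesting when the key set is too large to broadcast.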