Yes, I agree, but the data is in the form of an RDD, and since I'm running in
cluster mode it is distributed across all the machines in the cluster. A bloom
filter or MapDB, on the other hand, is not distributed. How will it work in
this case?
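
For example, is the idea to build the filter on the driver and broadcast it, so
that every executor gets its own read-only copy? A rough sketch of what I
imagine (Guava's BloomFilter is just an example; previousHourKeys, currentBatch,
keyOf and the sizing numbers are placeholders for my own code):

import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.broadcast.Broadcast;

// Build the filter on the driver from the previous hour's keys.
BloomFilter<CharSequence> seen =
    BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), 10_000_000, 0.01);
for (String key : previousHourKeys) {   // previousHourKeys: however the last run's keys are loaded
    seen.put(key);
}

// Guava's BloomFilter is Serializable, so it can be broadcast; each executor
// then holds a local read-only copy rather than the filter itself being distributed.
Broadcast<BloomFilter<CharSequence>> seenBc = jsc.broadcast(seen);  // jsc is the JavaSparkContext

JavaRDD<String> toProcess =
    currentBatch.filter(record -> !seenBc.value().mightContain(keyOf(record)));  // keyOf is a placeholder

I also understand that a bloom filter can return false positives, so a small
fraction of genuinely new keys could get dropped as well.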

*Thanks*,
<https://in.linkedin.com/in/ramkumarcs31>


On Tue, Dec 8, 2015 at 5:30 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> You may want to use a bloom filter for this, but make sure that you
> understand how it works
>
> On 08 Dec 2015, at 09:44, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>
> I'm running a Spark batch job in cluster mode every hour, and each run takes
> about 15 minutes. The dataset contains certain unique keys, and I don't want
> to process those keys again during the next hour's batch.
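>
> Roughly, each hourly run looks like this (jsc is my JavaSparkContext; the
> path, hour and keyOf are placeholders for my real ones):
>
> JavaRDD<String> batch = jsc.textFile("hdfs:///input/" + hour);          // this hour's data
> JavaRDD<String> keys  = batch.map(record -> keyOf(record)).distinct();  // the unique keys in it
> // ... process the batch ...
> // In the next hour's run I want to filter out every record whose key
> // already appeared in an earlier hour's run.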
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
>
> On Tue, Dec 8, 2015 at 1:42 PM, Fengdong Yu <fengdo...@everstring.com>
> wrote:
>
>> Can you give more detail on your question? What do your previous batch and
>> the current batch look like?
>>
>>
>>
>>
>>
>> On Dec 8, 2015, at 3:52 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm running Java over Spark in cluster mode. I want to apply a filter on a
>> JavaRDD based on values from a previous batch. If I store those values in
>> MapDB, is it possible to apply that filter during the current batch?
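>>
>> Roughly what I have in mind (currentBatch, loadPreviousKeys and keyOf are
>> placeholders; the set would come from MapDB or some other store readable on
>> each executor):
>>
>> import java.util.ArrayList;
>> import java.util.List;
>> import java.util.Set;
>> import org.apache.spark.api.java.JavaRDD;
>>
>> // Read the local store once per partition rather than once per record.
>> JavaRDD<String> filtered = currentBatch.mapPartitions(records -> {
>>     Set<String> previous = loadPreviousKeys();   // placeholder: e.g. values persisted via MapDB
>>     List<String> keep = new ArrayList<>();
>>     while (records.hasNext()) {
>>         String record = records.next();
>>         if (!previous.contains(keyOf(record))) {
>>             keep.add(record);
>>         }
>>     }
>>     return keep;   // FlatMapFunction returns an Iterable of the records to keep
>> });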
>>
>> *Thanks*,
>> <https://in.linkedin.com/in/ramkumarcs31>
>>
>>
>>
>
