Pipe-separated values. I know the broadcast-and-join approach works, but I'd like to know whether MapDB works for this or not.
*Thanks*,
<https://in.linkedin.com/in/ramkumarcs31>

On Tue, Dec 8, 2015 at 2:22 PM, Fengdong Yu <fengdo...@everstring.com> wrote:

> What's your data format? ORC or CSV or something else?
>
> val keys = sqlContext.read.orc("your previous batch data path").select($"uniq_key").collect
> val broadcast = sc.broadcast(keys.toSet)
>
> val rdd = your_current_batch_data
> rdd.filter(line => !broadcast.value.contains(line.key))
>
> On Dec 8, 2015, at 4:44 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>
> I'm running a Spark batch job in cluster mode every hour, and it runs for 15 minutes. I have certain unique keys in the dataset. I don't want to process those keys again during the next hour's batch.
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
> On Tue, Dec 8, 2015 at 1:42 PM, Fengdong Yu <fengdo...@everstring.com> wrote:
>
>> Can you detail your question? What do your previous batch and the current batch look like?
>>
>> On Dec 8, 2015, at 3:52 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm running Java over Spark in cluster mode. I want to apply a filter on a JavaRDD based on some values from a previous batch. If I store those values in MapDB, is it possible to apply the filter during the current batch?
>>
>> *Thanks*,
>> <https://in.linkedin.com/in/ramkumarcs31>
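Since the job here is Java over Spark, the broadcast-filter idea above can be sketched in plain Java. This is only a minimal illustration of the logic (a `HashSet` stands in for the broadcast variable's value, and the key is assumed to be the first field of each pipe-separated record); the names and data are made up for the example, and in a real job the set would come from `sc.broadcast(...)` and the filter would run on the JavaRDD.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class KeyFilterSketch {
    public static void main(String[] args) {
        // Keys already processed in the previous batch. In Spark these would be
        // collected on the driver and shared with executors via sc.broadcast(keys).
        Set<String> previousKeys = new HashSet<>(Arrays.asList("k1", "k3"));

        // Current batch: pipe-separated records, key assumed to be the first field.
        List<String> currentBatch = Arrays.asList(
                "k1|old record", "k2|new record", "k3|old record", "k4|new record");

        // Same shape as rdd.filter(line -> !broadcast.value().contains(keyOf(line))):
        // keep only records whose key was NOT seen in the previous batch.
        List<String> toProcess = currentBatch.stream()
                .filter(line -> !previousKeys.contains(line.split("\\|")[0]))
                .collect(Collectors.toList());

        System.out.println(toProcess); // [k2|new record, k4|new record]
    }
}
```

Set lookups are O(1), so as long as the previous batch's key set fits in executor memory, broadcasting it and filtering is cheap; MapDB would only become interesting when the key set is too large to broadcast.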