This will not work, i.e., using a DataFrame inside a map function. Although you can try to create the df separately and cache it... Then you can join your event stream with this df.

On Jul 2, 2015 6:11 PM, "Ashish Soni" <asoni.le...@gmail.com> wrote:
> Hi All,
>
> I have a stream of events coming in and I want to fetch some additional
> data from the database based on the values in the incoming data. For example,
> below is the data coming in:
>
> loginName
> Email
> address
> city
>
> Now for each login name I need to go to the Oracle database and get the
> userId from the database, *but I do not want to hit the database again and
> again; instead I want to load the complete table in memory and then find
> the user id based on the incoming data*:
>
> JavaRDD<Charge> rdd =
>     sc.textFile("/home/spark/workspace/data.csv").map(new Function<String, String>() {
>         @Override
>         public Charge call(String s) {
>             String str[] = s.split(",");
>             // How to load the complete table in memory and use it? When I
>             // do this outside the loop I get a stage failure error.
>             DataFrame dbRdd =
>                 sqlContext.read().format("jdbc").options(options).load();
>
>             System.out.println(dbRdd.filter("ogin_nm='" + str[0] + "'").count());
>
>             return str[0];
>         }
>     });
>
> How can I achieve this? Please suggest.
>
> Thanks
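Spark aside, the pattern being suggested above — load the reference table once up front (in Spark, a cached JDBC DataFrame or a collected-and-broadcast map), then resolve each incoming record against it in memory instead of issuing one database query per record — can be sketched in plain Java. The table contents and record layout here are made up for illustration; in the real setup the map would come from the Oracle table:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class UserLookup {

    // Resolve each incoming CSV record (loginName,email,address,city) to a
    // userId using an in-memory lookup table, instead of hitting the
    // database once per record.
    public static List<Integer> resolveUserIds(List<String> incoming,
                                               Map<String, Integer> loginToUserId) {
        return incoming.stream()
                .map(line -> line.split(",")[0])   // extract loginName
                .map(loginToUserId::get)            // in-memory lookup
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical reference data: in the real setup this would be the
        // Oracle table, loaded once (e.g. via a cached JDBC DataFrame).
        Map<String, Integer> loginToUserId = new HashMap<>();
        loginToUserId.put("alice", 101);
        loginToUserId.put("bob", 102);

        // Made-up sample of the incoming event stream.
        List<String> incoming = Arrays.asList(
                "alice,a@example.com,1 Main St,Springfield",
                "bob,b@example.com,2 Oak Ave,Shelbyville");

        System.out.println(resolveUserIds(incoming, loginToUserId)); // [101, 102]
    }
}
```

In Spark terms this corresponds to the join suggested above: build the JDBC DataFrame once on the driver, cache it, and join the event stream against it (or, if the table is small, collect it to a map and broadcast it), rather than constructing a DataFrame inside the map closure.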