All,

Can someone please shed some light on the query below? Any help is greatly appreciated.

Thanks,
Girish Vasmatkar
HotWax Systems

On Thu, Oct 4, 2018 at 10:25 AM Girish Vasmatkar <girish.vasmat...@hotwaxsystems.com> wrote:

> On Mon, Oct 1, 2018 at 12:18 PM Girish Vasmatkar <girish.vasmat...@hotwaxsystems.com> wrote:
>
>> Hi All,
>>
>> We are very early into our Spark days, so the following may sound like a
>> novice question :) I will try to keep this as short as possible.
>>
>> We are trying to use Spark to build a recommendation engine that
>> provides product recommendations, and we need help with some design
>> decisions before moving forward. Ours is a web application running on
>> Tomcat. So far, I have created a simple POC (a standalone Java program)
>> that reads in a CSV file, feeds it to FPGrowth, fits the model, and runs
>> transformations. I would like to be able to do the following:
>>
>>    - A scheduler runs nightly in Tomcat (which it does currently) and
>>    reads everything from the DB to train/fit the model. This can grow
>>    into a really large data set, and every day we will have new data.
>>    Should I just use SparkContext here, within my scheduler, to fit the
>>    model? Is this the correct way to go about it? I am also planning to
>>    save the model on S3, which should be okay. We also thought about
>>    using HDFS. The scheduler's job will be just to create the model,
>>    save it, and be done.
>>    - On the product page, we can then use the saved model to display the
>>    product recommendations for a particular product.
>>    - My understanding is that I should be able to use SparkContext here
>>    in my web application to just load the saved model and use it to
>>    derive the recommendations. Is this a good design? The problem I see
>>    with this approach is that the SparkContext takes time to initialize,
>>    and this may cost us dearly. Or should we keep a single SparkContext
>>    instance per web application? We could initialize the SparkContext
>>    during the application context initialization phase.
>>
>> Since I am fairly new to using Spark properly, please help me decide
>> whether the way I plan to use Spark is the recommended way. I have also
>> seen use cases involving Kafka that communicates with Spark, but can we
>> not do it directly using SparkContext? I am sure a lot of my
>> understanding is wrong, so please feel free to correct me.
>>
>> Thanks and Regards,
>> Girish Vasmatkar
>> HotWax Systems
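
To make the "single instance per web application" option from the last bullet concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions, not a recommended or confirmed design: the class name LazyShared is hypothetical, and the String returned by the factory is a stand-in for whatever expensive object the application would really build (for example the Spark context together with the model loaded from S3 or HDFS). The point is only that the costly initialization runs once and every later request reuses the same object.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical helper: builds an expensive object on first use and then
 * hands the same instance to every caller. In the web application this
 * would wrap the slow Spark initialization; here a String stands in for it.
 */
final class LazyShared<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by the monitor of this object

    LazyShared(Supplier<T> factory) {
        this.factory = Objects.requireNonNull(factory, "factory");
    }

    /** Thread-safe: concurrent callers see the factory run at most once,
     *  as long as the factory returns a non-null value. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        AtomicInteger initCount = new AtomicInteger();

        // Stand-in for the slow step (creating the context, loading the model).
        LazyShared<String> shared = new LazyShared<>(() -> {
            initCount.incrementAndGet();
            return "expensive-resource";
        });

        String first = shared.get();   // triggers the one-time initialization
        String second = shared.get();  // reuses the cached instance

        System.out.println("same instance: " + (first == second));
        System.out.println("initializations: " + initCount.get());
    }
}
```

In a servlet container, one such holder could be created during the application context initialization phase mentioned above, so that the first product page request does not pay the startup cost. Whether the holder should be filled eagerly at startup or lazily on first use is a design choice this sketch leaves open.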