Sorry; I think I may have used poor wording. SparkR will let you use R to analyze the data, but the data has to be loaded into memory using SparkR (see SparkR Data Sources <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>). You will still have to write a Java receiver to store the data in some tabular datastore (e.g. Hive) before loading it as SparkR DataFrames and performing the analysis.
R-specific questions, such as windowing in R, should go to R-help@; you won't be able to use window() since that is a Spark Streaming method.

On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:

> You are helping me understand stuff here a lot.
>
> I believe I have 3 last questions.
>
> If I use a Java receiver to get the data, how should I save it in memory?
> Using the store command or some other command?
>
> Once stored, how can R read that data?
>
> Can I use the window command in R? I guess not, because it is a streaming
> command, right? Is there any other way to window the data?
>
> Sent from iPhone
>
> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>
>> If you use SparkR then you can analyze the data that's currently in
>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
>>
>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>
>>> Thanks again.
>>> What I'm missing is where I can store the data. Can I store it in Spark
>>> memory and then use R to analyze it? Or should I use HDFS? Are there any
>>> other places where I can save the data?
>>>
>>> What would you suggest?
>>>
>>> Thanks...
>>>
>>> Sent from iPhone
>>>
>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>>
>>>> If you don't require true streaming processing and need to use R for
>>>> analysis, SparkR on a custom data source seems to fit your use case.
>>>>
>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>
>>>>> Hi, thanks for replying!
>>>>> I want to do the entire process in stages: get the data using Java or
>>>>> Scala, because they are the only languages that support custom receivers;
>>>>> keep the data <somewhere>; use R to analyze it; keep the results
>>>>> <somewhere>; and output the data to different systems.
>>>>>
>>>>> I thought that <somewhere> could be Spark memory, using RDDs or DStreams.
>>>>> But could it be that I need to keep it in HDFS to make the entire process
>>>>> work in stages?
>>>>>
>>>>> Sent from iPhone
>>>>>
>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>>>>
>>>>>> Hi Oded,
>>>>>>
>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>> windowed/processed in Spark Streaming, and use foreachRDD to do the OUTPUT.
>>>>>> However, streaming in SparkR is not currently supported (SPARK-6803
>>>>>> <https://issues.apache.org/jira/browse/SPARK-6803>), so I'm not too
>>>>>> sure how the ANALYZER would fit in.
>>>>>>
>>>>>> Feynman
>>>>>>
>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>>
>>>>>>> Any help / idea will be appreciated :)
>>>>>>> Thanks
>>>>>>>
>>>>>>> Regards,
>>>>>>> Oded Maimon
>>>>>>> Scene53.
>>>>>>>
>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>> We are evaluating Spark for real-time analytics. What we are trying
>>>>>>>> to do is the following:
>>>>>>>>
>>>>>>>> - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>   (written in Scala)
>>>>>>>> - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>   (windowed), analyze it every minute, and save the results inside Spark
>>>>>>>> - OUTPUT APP - use a Spark application (Scala/Java/Python) to
>>>>>>>>   read the results from R every X minutes and send the data to a few
>>>>>>>>   external systems
>>>>>>>>
>>>>>>>> Basically, at the end I would like to have the READER COMPONENT as
>>>>>>>> an app that always consumes the data and keeps it in Spark,
>>>>>>>> have as many ANALYZER COMPONENTS as my data scientists want, and
>>>>>>>> have one OUTPUT APP that will read the ANALYZER results and send them
>>>>>>>> to any relevant system.
>>>>>>>>
>>>>>>>> What is the right way to do it?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Oded.
>>>>>>>
>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>> intended solely for the use of the individual or entity to whom they are
>>>>>>> addressed. Please note that any disclosure, copying or distribution of the
>>>>>>> content of this information is strictly forbidden. If you have received
>>>>>>> this email message in error, please destroy it immediately and notify its
>>>>>>> sender.*
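For the READER side discussed in this thread, a custom receiver in Scala might look roughly like the sketch below. This is an untested outline against the Spark 1.4-era streaming API, not a working implementation: the RabbitMQ client calls are omitted, and the class name, app name, batch interval, and output path are all placeholders I've made up for illustration. It only compiles with Spark on the classpath.

```scala
// Untested sketch: a custom receiver feeding a DStream, persisted per batch
// so a separate SparkR job could later load it (e.g. via read.df / Hive).
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkConf

// Placeholder name; real RabbitMQ consumer code would go where noted.
class RabbitMQReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart(): Unit = {
    new Thread("RabbitMQ Receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          // consume one message from RabbitMQ here (client code omitted)...
          store("message")   // store() hands each record to Spark
        }
      }
    }.start()
  }
  def onStop(): Unit = { /* close the RabbitMQ connection here */ }
}

object ReaderApp {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("reader"), Seconds(10))
    val stream = ssc.receiverStream(new RabbitMQReceiver)
    // Persist each batch somewhere tabular/durable; a real pipeline would
    // likely write Parquet or a Hive table rather than plain text.
    stream.foreachRDD { rdd =>
      rdd.saveAsTextFile("hdfs:///data/rabbitmq/" + System.currentTimeMillis)
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Storing to HDFS/Hive rather than only calling store() is what decouples the READER from the ANALYZER: received blocks live in Spark's block manager and are not directly addressable from a separate SparkR session.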
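Feynman's earlier suggestion in the thread (window the DStream in Spark Streaming, then use foreachRDD for the OUTPUT step) could be sketched as follows. Again an untested outline: socketTextStream stands in for whatever DStream the custom receiver produces, and the batch, window, and slide durations are arbitrary choices for illustration.

```scala
// Untested sketch of the window + foreachRDD pattern described in the thread.
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.SparkConf

val ssc = new StreamingContext(new SparkConf().setAppName("pipeline"), Seconds(10))

// Stand-in for the custom receiver's DStream
val stream = ssc.socketTextStream("localhost", 9999)

// Look at the last 5 minutes of data, recomputed every minute;
// both durations must be multiples of the 10s batch interval.
val windowed = stream.window(Minutes(5), Minutes(1))

// OUTPUT step: foreachRDD runs driver-side code once per window slide
windowed.foreachRDD { rdd =>
  val count = rdd.count()   // any per-window computation goes here
  // push `count` (or the real analysis results) to the external systems
}

ssc.start()
ssc.awaitTermination()
```

Note that window() lives on the DStream in Scala/Java; as stated above, there is no SparkR equivalent (SPARK-6803), so any R analysis has to run over data the streaming job has already materialized.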