Sorry; I think I may have used poor wording. SparkR will let you use R to analyze the data, but the data has to be loaded into memory using SparkR (see SparkR Data Sources <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>). You will still have to write a Java receiver to store the data in some tabular datastore (e.g. Hive) before loading it as SparkR DataFrames and performing the analysis.
R-specific questions, such as windowing in R, should go to R-help@; you won't be able to use window() since that is a Spark Streaming method.

On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:

> You are helping me understand stuff here a lot.
>
> I believe I have 3 last questions.
>
> If I use a Java receiver to get the data, how should I save it in memory?
> Using the store command or some other command?
>
> Once stored, how can R read that data?
>
> Can I use the window command in R? I guess not, because it is a streaming
> command, right? Is there any other way to window the data?
>
> Sent from iPhone
>
> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>
>> If you use SparkR then you can analyze the data that's currently in
>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
>>
>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>
>>> Thanks again.
>>> What I'm missing is where I can store the data. Can I store it in Spark
>>> memory and then use R to analyze it? Or should I use HDFS? Are there any
>>> other places where I can save the data?
>>>
>>> What would you suggest?
>>>
>>> Thanks...
>>>
>>> Sent from iPhone
>>>
>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>>
>>>> If you don't require true streaming processing and need to use R for
>>>> analysis, SparkR on a custom data source seems to fit your use case.
>>>>
>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>
>>>>> Hi, thanks for replying!
>>>>> I want to do the entire process in stages: get the data using Java or
>>>>> Scala, because they are the only languages that support custom receivers;
>>>>> keep the data <somewhere>; use R to analyze it; keep the results
>>>>> <somewhere>; and output the data to different systems.
>>>>>
>>>>> I thought that <somewhere> could be Spark memory, using RDDs or DStreams.
>>>>> But could it be that I need to keep it in HDFS to make the entire process
>>>>> work in stages?
>>>>>
>>>>> Sent from iPhone
>>>>>
>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>>>>
>>>>>> Hi Oded,
>>>>>>
>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>> windowed/processed in Spark Streaming, and use foreachRDD to do the OUTPUT.
>>>>>> However, streaming in SparkR is not currently supported (SPARK-6803
>>>>>> <https://issues.apache.org/jira/browse/SPARK-6803>), so I'm not too
>>>>>> sure how the ANALYZER would fit in.
>>>>>>
>>>>>> Feynman
>>>>>>
>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>>
>>>>>>> Any help / idea will be appreciated :)
>>>>>>> Thanks
>>>>>>>
>>>>>>> Regards,
>>>>>>> Oded Maimon
>>>>>>> Scene53.
>>>>>>>
>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>> We are evaluating Spark for real-time analytics. What we are trying
>>>>>>>> to do is the following:
>>>>>>>>
>>>>>>>> - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>   (written in Scala)
>>>>>>>> - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>   (windowed), analyze it every minute, and save the results inside Spark
>>>>>>>> - OUTPUT APP - use a Spark application (Scala/Java/Python) to
>>>>>>>>   read the results from R every X minutes and send the data to a few
>>>>>>>>   external systems
>>>>>>>>
>>>>>>>> Basically, at the end I would like to have the READER COMPONENT as
>>>>>>>> an app that always consumes the data and keeps it in Spark,
>>>>>>>> have as many ANALYZER COMPONENTS as my data scientists want, and
>>>>>>>> have one OUTPUT APP that will read the ANALYZER results and send them
>>>>>>>> to any relevant system.
>>>>>>>>
>>>>>>>> What is the right way to do it?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Oded.
>>>>>>>
>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>> intended solely for the use of the individual or entity to whom they are
>>>>>>> addressed. Please note that any disclosure, copying or distribution of the
>>>>>>> content of this information is strictly forbidden. If you have received
>>>>>>> this email message in error, please destroy it immediately and notify its
>>>>>>> sender.*
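For the READER side discussed in this thread, a custom receiver in Scala might look roughly like the sketch below. This is an untested outline against the Spark 1.4-era streaming API, not a working implementation: the RabbitMQ client calls are omitted, and the class name, app name, batch interval, and output path are all placeholders I've made up for illustration. It only compiles with Spark on the classpath.

```scala
// Untested sketch: a custom receiver feeding a DStream, persisted per batch
// so a separate SparkR job could later load it (e.g. via read.df / Hive).
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkConf

// Placeholder name; real RabbitMQ consumer code would go where noted.
class RabbitMQReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart(): Unit = {
    new Thread("RabbitMQ Receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          // consume one message from RabbitMQ here (client code omitted)...
          store("message")   // store() hands each record to Spark
        }
      }
    }.start()
  }
  def onStop(): Unit = { /* close the RabbitMQ connection here */ }
}

object ReaderApp {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("reader"), Seconds(10))
    val stream = ssc.receiverStream(new RabbitMQReceiver)
    // Persist each batch somewhere tabular/durable; a real pipeline would
    // likely write Parquet or a Hive table rather than plain text.
    stream.foreachRDD { rdd =>
      rdd.saveAsTextFile("hdfs:///data/rabbitmq/" + System.currentTimeMillis)
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Storing to HDFS/Hive rather than only calling store() is what decouples the READER from the ANALYZER: received blocks live in Spark's block manager and are not directly addressable from a separate SparkR session.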
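Feynman's earlier suggestion in the thread (window the DStream in Spark Streaming, then use foreachRDD for the OUTPUT step) could be sketched as follows. Again an untested outline: socketTextStream stands in for whatever DStream the custom receiver produces, and the batch, window, and slide durations are arbitrary choices for illustration.

```scala
// Untested sketch of the window + foreachRDD pattern described in the thread.
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.SparkConf

val ssc = new StreamingContext(new SparkConf().setAppName("pipeline"), Seconds(10))

// Stand-in for the custom receiver's DStream
val stream = ssc.socketTextStream("localhost", 9999)

// Look at the last 5 minutes of data, recomputed every minute;
// both durations must be multiples of the 10s batch interval.
val windowed = stream.window(Minutes(5), Minutes(1))

// OUTPUT step: foreachRDD runs driver-side code once per window slide
windowed.foreachRDD { rdd =>
  val count = rdd.count()   // any per-window computation goes here
  // push `count` (or the real analysis results) to the external systems
}

ssc.start()
ssc.awaitTermination()
```

Note that window() lives on the DStream in Scala/Java; as stated above, there is no SparkR equivalent (SPARK-6803), so any R analysis has to run over data the streaming job has already materialized.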