Re: Historical Data as Stream

2014-05-17 Thread Mayur Rustagi
The real question is why you are looking to consume the file as a stream: 1. Is it too big to load as an RDD? 2. Do you need to operate on it in a sequential manner? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Sat, May 17, 2014 at 5:12 AM, Soumya Simanta

Re: Historical Data as Stream

2014-05-17 Thread Laeeq Ahmed
@Soumya Simanta Right now it's just a proof of concept. Later I will have a real stream. The data are EEG files of the brain; later this can be used for real-time analysis of EEG streams. @Mayur Yes, the size is huge, so it's better to process it in a distributed manner, and as I said above, I want to read it as a stream

Re: Historical Data as Stream

2014-05-17 Thread Soumya Simanta
@Laeeq - please see this example. https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala#L47-L49 On Sat, May 17, 2014 at 2:06 PM, Laeeq Ahmed laeeqsp...@yahoo.com wrote: @Soumya Simanta Right now it's just a proof of

Re: Historical Data as Stream

2014-05-16 Thread Soumya Simanta
A file is just a stream with a fixed length. Streams usually don't end, but in this case it would. On the other hand, if you read your file as a stream, you may not be able to use the entire data in the file for your analysis. Spark (given enough memory) can process large amounts of data quickly. On

Historical Data as Stream

2014-05-16 Thread Laeeq Ahmed
Hi, I have data in a file. Can I read it as a stream in Spark? I know it seems odd to read a file as a stream, but it has practical applications in real life. Are there any other tools that can feed this file to Spark as a stream, or do I have to make batches manually which is
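
On the "make batches manually" option, here is a minimal sketch in plain Python (not Spark's API) of replaying a file as a finite stream of fixed-size line batches. The names `batches` and `replay_file` and the choice of batching by line count are assumptions made for illustration; nothing in this thread specifies them. Spark Streaming's `textFileStream`, used in the HdfsWordCount example linked in this thread, takes a different approach and watches a directory for new files.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size items until the input ends."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(lines)
    while True:
        # islice pulls the next batch_size items without reading the whole input.
        batch = list(islice(it, batch_size))
        if not batch:
            return  # unlike a live stream, a file eventually runs out
        yield batch


def replay_file(path: str, batch_size: int) -> Iterator[List[str]]:
    """Replay a text file (hypothetical path) as a finite stream of line batches."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.rstrip("\n") for line in handle)
        yield from batches(stripped, batch_size)
```

Each yielded batch could then be handed to whatever consumes the stream. Because the file is read lazily, only one batch is held in memory at a time, and the iteration simply stops when the file ends.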