Why do you need to use a stream in this use case? The 50 GB need not fit in
memory. Give it a try with a high number of partitions.
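
A minimal sketch of what that could look like, assuming Spark 1.x; the path
and partition count are placeholders. Since gzip is not a splittable codec,
sc.textFile reads the file as a single partition, so an explicit repartition
spreads the lines across the cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("LoadBigGzip")
    val sc = new SparkContext(conf)

    // Gzip is unsplittable, so this starts as one partition;
    // repartition() redistributes the lines so no single executor
    // has to hold all 50 GB.
    val lines = sc.textFile("hdfs:///data/big-file.gz") // placeholder path
      .repartition(400) // placeholder; size to your cluster

    // Downstream work now runs over many modest partitions.
    println(lines.count())

    sc.stop()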
On 11 Jun 2015 23:09, "SLiZn Liu" <sliznmail...@gmail.com> wrote:

> Hi Spark Users,
>
> I'm trying to load a literally huge file (50 GB compressed as gzip, stored
> in HDFS) by receiving a DStream via `ssc.textFileStream`, as this file
> cannot fit in my memory. However, it looks like no RDD will be received
> until I copy this big file to a prior-specified location on HDFS. Ideally,
> I'd like to read this file a small number of lines at a time, but receiving
> a file stream requires additional writing to HDFS. Any idea how to achieve
> this?
>
> BEST REGARDS,
> Todd Leo
>
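
For reference, the streaming setup the question describes looks roughly like
the sketch below (paths and batch interval are hypothetical). It also shows
why the copy step is needed: `textFileStream` watches a directory and only
emits data for files that newly appear there after the stream starts.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("TextFileStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // placeholder interval

    // Only files copied or moved into this directory after start()
    // are picked up, hence the "prior-specified location" behaviour.
    val lines = ssc.textFileStream("hdfs:///watched/dir") // placeholder path
    lines.foreachRDD(rdd => println(s"received ${rdd.count()} lines"))

    ssc.start()
    ssc.awaitTermination()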
