Hi Spark Users,

I'm trying to load a very large file (50GB gzip-compressed, stored on HDFS)
by receiving it as a DStream via `ssc.textFileStream`, since the file cannot
fit in memory. However, it looks like no RDDs arrive until I copy this big
file into the pre-specified directory on HDFS that the stream is watching.
Ideally, I'd like to read this file a small number of lines at a time, but
receiving it as a file stream requires this additional write to HDFS. Any
ideas on how to achieve this?
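For reference, here is roughly what I'm doing now (the app name, directory
path, and batch interval below are placeholders, not my actual values):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BigFileStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BigFileStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // textFileStream only picks up files that appear in the watched
    // directory *after* the stream starts, which is why I first have
    // to copy the 50GB file into this location on HDFS.
    val lines = ssc.textFileStream("hdfs:///path/to/watched/dir")

    // Placeholder action; in practice I process each micro-batch here.
    lines.foreachRDD { rdd => println(s"Received ${rdd.count()} lines") }

    ssc.start()
    ssc.awaitTermination()
  }
}
```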

Best regards,
Todd Leo
