Hi Spark Users, I'm trying to load a very large file (50GB when compressed with gzip, stored in HDFS) by receiving a DStream via `ssc.textFileStream`, since the file does not fit in my memory. However, it looks like no RDD is received until I copy this big file into the pre-specified directory on HDFS after the stream has started. Ideally, I'd like to read this file a small number of lines at a time, but receiving it as a file stream requires an additional write to HDFS. Any ideas on how to achieve this?
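For context, here is a minimal local sketch (plain Python, no Spark, and a tiny illustrative gzip file in place of the real 50GB HDFS file) of the "small number of lines at a time" reading I have in mind; the file name and chunk size are made up for the example:

```python
import gzip
import itertools
import os
import tempfile

# Illustrative stand-in for the large HDFS file: a small gzip file with 10 lines.
path = os.path.join(tempfile.mkdtemp(), "big.txt.gz")
with gzip.open(path, "wt") as f:
    for i in range(10):
        f.write(f"line {i}\n")

def chunks(fname, n):
    """Yield successive batches of up to n lines, streaming the gzip file
    so it is never fully loaded into memory."""
    with gzip.open(fname, "rt") as f:
        while True:
            batch = list(itertools.islice(f, n))
            if not batch:
                return
            yield batch

# Reading 4 lines at a time yields batches of sizes 4, 4, 2.
for batch in chunks(path, 4):
    print(len(batch))
```

The question is essentially whether something equivalent to this incremental, bounded-memory read is possible through Spark without first re-writing the file into the streamed directory.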
Best regards,
Todd Leo