Hi Spark Users, I'm trying to load a very large file (50GB when compressed with gzip, stored in HDFS) by receiving a DStream via `ssc.textFileStream`, since the file does not fit in my memory. However, it looks like no RDD is received until I copy this big file into the pre-specified directory on HDFS after the stream has started. Ideally, I'd like to read this file a small number of lines at a time, but receiving it as a file stream requires an additional write to HDFS. Any ideas on how to achieve this?
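For context, here is a minimal local sketch (plain Python, no Spark, and a tiny illustrative gzip file in place of the real 50GB HDFS file) of the "small number of lines at a time" reading I have in mind; the file name and chunk size are made up for the example:

```python
import gzip
import itertools
import os
import tempfile

# Illustrative stand-in for the large HDFS file: a small gzip file with 10 lines.
path = os.path.join(tempfile.mkdtemp(), "big.txt.gz")
with gzip.open(path, "wt") as f:
    for i in range(10):
        f.write(f"line {i}\n")

def chunks(fname, n):
    """Yield successive batches of up to n lines, streaming the gzip file
    so it is never fully loaded into memory."""
    with gzip.open(fname, "rt") as f:
        while True:
            batch = list(itertools.islice(f, n))
            if not batch:
                return
            yield batch

# Reading 4 lines at a time yields batches of sizes 4, 4, 2.
for batch in chunks(path, 4):
    print(len(batch))
```

The question is essentially whether something equivalent to this incremental, bounded-memory read is possible through Spark without first re-writing the file into the streamed directory.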
Best regards,
Todd Leo