Hi Spark Experts,

I have a customer who wants to monitor incoming data files (in XML format), analyze them, and then put the analyzed data into a DB. Each file is about 30 MB (possibly smaller in the future). Spark Streaming seems promising.
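A minimal sketch of that pipeline follows. It assumes a newer Spark release (2.4 or later) than the DStream API discussed here. Structured Streaming's file source with the `wholetext` option delivers each new file as a single row, so no one-line pre-processing is needed. `foreachBatch` then hands each micro-batch to ordinary batch code for parsing and a JDBC append. The XML layout (records as direct children of the root), the helper names `parse_xml_document` and `run_stream`, and the output columns are hypothetical placeholders for the customer's real schema.

```python
import xml.etree.ElementTree as ET


def parse_xml_document(path, content):
    """Parse one whole XML file into a list of (path, tag, text) rows.

    Hypothetical layout: the records of interest are the direct children
    of the root element. Adapt this to the real schema.
    """
    root = ET.fromstring(content)
    rows = []
    for child in root:
        # Strip surrounding whitespace so indentation in the file is ignored.
        text = (child.text or "").strip()
        rows.append((path, child.tag, text))
    return rows


def run_stream(input_dir, jdbc_url, table, checkpoint_dir):
    """Hypothetical driver: watch input_dir and append parsed rows to a DB.

    Assumes Spark 2.4+ (for foreachBatch) and a JDBC driver on the classpath.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import input_file_name

    spark = SparkSession.builder.appName("xml-file-stream").getOrCreate()

    # wholetext=True makes each file a single row in the "value" column
    # instead of one row per line.
    files = (spark.readStream
             .text(input_dir, wholetext=True)
             .withColumn("path", input_file_name()))

    def write_batch(batch_df, batch_id):
        # Parse every file in this micro-batch, then append to the database.
        parsed = batch_df.rdd.flatMap(
            lambda row: parse_xml_document(row["path"], row["value"]))
        if parsed.isEmpty():
            return
        out = spark.createDataFrame(parsed, ["path", "tag", "text"])
        out.write.jdbc(jdbc_url, table, mode="append")

    query = (files.writeStream
             .foreachBatch(write_batch)
             .option("checkpointLocation", checkpoint_dir)
             .start())
    query.awaitTermination()
```

For example, `parse_xml_document("a.xml", "<order><id>42</id></order>")` returns `[("a.xml", "id", "42")]`. Because each file becomes one record, a whole file (about 30 MB here) must fit in executor memory. On the older DStream API, a comparable approach would be `StreamingContext.fileStream` with a custom, non-splittable InputFormat that returns the entire file as one record.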
After learning Spark Streaming and googling how it handles XML files, I found that there seems to be no existing Spark Streaming utility that reads a whole XML file as one unit and parses it. fileStream seems to be line-oriented, producing one record per line. One suggestion is to put each whole XML file onto a single line, but that requires pre-processing the files, which adds extra I/O. Can anyone shed some light on this? It would be great if there were some sample code for me to start with.

Thanks,
Yong