Thanks, Akhil. I will give it a try and then get back to you.
Yong

On Mon, Jun 22, 2015 at 8:25 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Like this?
>
> val rawXmls = ssc.fileStream[LongWritable, Text, XmlInputFormat](path)
>
> Thanks
> Best Regards
>
> On Mon, Jun 22, 2015 at 5:45 PM, Yong Feng <fengyong...@gmail.com> wrote:
>
>> Thanks a lot, Akhil
>>
>> I saw this mail thread before, but I still do not understand how to use
>> the XmlInputFormat of Mahout in Spark Streaming (I am not a Spark
>> Streaming expert yet ;-)). Can you show me some sample code?
>>
>> Thanks in advance,
>>
>> Yong
>>
>> On Mon, Jun 22, 2015 at 6:44 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> You can use fileStream for that; look at the XmlInputFormat
>>> <https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java>
>>> of Mahout. It should give you a full XML object as one record (as
>>> opposed to an XML record spread across multiple line records in
>>> textFileStream). Also, this thread
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-a-large-XML-file-using-Spark-td19239.html>
>>> has some discussion around it.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Jun 22, 2015 at 12:23 AM, Yong Feng <fengyong...@gmail.com>
>>> wrote:
>>>
>>>> Hi Spark Experts
>>>>
>>>> I have a customer who wants to monitor incoming data files (in XML
>>>> format), analyze them, and then put the analyzed data into a DB.
>>>> The size of each file is about 30MB (or even less in future). Spark
>>>> Streaming seems promising.
>>>>
>>>> After learning Spark Streaming and googling how Spark Streaming
>>>> handles XML files, I found that there seems to be no existing Spark
>>>> Streaming utility that recognizes a whole XML file and parses it. The
>>>> fileStream seems line-oriented. There is a suggestion of putting the
>>>> whole XML file on one line; however, that requires pre-processing the
>>>> files, which will bring unexpected I/O.
>>>>
>>>> Can anyone throw some light on it? It would be great if there were
>>>> some sample code for me to start with.
>>>>
>>>> Thanks
>>>>
>>>> Yong
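
For anyone picking this thread up later, here is a minimal sketch of how the fileStream suggestion might be wired into a small streaming job. It is only a sketch under several assumptions, not a tested solution. In the Scala API, fileStream takes the key, value, and input format as type parameters rather than classOf arguments. The package of XmlInputFormat is taken from the Mahout path linked in the thread and may differ in other Mahout versions. The input directory, the <record> tags, and the 30-second batch interval are hypothetical placeholders. As far as I know, XmlInputFormat reads its record delimiters from the "xmlinput.start" and "xmlinput.end" configuration keys, and fileStream picks them up from the SparkContext's Hadoop configuration; please verify both against the versions you use.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
// Assumption: package as in the Mahout source linked in the thread; it may differ by version.
import org.apache.mahout.classifier.bayes.XmlInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object XmlFileStreamSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical directory to monitor for new XML files.
    val inputDir = "hdfs:///data/incoming-xml"

    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("XmlFileStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // hypothetical batch interval

    // Assumption: XmlInputFormat reads the tags that delimit one record from these keys.
    // "<record>" is a placeholder; use the element that wraps one logical record.
    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("xmlinput.start", "<record>")
    hadoopConf.set("xmlinput.end", "</record>")

    // Key = byte offset in the file, value = text between the start and end tags.
    val rawXmls = ssc.fileStream[LongWritable, Text, XmlInputFormat](inputDir)

    // Copy each Text value into a String, because Hadoop reuses Writable objects.
    val xmlStrings = rawXmls.map { case (_, xml) => xml.toString }

    xmlStrings.foreachRDD { rdd =>
      // Placeholder for the real work: parse each record and write the result to the DB.
      println(s"XML records in this batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If each file should arrive as a single record, setting the start and end tags to the document's root element would be one way to try it, though a whole 30MB document would then be held in memory as one string.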