Thanks - any chance of contributing some of that code? :) I have thought of a similar approach: start with an XMLToPig EvalFunc that takes the output of the existing XMLLoader and converts it to tuple/bag/map form. That is easier to do in baby steps: it is just a matter of plugging that code into the XML slice trimmed by XMLLoader, and the rest gets much easier once the EvalFunc works.
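To make the tuple/bag/map idea concrete, here is a rough sketch of the conversion step only. It is written in plain Python rather than as a real Pig EvalFunc, and everything in it is an assumption for illustration: the function names, the (tag, attributes, text, children) layout, and the sample XML are made up and not taken from XMLLoader or the gist. A Python tuple stands in for a Pig tuple, a dict for a map, and a list for a bag.

```python
import xml.etree.ElementTree as ET


def element_to_record(elem):
    """Convert one XML element into a (tag, attributes, text, children) tuple.

    Hypothetical layout: attributes become a dict (Pig map analog) and
    child elements become a list of nested tuples (Pig bag analog).
    """
    # Treat whitespace-only or missing text as "no text".
    text = (elem.text or "").strip() or None
    children = [element_to_record(child) for child in elem]
    return (elem.tag, dict(elem.attrib), text, children)


def xml_to_record(xml_slice):
    """Parse one XML slice (a string holding a single element) and convert it."""
    return element_to_record(ET.fromstring(xml_slice))


if __name__ == "__main__":
    # Made-up sample slice, standing in for what a loader might hand over.
    sample = '<book id="42"><title>Pig</title><author>Anon</author></book>'
    print(xml_to_record(sample))
```

Keeping the conversion as a standalone function over a string mirrors the baby-step plan: it can be tested on its own before being wired to whatever the loader emits. Note that this sketch ignores tail text, namespaces, and mixed content.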
Russell Jurney http://datasyndrome.com

On Dec 24, 2012, at 12:10 AM, Vitalii Tymchyshyn <tiv...@gmail.com> wrote:

> I was doing such a thing in my previous project, but I parsed on demand.
> What I mean is that I created a set of XML-processing functions, each of
> which can take a string or a DOM as input, plus an explicit parse function.
> I did this because I was usually using concatenation/grouping on parsed
> input files, and processing was done only after that. Or processing can be
> done in another MR step, and serializing a string is much easier than
> serializing a DOM.
> On 24 Dec 2012 09:24, "Russell Jurney" <russell.jur...@gmail.com> wrote:
>
>> I want to extend the existing XMLLoader to go beyond capturing the text
>> inside a tag and to actually create a Pig mapping of the Document Object
>> Model the XML represents. This would be similar to elephant-bird's
>> JsonLoader.
>>
>> For instance, check this example: https://gist.github.com/4368194
>>
>> Semi-structured data can vary, so this behavior can be risky but... I want
>> people to be able to load JSON and XML data easily in their first session
>> with Pig.
>>
>> Russell Jurney http://datasyndrome.com
>>