This is an example that handles one revision per page (the general case is more complex, but it is possible):

-- Register piggybank and define short aliases for the loader and the regex UDF.
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
DEFINE RegexExtractAll org.apache.pig.piggybank.evaluation.string.RegexExtractAll();

-- Load each <page>...</page> element as a single chararray record.
revisionXML = LOAD 'Revision.xml' USING XMLLoader('page') AS (revision:chararray);

-- Capture the page id, then the id and username of the first <revision> only.
rev = FOREACH revisionXML GENERATE FLATTEN(
        RegexExtractAll(revision,
            '<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^<]*)</id>\\n\\s*<username>([^<]*)</username>\\n\\s*</revision>')
      ) AS (
        page: chararray,
        id_revision: chararray,
        username: chararray
      );

DUMP rev;

2012/5/17, Herbert Mühlburger <herbert.muehlbur...@gmail.com>:
> Hi list,
>
> I would like to parse the following XML-File using Pig:
>
> <page>
>   <id>1</id>
>   <revision>
>     <id>1</id>
>     <username>muehlburger</username>
>   </revision>
>   <revision>
>     <id>2</id>
>     <username>muehlburger</username>
>   </revision>
>   <revision>
>     <id>3</id>
>     <username>user1</username>
>   </revision>
>   ...
>   <revision>
>     <id>34334398</id>
>     <username>muehlburger</username>
>   </revision>
> </page>
> <page>
>   <id>2</id>
>   <revision>
>     <id>343434</id>
>     <username>muehlburger</username>
>   </revision>
>   <revision>
>     <id>25343232</id>
>     <username>muehlburger</username>
>   </revision>
>   <revision>
>     <id>43434333</id>
>     <username>user2</username>
>   </revision>
>   ...
>   <revision>
>     <id>5409589854</id>
>     <username>user5</username>
>   </revision>
> </page>
> ...
>
> I would like to produce the following kind of csv output:
>
> page_id  revision_id  username
> 1        1            muehlburger
> 1        2            muehlburger
> 1        3            user1
> 1        34334398     muehlburger
> 2        343434       muehlburger
> 2        25343232     muehlburger
> 2        43434333     user2
> 2        5409589854   user5
>
> How can I accomplish this using PIG?
>
> Thank you very much for your help!
>
> Kind regards,
> Herbert
> --
> =================================================================
> Herbert Muehlburger Software Development and Business Management
> Graz University of Technology
> www.muehlburger.at www.twitter.com/hmuehlburger
> =================================================================
>
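
For the general case with several revisions per page, it can help to check the expected rows on a small sample before working on the Pig script. The following is a minimal Python sketch, not a Pig solution. It assumes the sample is small enough to parse in memory, and the `<pages>` wrapper element and the function name `page_revision_rows` are my own assumptions, not part of the original file or of piggybank.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample from the quoted message.
SAMPLE = """
<page>
  <id>1</id>
  <revision>
    <id>1</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>2</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>3</id>
    <username>user1</username>
  </revision>
</page>
<page>
  <id>2</id>
  <revision>
    <id>343434</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>43434333</id>
    <username>user2</username>
  </revision>
</page>
"""


def page_revision_rows(xml_text):
    """Return one (page_id, revision_id, username) tuple per <revision>."""
    # The sample has several top-level <page> elements, so wrap them in a
    # hypothetical <pages> root (an assumption) to get well-formed XML.
    root = ET.fromstring("<pages>" + xml_text + "</pages>")
    rows = []
    for page in root.findall("page"):
        # findtext("id") only looks at direct children, so this is the
        # page id and not the id of a nested revision.
        page_id = page.findtext("id")
        for revision in page.findall("revision"):
            rows.append(
                (page_id, revision.findtext("id"), revision.findtext("username"))
            )
    return rows


if __name__ == "__main__":
    # Print tab-separated rows: page_id, revision_id, username.
    for row in page_revision_rows(SAMPLE):
        print("\t".join(row))
```

Each page id is repeated once per revision, which is the shape of the output requested in the quoted message. A Pig script for the general case would need to produce the same rows.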