Hi Gutierrez,

As suggested, I tried the code, but result.txt contains only the header. Nothing else was written.
After debugging, I found that parsing yields no value. The problem is in the line marked in bold below. When I added a System.out call, nothing was printed for this line.

    String xmlContent = value.toString();
    InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder;
    try {
        builder = factory.newDocumentBuilder();
        *Document doc = builder.parse(is);*
        String ed = doc.getDocumentElement().getNodeName();
        out.write(ed.getBytes());
        DTMNodeList list = (DTMNodeList) getNode("/Company/Employee", doc, XPathConstants.NODESET);

When I call out.write(xmlContent.getBytes), the whole XML is printed.
Then I added a System.out call for list, and nothing was printed.
With out.write(ed.getBytes), nothing is printed.

Please suggest where I am going wrong and help me fix this. Thanks in advance.

I have attached my code. Please review.

Mapper class:

    public class XmlTextMapper extends Mapper<LongWritable, Text, Text, Text> {

        private static final XPathFactory xpathFactory = XPathFactory.newInstance();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String resultFileName = "/user/task/Sales/result.txt";

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
            FSDataOutputStream out = fs.create(new Path(resultFileName));

            InputStream resultIS = new ByteArrayInputStream(new byte[0]);

            String header = "id,name\n";
            out.write(header.getBytes());

            String xmlContent = value.toString();
            InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder;
            try {
                builder = factory.newDocumentBuilder();
                Document doc = builder.parse(is);
                String ed = doc.getDocumentElement().getNodeName();
                out.write(ed.getBytes());
                DTMNodeList list = (DTMNodeList) getNode("/Company/Employee",
doc, XPathConstants.NODESET);

                int size = list.getLength();
                for (int i = 0; i < size; i++) {
                    Node node = list.item(i);
                    String line = "";
                    NodeList nodeList = node.getChildNodes();
                    int childNumber = nodeList.getLength();
                    for (int j = 0; j < childNumber; j++) {
                        line += nodeList.item(j).getTextContent() + ",";
                    }
                    if (line.endsWith(","))
                        line = line.substring(0, line.length() - 1);
                    line += "\n";
                    out.write(line.getBytes());
                }
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (XPathExpressionException e) {
                e.printStackTrace();
            }

            IOUtils.copyBytes(resultIS, out, 4096, true);
            out.close();
        }

        public static Object getNode(String xpathStr, Node node, QName retunType)
                throws XPathExpressionException {
            XPath xpath = xpathFactory.newXPath();
            return xpath.evaluate(xpathStr, node, retunType);
        }
    }

Main class:

    public class MainXml {

        public static void main(String[] args) throws Exception {

            Configuration conf = new Configuration();

            if (args.length != 2) {
                System.err.println("Usage: XMLtoText <input path> <output path>");
                System.exit(-1);
            }

            String output = "/user/task/Sales/";

            Job job = new Job(conf, "XML to Text");
            job.setJarByClass(MainXml.class);
            // job.setJobName("XML to Text");
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // FileOutputFormat.setOutputPath(job, new Path(args[1]));

            Path outPath = new Path(output);
            FileOutputFormat.setOutputPath(job, outPath);
            FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
            if (dfs.exists(outPath)) {
                dfs.delete(outPath, true);
            }

            job.setMapperClass(XmlTextMapper.class);
            job.setNumReduceTasks(0);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            System.exit(job.waitForCompletion(true) ?
0 : 1);
        }
    }

My XML file:

    <Company>
      <Employee>
        <id>100</id>
        <ename>ranjini</ename>
        <dept>IT1</dept>
        <sal>123456</sal>
        <location>nextlevel1</location>
        <Address>
          <Home>Chennai1</Home>
          <Office>Navallur1</Office>
        </Address>
      </Employee>
      <Employee>
        <id>1001</id>
        <ename>ranjinikumar</ename>
        <dept>IT</dept>
        <sal>1234516</sal>
        <location>nextlevel</location>
        <Address>
          <Home>Chennai</Home>
          <Office>Navallur</Office>
        </Address>
      </Employee>
    </Company>

Thanks in advance.
Ranjini

> On Mon, Jan 6, 2014 at 2:44 PM, Ranjini Rathinam
> <ranjinibe...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks a lot.
>>
>> Ranjini
>>
>> On Fri, Jan 3, 2014 at 10:40 PM, Diego Gutierrez <
>> diego.gutier...@ucsp.edu.pe> wrote:
>>
>>> Hi,
>>>
>>> I suggest using XPath, which Java supports natively for querying
>>> parsed XML.
>>>
>>> As for the main problem: as with the distcp command (
>>> http://hadoop.apache.org/docs/r0.19.0/distcp.pdf ), there is no need for
>>> a reduce function, because you can parse the XML input file and create the
>>> file you need in the map function. For example, the following code reads an
>>> XML file in HDFS, parses it, and creates a new file ("/result.txt") with the
>>> expected format:
>>> id,name
>>> 100,RR
>>>
>>>
>>> Mapper function:
>>>
>>> import java.io.ByteArrayInputStream;
>>> import java.io.IOException;
>>> import java.io.InputStream;
>>> import java.net.URI;
>>>
>>> import javax.xml.namespace.QName;
>>> import javax.xml.parsers.DocumentBuilder;
>>> import javax.xml.parsers.DocumentBuilderFactory;
>>> import javax.xml.parsers.ParserConfigurationException;
>>> import javax.xml.xpath.XPath;
>>> import javax.xml.xpath.XPathConstants;
>>> import javax.xml.xpath.XPathExpressionException;
>>> import javax.xml.xpath.XPathFactory;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.IOUtils;
>>> import
org.apache.hadoop.io.LongWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapreduce.Mapper;
>>> import org.w3c.dom.Document;
>>> import org.w3c.dom.Node;
>>> import org.w3c.dom.NodeList;
>>> import org.xml.sax.SAXException;
>>>
>>> import com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList;
>>>
>>> public class XmlToTextMapper extends Mapper<LongWritable, Text, Text, Text> {
>>>
>>>     private static final XPathFactory xpathFactory = XPathFactory.newInstance();
>>>
>>>     @Override
>>>     public void map(LongWritable key, Text value, Context context)
>>>             throws IOException, InterruptedException {
>>>
>>>         String resultFileName = "/result.txt";
>>>
>>>         Configuration conf = new Configuration();
>>>         FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
>>>         FSDataOutputStream out = fs.create(new Path(resultFileName));
>>>
>>>         InputStream resultIS = new ByteArrayInputStream(new byte[0]);
>>>
>>>         String header = "id,name\n";
>>>         out.write(header.getBytes());
>>>
>>>         String xmlContent = value.toString();
>>>         InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
>>>         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
>>>         DocumentBuilder builder;
>>>         try {
>>>             builder = factory.newDocumentBuilder();
>>>             Document doc = builder.parse(is);
>>>             DTMNodeList list = (DTMNodeList) getNode("/main/data", doc,
>>>                     XPathConstants.NODESET);
>>>
>>>             int size = list.getLength();
>>>             for (int i = 0; i < size; i++) {
>>>                 Node node = list.item(i);
>>>                 String line = "";
>>>                 NodeList nodeList = node.getChildNodes();
>>>                 int childNumber = nodeList.getLength();
>>>                 for (int j = 0; j < childNumber; j++) {
>>>                     line += nodeList.item(j).getTextContent() + ",";
>>>                 }
>>>                 if (line.endsWith(","))
>>>                     line = line.substring(0, line.length() - 1);
>>>                 line += "\n";
>>>                 out.write(line.getBytes());
>>>             }
>>>
>>>         } catch (ParserConfigurationException e) {
>>>             MyLogguer.log("error: " + e.getMessage());
>>>             e.printStackTrace();
>>>         } catch (SAXException e) {
>>>             MyLogguer.log("error: " + e.getMessage());
>>>             e.printStackTrace();
>>>         } catch (XPathExpressionException e) {
>>>             MyLogguer.log("error: " + e.getMessage());
>>>             e.printStackTrace();
>>>         }
>>>
>>>         IOUtils.copyBytes(resultIS, out, 4096, true);
>>>         out.close();
>>>     }
>>>
>>>     public static Object getNode(String xpathStr, Node node, QName retunType)
>>>             throws XPathExpressionException {
>>>         XPath xpath = xpathFactory.newXPath();
>>>         return xpath.evaluate(xpathStr, node, retunType);
>>>     }
>>> }
>>>
>>>
>>> --------------------------------------
>>> Main class:
>>>
>>> public class Main {
>>>
>>>     public static void main(String[] args) throws Exception {
>>>
>>>         if (args.length != 2) {
>>>             System.err.println("Usage: XMLtoText <input path> <output path>");
>>>             System.exit(-1);
>>>         }
>>>
>>>         Job job = new Job();
>>>         job.setJarByClass(Main.class);
>>>         job.setJobName("XML to Text");
>>>         FileInputFormat.addInputPath(job, new Path(args[0]));
>>>         FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>>
>>>         job.setMapperClass(XmlToTextMapper.class);
>>>         job.setNumReduceTasks(0);
>>>         job.setMapOutputKeyClass(Text.class);
>>>         job.setMapOutputValueClass(Text.class);
>>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>     }
>>> }
>>>
>>> To execute the job you can use:
>>>
>>> bin/hadoop Main /data.xml /output
>>>
>>> Then you can use this to see the result.txt file:
>>>
>>> hadoop fs -cat /result.txt
>>>
>>> I'm using this XML as input:
>>>
>>> <main><data><id>1</id><name>NameA</name></data><data><id>2</id><name>NameB</name></data></main>
>>>
>>> and the content of result.txt looks like this:
>>>
>>> id,name
>>> 1,NameA
>>> 2,NameB
>>>
>>>
>>> Hope this helps.
>>>
>>>
>>> 2014/1/3 Ranjini Rathinam <ranjinibe...@gmail.com>
>>>
>>>> Hi,
>>>>
>>>> I need to convert XML into text using MapReduce.
>>>>
>>>> I have used the DOM and SAX parsers.
>>>>
>>>> After using SAX Builder in the mapper class, the child node acts as the
>>>> root element.
>>>>
>>>> In System.out I found that the root element printed is actually the
>>>> child element.
>>>>
>>>> For example,
>>>>
>>>> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
>>>>
>>>> When this XML is passed to the mapper and I print the root element in
>>>> System.out, I get the root element as
>>>>
>>>> <id>
>>>> <name>
>>>>
>>>> Please suggest how to fix this.
>>>>
>>>> I need to convert the XML into text using MapReduce code. Please
>>>> provide an example.
>>>>
>>>> Required output is
>>>>
>>>> id,name
>>>> 100,RR
>>>>
>>>> Please help.
>>>>
>>>> Thanks in advance,
>>>> Ranjini R
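
One possible explanation for both symptoms in this thread (a header-only result.txt, and a child element showing up as the root) is that the mapper receives one line of the file per map() call rather than the whole document. This is an assumption: the posted driver does not set an input format, so the default line-oriented one would apply, and the XML file is assumed to span several lines. The sketch below has no Hadoop dependency and only illustrates the parsing behaviour. XmlToCsvSketch and toCsv are hypothetical names that do not appear in the posted code. It also casts to the public NodeList interface instead of the internal DTMNodeList class, and picks id and ename by name so that whitespace text nodes between elements are skipped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration; not part of the code posted in this thread.
class XmlToCsvSketch {

    /** Converts the given text, parsed as an XML document, into "id,name" CSV text. */
    static String toCsv(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Use the public NodeList interface, not the JDK-internal DTMNodeList.
            NodeList employees = (NodeList) xpath.evaluate(
                    "/Company/Employee", doc, XPathConstants.NODESET);
            StringBuilder csv = new StringBuilder("id,name\n");
            for (int i = 0; i < employees.getLength(); i++) {
                Node employee = employees.item(i);
                // Select fields by name, relative to the current Employee node.
                csv.append(xpath.evaluate("id", employee)).append(',')
                   .append(xpath.evaluate("ename", employee)).append('\n');
            }
            return csv.toString();
        } catch (Exception e) {
            // Parser failures (for example SAXException) are rethrown unchecked.
            throw new IllegalArgumentException("could not parse as XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Company>\n"
                + "  <Employee>\n"
                + "    <id>100</id>\n"
                + "    <ename>ranjini</ename>\n"
                + "  </Employee>\n"
                + "</Company>";

        // Whole document at once: one CSV row per Employee.
        System.out.print(toCsv(xml));

        // One line at a time, as a line-oriented input format would deliver it.
        for (String line : xml.split("\n")) {
            try {
                System.out.print(line.trim() + " -> " + toCsv(line));
            } catch (IllegalArgumentException e) {
                System.out.println(line.trim() + " -> parse error");
            }
        }
    }
}
```

If this is what is happening, a line such as <Company> fails to parse, while a line such as <id>100</id> parses as a complete document whose root element is id, so /Company/Employee matches nothing and only the header is written. Because fs.create() overwrites an existing file by default and map() runs once per record, result.txt would then hold only the output of the last call. Handing the mapper a whole document (or one whole Employee element) per record would need a different input format, which is outside this sketch.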