>> Please help to convert this XML to text.
>>
>> I have attached the XML. Please find the attachment.
>>
>> Some students have two address tags, some have one address tag, and
>> some don't have an address tag.
>>
>> I need to convert the XML into a string.
>>
>> This is my desired output:
>>
>> 100,ranjini,HOME,a street,ad street,ads street,chennai,tn,OFFICE,adsja1 street,adsja2 street,adsja3 street,mumbai,Maharastra
>> 101,nivetha,HOME,a street,ad street,ads street,chennai,tn
>> 102,siva
>>
>> In plain Java I have written this using recursion, but how do I write it
>> in MapReduce?
>>
>> How do I write the code in MapReduce? Please help.
>>
>> Thanks in advance.
>> Regards,
>> Ranjini R
>>
>>
>> On Fri, Jan 10, 2014 at 12:47 PM, Ranjini Rathinam <ranjinibe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> It's working fine. The problem was in the XML: the space I had given.
>>>
>>> Thanks a lot.
>>>
>>> Regards,
>>> Ranjini.R
>>>
>>> On Thu, Jan 9, 2014 at 10:47 PM, Diego Gutierrez <diego.gutier...@ucsp.edu.pe> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm sending you the Eclipse project with the code. Hope this helps.
>>>>
>>>> Regards
>>>> Diego Gutiérrez
>>>>
>>>>
>>>> 2014/1/9 Ranjini Rathinam <ranjinibe...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using Java 1.6 and Hadoop 0.20 on Ubuntu 12.04.
>>>>>
>>>>> If possible, please send the jar and code for review.
>>>>>
>>>>> Thanks for the support,
>>>>>
>>>>> Ranjini
>>>>>
>>>>> On Wed, Jan 8, 2014 at 11:00 PM, Diego Gutierrez <diego.gutier...@ucsp.edu.pe> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've noticed that your XML file has line breaks. By default, Hadoop
>>>>>> splits every file into lines and passes them to the map function; in
>>>>>> other words, each call to the map function processes one line of the
>>>>>> file. Please remove the line breaks from your XML and try again.
>>>>>> I've tested here with your XML file (just changing
>>>>>> DTMNodeList list = (DTMNodeList) getNode("/Company/Employee", doc, XPathConstants.NODESET)
>>>>>> ) and this is the output in result.txt:
>>>>>>
>>>>>> id,name
>>>>>> 100,ranjini,IT1,123456,nextlevel1,Chennai1Navallur1
>>>>>> 1001,ranjinikumar,IT,1234516,nextlevel,ChennaiNavallur
>>>>>>
>>>>>> Note: I don't know whether the Java version or the Hadoop version could
>>>>>> be the problem here. I'm using Ubuntu 12.04, Oracle Java 7 and Hadoop 2.2.0.
>>>>>>
>>>>>> If you want, I can send you the jar file with the code :)
>>>>>>
>>>>>> Regards
>>>>>> Diego Gutiérrez.
>>>>>>
>>>>>>
>>>>>> 2014/1/7 Ranjini Rathinam <ranjinibe...@gmail.com>
>>>>>>
>>>>>>> Hi Gutierrez,
>>>>>>>
>>>>>>> As suggested, I tried the code, but in result.txt I got only the
>>>>>>> header. Nothing else was printed.
>>>>>>>
>>>>>>> After debugging I found that, while parsing, there is no value.
>>>>>>>
>>>>>>> The problem is in the line below that is in bold (marked with
>>>>>>> asterisks). When I added a System.out call, no value was printed at
>>>>>>> this line.
>>>>>>>
>>>>>>>     String xmlContent = value.toString();
>>>>>>>
>>>>>>>     InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
>>>>>>>     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
>>>>>>>     DocumentBuilder builder;
>>>>>>>     try {
>>>>>>>         builder = factory.newDocumentBuilder();
>>>>>>>
>>>>>>>         *Document doc = builder.parse(is);*
>>>>>>>         String ed = doc.getDocumentElement().getNodeName();
>>>>>>>         out.write(ed.getBytes());
>>>>>>>         DTMNodeList list = (DTMNodeList) getNode("/Company/Employee", doc, XPathConstants.NODESET);
>>>>>>>
>>>>>>> When I print
>>>>>>>
>>>>>>> out.write(xmlContent.getBytes): the whole XML is printed.
>>>>>>>
>>>>>>> Then I wrote a System.out for list: nothing was printed.
>>>>>>> out.write(ed.getBytes): nothing is printed.
>>>>>>>
>>>>>>> Please suggest where I am going wrong.
>>>>>>> Please help to fix this.
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> I have attached my code. Please review.
>>>>>>>
>>>>>>>
>>>>>>> Mapper class:
>>>>>>>
>>>>>>> public class XmlTextMapper extends Mapper<LongWritable, Text, Text, Text> {
>>>>>>>     private static final XPathFactory xpathFactory = XPathFactory.newInstance();
>>>>>>>
>>>>>>>     @Override
>>>>>>>     public void map(LongWritable key, Text value, Context context)
>>>>>>>             throws IOException, InterruptedException {
>>>>>>>         String resultFileName = "/user/task/Sales/result.txt";
>>>>>>>
>>>>>>>         Configuration conf = new Configuration();
>>>>>>>         FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
>>>>>>>         FSDataOutputStream out = fs.create(new Path(resultFileName));
>>>>>>>         InputStream resultIS = new ByteArrayInputStream(new byte[0]);
>>>>>>>         String header = "id,name\n";
>>>>>>>         out.write(header.getBytes());
>>>>>>>         String xmlContent = value.toString();
>>>>>>>
>>>>>>>         InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
>>>>>>>         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
>>>>>>>         DocumentBuilder builder;
>>>>>>>         try {
>>>>>>>             builder = factory.newDocumentBuilder();
>>>>>>>             Document doc = builder.parse(is);
>>>>>>>             String ed = doc.getDocumentElement().getNodeName();
>>>>>>>             out.write(ed.getBytes());
>>>>>>>             DTMNodeList list = (DTMNodeList) getNode("/Company/Employee", doc, XPathConstants.NODESET);
>>>>>>>             int size = list.getLength();
>>>>>>>             for (int i = 0; i < size; i++) {
>>>>>>>                 Node node = list.item(i);
>>>>>>>                 String line = "";
>>>>>>>                 NodeList nodeList = node.getChildNodes();
>>>>>>>                 int childNumber = nodeList.getLength();
>>>>>>>                 for (int j = 0; j < childNumber; j++) {
>>>>>>>                     line += nodeList.item(j).getTextContent() + ",";
>>>>>>>                 }
>>>>>>>                 if (line.endsWith(","))
>>>>>>>                     line = line.substring(0, line.length() - 1);
>>>>>>>                 line += "\n";
>>>>>>>                 out.write(line.getBytes());
>>>>>>>             }
>>>>>>>         } catch
>>>>>>>         (ParserConfigurationException e) {
>>>>>>>             e.printStackTrace();
>>>>>>>         } catch (SAXException e) {
>>>>>>>             e.printStackTrace();
>>>>>>>         } catch (XPathExpressionException e) {
>>>>>>>             e.printStackTrace();
>>>>>>>         }
>>>>>>>         IOUtils.copyBytes(resultIS, out, 4096, true);
>>>>>>>         out.close();
>>>>>>>     }
>>>>>>>
>>>>>>>     public static Object getNode(String xpathStr, Node node, QName retunType)
>>>>>>>             throws XPathExpressionException {
>>>>>>>         XPath xpath = xpathFactory.newXPath();
>>>>>>>         return xpath.evaluate(xpathStr, node, retunType);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> Main class:
>>>>>>>
>>>>>>> public class MainXml {
>>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>>         Configuration conf = new Configuration();
>>>>>>>         if (args.length != 2) {
>>>>>>>             System.err.println("Usage: XMLtoText <input path> <output path>");
>>>>>>>             System.exit(-1);
>>>>>>>         }
>>>>>>>         String output = "/user/task/Sales/";
>>>>>>>         Job job = new Job(conf, "XML to Text");
>>>>>>>         job.setJarByClass(MainXml.class);
>>>>>>>         // job.setJobName("XML to Text");
>>>>>>>
>>>>>>>         FileInputFormat.addInputPath(job, new Path(args[0]));
>>>>>>>         // FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>>>>>>         Path outPath = new Path(output);
>>>>>>>         FileOutputFormat.setOutputPath(job, outPath);
>>>>>>>         FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
>>>>>>>         if (dfs.exists(outPath)) {
>>>>>>>             dfs.delete(outPath, true);
>>>>>>>         }
>>>>>>>         job.setMapperClass(XmlTextMapper.class);
>>>>>>>
>>>>>>>         job.setNumReduceTasks(0);
>>>>>>>         job.setMapOutputKeyClass(Text.class);
>>>>>>>         job.setMapOutputValueClass(Text.class);
>>>>>>>         System.exit(job.waitForCompletion(true) ?
>>>>>>>                 0 : 1);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> My XML file:
>>>>>>>
>>>>>>> <Company>
>>>>>>>   <Employee>
>>>>>>>     <id>100</id>
>>>>>>>     <ename>ranjini</ename>
>>>>>>>     <dept>IT1</dept>
>>>>>>>     <sal>123456</sal>
>>>>>>>     <location>nextlevel1</location>
>>>>>>>     <Address>
>>>>>>>       <Home>Chennai1</Home>
>>>>>>>       <Office>Navallur1</Office>
>>>>>>>     </Address>
>>>>>>>   </Employee>
>>>>>>>   <Employee>
>>>>>>>     <id>1001</id>
>>>>>>>     <ename>ranjinikumar</ename>
>>>>>>>     <dept>IT</dept>
>>>>>>>     <sal>1234516</sal>
>>>>>>>     <location>nextlevel</location>
>>>>>>>     <Address>
>>>>>>>       <Home>Chennai</Home>
>>>>>>>       <Office>Navallur</Office>
>>>>>>>     </Address>
>>>>>>>   </Employee>
>>>>>>> </Company>
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> Ranjini
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Jan 6, 2014 at 2:44 PM, Ranjini Rathinam <ranjinibe...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Thanks a lot.
>>>>>>>>>
>>>>>>>>> Ranjini
>>>>>>>>>
>>>>>>>>> On Fri, Jan 3, 2014 at 10:40 PM, Diego Gutierrez <diego.gutier...@ucsp.edu.pe> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I suggest using XPath, which Java supports natively for querying
>>>>>>>>>> XML documents.
>>>>>>>>>>
>>>>>>>>>> For the main problem, as with the distcp command
>>>>>>>>>> ( http://hadoop.apache.org/docs/r0.19.0/distcp.pdf ), there is no
>>>>>>>>>> need for a reduce function, because you can parse the XML input file
>>>>>>>>>> and create the file you need in the map function. For example, the
>>>>>>>>>> following code reads an XML file in HDFS, parses it, and creates a new
>>>>>>>>>> file ("/result.txt") with the expected format:
>>>>>>>>>>
>>>>>>>>>> id,name
>>>>>>>>>> 100,RR
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mapper function:
>>>>>>>>>>
>>>>>>>>>> import java.io.ByteArrayInputStream;
>>>>>>>>>> import java.io.IOException;
>>>>>>>>>> import java.io.InputStream;
>>>>>>>>>> import java.net.URI;
>>>>>>>>>>
>>>>>>>>>> import javax.xml.namespace.QName;
>>>>>>>>>> import javax.xml.parsers.DocumentBuilder;
>>>>>>>>>> import javax.xml.parsers.DocumentBuilderFactory;
>>>>>>>>>> import javax.xml.parsers.ParserConfigurationException;
>>>>>>>>>> import javax.xml.xpath.XPath;
>>>>>>>>>> import javax.xml.xpath.XPathConstants;
>>>>>>>>>> import javax.xml.xpath.XPathExpressionException;
>>>>>>>>>> import javax.xml.xpath.XPathFactory;
>>>>>>>>>>
>>>>>>>>>> import org.apache.hadoop.conf.Configuration;
>>>>>>>>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>>>>>>>>> import org.apache.hadoop.fs.FileSystem;
>>>>>>>>>> import org.apache.hadoop.fs.Path;
>>>>>>>>>> import org.apache.hadoop.io.IOUtils;
>>>>>>>>>> import org.apache.hadoop.io.LongWritable;
>>>>>>>>>> import org.apache.hadoop.io.Text;
>>>>>>>>>> import org.apache.hadoop.mapreduce.Mapper;
>>>>>>>>>> import org.w3c.dom.Document;
>>>>>>>>>> import org.w3c.dom.Node;
>>>>>>>>>> import org.w3c.dom.NodeList;
>>>>>>>>>> import org.xml.sax.SAXException;
>>>>>>>>>>
>>>>>>>>>> import com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList;
>>>>>>>>>>
>>>>>>>>>> public class XmlToTextMapper extends Mapper<LongWritable, Text, Text, Text> {
>>>>>>>>>>
>>>>>>>>>>     private static final XPathFactory xpathFactory
>>>>>>>>>>             = XPathFactory.newInstance();
>>>>>>>>>>
>>>>>>>>>>     @Override
>>>>>>>>>>     public void map(LongWritable key, Text value, Context context)
>>>>>>>>>>             throws IOException, InterruptedException {
>>>>>>>>>>
>>>>>>>>>>         String resultFileName = "/result.txt";
>>>>>>>>>>
>>>>>>>>>>         Configuration conf = new Configuration();
>>>>>>>>>>         FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
>>>>>>>>>>         FSDataOutputStream out = fs.create(new Path(resultFileName));
>>>>>>>>>>
>>>>>>>>>>         InputStream resultIS = new ByteArrayInputStream(new byte[0]);
>>>>>>>>>>
>>>>>>>>>>         String header = "id,name\n";
>>>>>>>>>>         out.write(header.getBytes());
>>>>>>>>>>
>>>>>>>>>>         String xmlContent = value.toString();
>>>>>>>>>>         InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
>>>>>>>>>>         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
>>>>>>>>>>         DocumentBuilder builder;
>>>>>>>>>>         try {
>>>>>>>>>>             builder = factory.newDocumentBuilder();
>>>>>>>>>>             Document doc = builder.parse(is);
>>>>>>>>>>             DTMNodeList list = (DTMNodeList) getNode("/main/data", doc, XPathConstants.NODESET);
>>>>>>>>>>
>>>>>>>>>>             int size = list.getLength();
>>>>>>>>>>             for (int i = 0; i < size; i++) {
>>>>>>>>>>                 Node node = list.item(i);
>>>>>>>>>>                 String line = "";
>>>>>>>>>>                 NodeList nodeList = node.getChildNodes();
>>>>>>>>>>                 int childNumber = nodeList.getLength();
>>>>>>>>>>                 for (int j = 0; j < childNumber; j++) {
>>>>>>>>>>                     line += nodeList.item(j).getTextContent() + ",";
>>>>>>>>>>                 }
>>>>>>>>>>                 if (line.endsWith(","))
>>>>>>>>>>                     line = line.substring(0, line.length() - 1);
>>>>>>>>>>                 line += "\n";
>>>>>>>>>>                 out.write(line.getBytes());
>>>>>>>>>>
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>         } catch (ParserConfigurationException e) {
>>>>>>>>>>             MyLogguer.log("error: " + e.getMessage());
>>>>>>>>>>             e.printStackTrace();
>>>>>>>>>>         } catch (SAXException e) {
>>>>>>>>>>             MyLogguer.log("error: " + e.getMessage());
>>>>>>>>>>             e.printStackTrace();
>>>>>>>>>>         } catch (XPathExpressionException e) {
>>>>>>>>>>             MyLogguer.log("error: " + e.getMessage());
>>>>>>>>>>             e.printStackTrace();
>>>>>>>>>>         }
>>>>>>>>>>
>>>>>>>>>>         IOUtils.copyBytes(resultIS, out, 4096, true);
>>>>>>>>>>         out.close();
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     public static Object getNode(String xpathStr, Node node, QName retunType)
>>>>>>>>>>             throws XPathExpressionException {
>>>>>>>>>>         XPath xpath = xpathFactory.newXPath();
>>>>>>>>>>         return xpath.evaluate(xpathStr, node, retunType);
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --------------------------------------
>>>>>>>>>> Main class:
>>>>>>>>>>
>>>>>>>>>> public class Main {
>>>>>>>>>>
>>>>>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>>>>>
>>>>>>>>>>         if (args.length != 2) {
>>>>>>>>>>             System.err.println("Usage: XMLtoText <input path> <output path>");
>>>>>>>>>>             System.exit(-1);
>>>>>>>>>>         }
>>>>>>>>>>
>>>>>>>>>>         Job job = new Job();
>>>>>>>>>>         job.setJarByClass(Main.class);
>>>>>>>>>>         job.setJobName("XML to Text");
>>>>>>>>>>         FileInputFormat.addInputPath(job, new Path(args[0]));
>>>>>>>>>>         FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>>>>>>>>>
>>>>>>>>>>         job.setMapperClass(XmlToTextMapper.class);
>>>>>>>>>>         job.setNumReduceTasks(0);
>>>>>>>>>>         job.setMapOutputKeyClass(Text.class);
>>>>>>>>>>         job.setMapOutputValueClass(Text.class);
>>>>>>>>>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>>>>>>>
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> To execute the job you can use:
>>>>>>>>>>
>>>>>>>>>> bin/hadoop Main /data.xml /output
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then you can use this to see the result.txt file:
>>>>>>>>>>
>>>>>>>>>> hadoop fs -cat /result.txt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm using this XML as input:
>>>>>>>>>>
>>>>>>>>>> <main><data><id>1</id><name>NameA</name></data><data><id>2</id><name>NameB</name></data></main>
>>>>>>>>>>
>>>>>>>>>> and the content of result.txt looks like this:
>>>>>>>>>>
>>>>>>>>>> id,name
>>>>>>>>>> 1,NameA
>>>>>>>>>> 2,NameB
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hope this helps.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014/1/3 Ranjini Rathinam <ranjinibe...@gmail.com>
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I need to convert XML into text using MapReduce.
>>>>>>>>>>>
>>>>>>>>>>> I have used the DOM and SAX parsers.
>>>>>>>>>>>
>>>>>>>>>>> After using SAX Builder in the mapper class, the child node acts as
>>>>>>>>>>> the root element.
>>>>>>>>>>>
>>>>>>>>>>> Looking at the System.out output, I found that the root element is
>>>>>>>>>>> taking the child element and printing it.
>>>>>>>>>>>
>>>>>>>>>>> For example,
>>>>>>>>>>>
>>>>>>>>>>> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
>>>>>>>>>>>
>>>>>>>>>>> when this XML is passed to the mapper and I print the root element,
>>>>>>>>>>> I get the root element as
>>>>>>>>>>>
>>>>>>>>>>> <id>
>>>>>>>>>>> <name>
>>>>>>>>>>>
>>>>>>>>>>> Please suggest how to fix this.
>>>>>>>>>>>
>>>>>>>>>>> I need to convert the XML into text using MapReduce code. Please
>>>>>>>>>>> provide an example.
>>>>>>>>>>>
>>>>>>>>>>> Required output is
>>>>>>>>>>>
>>>>>>>>>>> id,name
>>>>>>>>>>> 100,RR
>>>>>>>>>>>
>>>>>>>>>>> Please help.
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>> Ranjini R
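A note on the line-break advice in the thread: with line-oriented input, each map call receives a single line, so a pretty-printed XML file reaches the parser one fragment at a time. One way to prepare such a file is to collapse the whitespace between tags before uploading it. The following is only a minimal sketch under stated assumptions: the class name XmlLineCollapser is hypothetical, whitespace between tags is assumed to be insignificant, and no text value is assumed to contain a line break of its own.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the code posted in this thread.
public class XmlLineCollapser {

    // A closing '>' followed by whitespace (including newlines) and an opening '<'.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+<");

    /** Removes whitespace between adjacent tags so the document fits on one line. */
    public static String collapse(String xml) {
        return BETWEEN_TAGS.matcher(xml.trim()).replaceAll("><");
    }

    public static void main(String[] args) {
        String xml = "<Company>\n  <Employee>\n    <id>100</id>\n  </Employee>\n</Company>\n";
        // Prints: <Company><Employee><id>100</id></Employee></Company>
        System.out.println(collapse(xml));
    }
}
```

Line breaks inside element text are left untouched by this sketch, so a file containing them would still be split across map calls.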
<school><student><id>100</id><name>ranjini</name><addresses><address><addresstype>HOME</addresstype><address1>a street</address1><address2>ad street</address2><address3>ads street</address3><city>chennai</city><state>tn</state></address><address><addresstype>OFFICE</addresstype><address1>adsja1 street</address1><address2>adsja2 street</address2><address3>adsja3 street</address3><city>mumbai</city><state>Maharastra</state></address></addresses></student><student><id>101</id><name>nivetha</name><addresses><address><addresstype>HOME</addresstype><address1>adsja4 street</address1><address2>adsja5 street</address2><address3>adsja6 street</address3><city>navallur</city><state>tn</state></address></addresses></student><student><id>102</id><name>shiva</name><addresses><address></address></addresses></student><student><id>1001</id><name>rankj</name><addresses><address><addresstype>HOME</addresstype><address1>adsjar1 street</address1><address2>adsjar2 street</address2><address3>adsjar3 street</address3><city>medavakkam</city><state>tn</state></address><address> <addresstype>OFFICE</addresstype><address1>adsjas1 street</address1><address2>adsjas2 street</address2><address3>adsjas3 street</address3><city>maduari</city><state>tn</state></address></addresses></student><student><id>1012</id><name>raji</name><addresses><address><addresstype>HOME</addresstype><address1>adsjad1 street</address1><address2>adsjad2 street</address2><address3>adsjad3 street</address3><city>trichy</city><state>tn</state></address></addresses></student><student><id>1023</id><name>priya</name><addresses><address></address></addresses></student></school>
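
For the attached XML, the per-student flattening asked about at the top of the thread could be sketched as below. This is not the code from the Eclipse project mentioned earlier; it is a plain-Java sketch that uses the standard org.w3c.dom.NodeList rather than the internal DTMNodeList class, and the class name StudentFlattener and its flatten method are hypothetical. It assumes the whole document arrives as one string, which matches the single-line attachment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns <school> XML into one comma-separated line per <student>.
public class StudentFlattener {

    public static List<String> flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList students = (NodeList) xpath.evaluate(
                "/school/student", doc, XPathConstants.NODESET);

        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < students.getLength(); i++) {
            Node student = students.item(i);
            StringBuilder line = new StringBuilder();
            line.append(xpath.evaluate("id", student));
            line.append(',').append(xpath.evaluate("name", student));
            // All child elements of every <address>, in document order.
            // A student with no <address>, or an empty one, contributes nothing here.
            NodeList fields = (NodeList) xpath.evaluate(
                    "addresses/address/*", student, XPathConstants.NODESET);
            for (int j = 0; j < fields.getLength(); j++) {
                line.append(',').append(fields.item(j).getTextContent().trim());
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<school>"
                + "<student><id>100</id><name>ranjini</name><addresses>"
                + "<address><addresstype>HOME</addresstype><city>chennai</city></address>"
                + "<address><addresstype>OFFICE</addresstype><city>mumbai</city></address>"
                + "</addresses></student>"
                + "<student><id>102</id><name>shiva</name>"
                + "<addresses><address></address></addresses></student>"
                + "</school>";
        // Prints:
        // 100,ranjini,HOME,chennai,OFFICE,mumbai
        // 102,shiva
        for (String line : flatten(xml)) {
            System.out.println(line);
        }
    }
}
```

In a map-only job, the map method could call flatten(value.toString()) and pass each returned line to context.write instead of opening result.txt itself; that wiring is left out here because it depends on the job setup.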