You can write a simple parse filter plugin. With the NodeWalker you can walk all nodes of the DOM and get the alt attribute for img tags.
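If you want to try the traversal outside of Nutch first, here is a minimal sketch of the same idea using only the JDK's org.w3c.dom classes. It is an assumption-laden stand-in, not Nutch code: the class name AltTextSketch and the helpers collectAltTexts and parseAndCollect are made up for illustration, a plain stack replaces NodeWalker, and the input has to be well-formed XHTML because it goes through the JDK XML parser rather than Nutch's HTML parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-alone sketch; none of these names come from Nutch.
public class AltTextSketch {

    /** Collects the alt attribute of every img element below root, in document order. */
    static List<String> collectAltTexts(Node root) {
        List<String> alts = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();   // explicit stack instead of NodeWalker
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (current.getNodeType() == Node.ELEMENT_NODE
                    && "img".equalsIgnoreCase(current.getNodeName())) {
                Element img = (Element) current;
                // Attribute names are matched case-sensitively here ("alt" only).
                if (img.hasAttribute("alt")) {
                    alts.add(img.getAttribute("alt"));
                }
            }
            // Push children in reverse so the first child is popped (visited) first.
            NodeList children = current.getChildNodes();
            for (int i = children.getLength() - 1; i >= 0; i--) {
                pending.push(children.item(i));
            }
        }
        return alts;
    }

    /** Parses a well-formed XHTML string and returns the alt texts of its img elements. */
    static List<String> parseAndCollect(String xhtml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return collectAltTexts(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<img src='a.png' alt='A cat'/>"
                + "<p><IMG alt='Logo'/></p>"
                + "<img src='b.png'/>"          // no alt attribute, so it is skipped
                + "</body></html>";
        System.out.println(parseAndCollect(page));  // prints [A cat, Logo]
    }
}
```

In a real plugin you would do the same check inside your parse filter and put the collected values into the parse metadata so an indexing filter can pick them up.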
In the filter it looks roughly like this:

  NodeWalker walker = new NodeWalker(doc);

  // visit every node of the DOM
  while (walker.hasNext()) {
    Node currentNode = walker.nextNode();

    if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
      if ("img".equalsIgnoreCase(currentNode.getNodeName())) {
        HashMap<String,String> atts = getAttributes(currentNode);
        // atts.get("alt") is the alt text, or null if the img has none
      }
    }
  }

  // returns the attributes of a node, keyed by lower-cased attribute name
  protected HashMap<String,String> getAttributes(Node node) {
    HashMap<String,String> attribMap = new HashMap<String,String>();

    NamedNodeMap attributes = node.getAttributes();

    for (int i = 0; i < attributes.getLength(); i++) {
      Attr attribute = (Attr)attributes.item(i);
      attribMap.put(attribute.getName().toLowerCase(), attribute.getValue());
    }

    return attribMap;
  }

-----Original message-----
> From:Alexandre <alex.hura...@gmail.com>
> Sent: Mon 01-Oct-2012 15:05
> To: user@nutch.apache.org
> Subject: Re: Parsing/Indexing alt tag
>
> Hi Patrick,
>
> I have the same problem.
> Did you find a way to parse the alt attributes without rewriting a complete
> parse plugin?
>
> Alex.