Thanks Markus,

I was trying to include the any23 dependency in Nutch 1.10 by adding <dependency org="org.apache.any23" name="apache-any23-core" rev="1.1" /> to ivy.xml. But when I build, I get the errors below. Is any other configuration change required to add a third-party jar?

[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] module not found: org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315
[ivy:resolve] ==== local: tried
[ivy:resolve] /Users/manishverma/.ivy2/local/org.apache.commons/commons-csv/1.0-SNAPSHOT-rev1148315/ivys/ivy.xml
[ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
[ivy:resolve] /Users/manishverma/.ivy2/local/org.apache.commons/commons-csv/1.0-SNAPSHOT-rev1148315/jars/commons-csv.jar
[ivy:resolve] ==== maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.pom
[ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
[ivy:resolve] http://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.jar
[ivy:resolve] ==== apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.pom
[ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.jar
[ivy:resolve] ==== sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.pom
[ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.jar
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :: org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315: not found
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

> On Mar 18, 2016, at 3:16 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Hello! Nutch doesn't have a mechanism to extract microdata from HTML. But
> there is a patch for Apache Tika that comes as a content handler, TIKA-980.
> You can embed it into another content handler or use Tika's TeeContentHandler
> in Nutch' parse-tika plugin. Downside is that you have to transform the
> output data structure to a Writable in the plugin, otherwise you cannot store
> it as metadata and run on Hadoop.
>
> https://issues.apache.org/jira/browse/TIKA-980
>
> Markus
>
> -----Original message-----
>> From:Manish Verma <m_ve...@apple.com>
>> Sent: Thursday 17th March 2016 19:18
>> To: user@nutch.apache.org
>> Subject: Extract Microdata
>>
>> Hi,
>>
>> I need to crawl on Urls and extract micro data and save to solr. Does Nutch
>> support extraction of schema org micro data.
>>
>> Thanks