vishal toshniwal created NUTCH-1609:
---------------------------------------

             Summary: java.net.MalformedURLException when running nutch crawl with apache-nutch-2.1.jar with hadoop
                 Key: NUTCH-1609
                 URL: https://issues.apache.org/jira/browse/NUTCH-1609
             Project: Nutch
          Issue Type: Bug
         Environment: Nutch 2.1, Hadoop 1.0.3
            Reporter: vishal toshniwal


I get a java.net.MalformedURLException when running "crawl" for Nutch 2.1 on
Hadoop. The same crawl works fine in local mode.

The command and the resulting exception follow:

bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir crawled -depth 3 -topN 5
Warning: $HADOOP_HOME is deprecated.


****hdfs://localhost:9000/user/impadmin/crawled
java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.net.MalformedURLException
        at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.io.IOException: java.net.MalformedURLException
        at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
        at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
        at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
        ... 9 more
Caused by: java.io.IOException: java.net.MalformedURLException
        at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
        at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
        at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
        at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
        at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
        at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
        ... 11 more
Caused by: java.net.MalformedURLException
        at java.net.URL.<init>(URL.java:601)
        at java.net.URL.<init>(URL.java:464)
        at java.net.URL.<init>(URL.java:413)
        at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
        at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
        at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
        at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
        ... 19 more

Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



Local-mode command, for comparison:

bin/nutch crawl urls -depth 3 -topN 5
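
The innermost frames of the trace show the java.net.URL constructor failing
while Xerces sets up the input that JDOM's SAXBuilder received from
SqlStore.readMapping, and the exception carries no message. What follows is
only a diagnostic sketch, not a confirmed cause. It assumes the SQL mapping is
read as a classpath resource and that the resource is named
gora-sql-mapping.xml; neither is stated in this report. The class
MappingProbe and its methods are hypothetical names used for illustration.
The sketch checks whether a named resource is visible to the class loader and
shows that building a java.net.URL from a null string also surfaces as a
MalformedURLException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical diagnostic helper; not part of Nutch, Gora, or Hadoop.
final class MappingProbe {

    // Returns true if the named resource can be opened through this class's
    // class loader. getResourceAsStream returns null when nothing is found.
    static boolean isOnClasspath(String resourceName) throws IOException {
        ClassLoader loader = MappingProbe.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            return in != null;
        }
    }

    // Returns true if constructing a URL from a null string is reported as a
    // MalformedURLException. The URL(String) constructor is deprecated on
    // recent JDKs; it is used here only because it appears in the trace.
    @SuppressWarnings("deprecation")
    static boolean nullSpecIsMalformed() {
        try {
            new URL((String) null);
            return false;
        } catch (MalformedURLException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default resource name; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "gora-sql-mapping.xml";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
        System.out.println("null URL spec is malformed: " + nullSpecIsMalformed());
    }
}
```

Running the probe with the same classpath the task JVM sees (for example with
the .job file on the classpath) may help distinguish a mapping file that is
not visible from one that is present but malformed.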

