vishal toshniwal created NUTCH-1609:
---------------------------------------
         Summary: java.net.MalformedURLException when running nutch crawl with apache-nutch-2.1.jar with hadoop
             Key: NUTCH-1609
             URL: https://issues.apache.org/jira/browse/NUTCH-1609
         Project: Nutch
      Issue Type: Bug
     Environment: nutch 2.1, hadoop 1.0.3
        Reporter: vishal toshniwal

I am getting a java.net.MalformedURLException when running "crawl" for Nutch 2.1 on Hadoop, but it works fine in local mode.

The exception is as follows:

bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir crawled -depth 3 -topN 5
Warning: $HADOOP_HOME is deprecated.

****hdfs://localhost:9000/user/impadmin/crawled
java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.net.MalformedURLException
	at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.io.IOException: java.net.MalformedURLException
	at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
	at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
	at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
	... 9 more
Caused by: java.io.IOException: java.net.MalformedURLException
	at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
	at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
	at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
	at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
	at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
	at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
	at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
	... 11 more
Caused by: java.net.MalformedURLException
	at java.net.URL.<init>(URL.java:601)
	at java.net.URL.<init>(URL.java:464)
	at java.net.URL.<init>(URL.java:413)
	at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
	at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
	at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
	at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
	... 19 more
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
	at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
	at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
	at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

bin/nutch crawl urls -depth 3 -topN 5
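A note on the innermost cause: the "Caused by: java.net.MalformedURLException" thrown from java.net.URL.<init> carries no detail message. One way java.net.URL produces such an exception is by wrapping an internal NullPointerException, for example when it is handed a null spec. Whether Xerces actually passed a null system ID here (inside SqlStore.readMapping) is an assumption that the trace alone does not confirm. The minimal sketch below only demonstrates the JDK behaviour; the class NullSpecDemo is hypothetical and not part of Nutch, Gora, or Xerces.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class (not part of Nutch, Gora, or Xerces).
// Shows that java.net.URL turns a null spec into a MalformedURLException
// whose cause is the underlying NullPointerException.
final class NullSpecDemo {

    // Returns the cause of the MalformedURLException thrown by new URL(null),
    // or null if, unexpectedly, no MalformedURLException is thrown.
    static Throwable causeOfNullSpec() {
        try {
            new URL((String) null); // deprecated constructor on JDK 20+, still functional
            return null;
        } catch (MalformedURLException e) {
            // URL wraps unexpected exceptions (here the NPE) as the cause.
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        Throwable cause = causeOfNullSpec();
        System.out.println(cause == null ? "no exception" : cause.getClass().getName());
    }
}
```

Note that newer JDKs may attach a "helpful" NullPointerException message, so the wrapped exception is not guaranteed to be message-less there, whereas the trace above, from a Java 6-era runtime, shows none.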