It looks like the InjectorJob phase successfully injected your one URL into the Cassandra keyspace.
On Thu, Jun 5, 2014 at 12:14 PM, Manikandan Saravanan <[email protected]> wrote:

> 14/06/05 15:01:02 INFO mapred.JobClient: Map input records=1
> ...
> 14/06/05 15:01:02 INFO mapred.JobClient: Map output records=1
> 14/06/05 15:01:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=110
> 14/06/05 15:01:02 INFO crawl.InjectorJob: InjectorJob: total number of urls rejected by filters: 0
> 14/06/05 15:01:02 INFO crawl.InjectorJob: InjectorJob: total number of urls injected after normalization and filtering: 1
> 14/06/05 15:01:02 INFO crawl.InjectorJob: Injector: finished at 2014-06-05 15:01:02, elapsed: 00:00:28

So that looks fine. What I would advise you to do is read the dump after injecting.

> Thu Jun 5 15:01:02 EDT 2014 : Iteration 1 of 2

What does this mean? Did you manually edit this? I have never seen this logging before.

> 14/06/05 15:02:14 INFO mapred.JobClient: Map input records=0

If the URL has already been fetched, then a fetchmark will not exist for it to be re-fetched. Could this perhaps be the case?

It seems that you have been tinkering with crawl cycles without understanding and/or recognizing the crawl cycle itself. If you are just starting out, I really advise you to use the nutch script with individual commands. Reading the database dump is an essential step in a young crawl cycle.

Lewis
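
P.S. If it helps to spot a phase that read zero records (like the `Map input records=0` line above), here is a rough sketch in Python that pulls the JobClient counters out of log lines. The function name `job_counters` and the regular expression are my own assumptions based only on the log format quoted in this thread; this is not part of Nutch, and it is no substitute for reading the dump.

```python
import re

# Matches Hadoop counter lines as quoted in this thread, e.g.
# "14/06/05 15:01:02 INFO mapred.JobClient: Map input records=1"
# (assumed format; adjust if your log layout differs).
COUNTER_RE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$"
)


def job_counters(log_lines):
    """Return {counter name: last integer value seen} for JobClient counter lines."""
    counters = {}
    for line in log_lines:
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# Sample lines copied from the injector output quoted above.
sample = [
    "14/06/05 15:01:02 INFO mapred.JobClient: Map input records=1",
    "14/06/05 15:01:02 INFO mapred.JobClient: Map output records=1",
    "14/06/05 15:01:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=110",
]

counters = job_counters(sample)
if counters.get("Map input records", 0) == 0:
    print("This job read no rows; check the database dump before continuing.")
else:
    print(counters)
```

Run it over the log of each phase separately; if several jobs are mixed into one file, only the last value of each counter is kept.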

