Dear nutchers, I have a large set of domains and many URLs that I want to process. I only want to crawl pages from those domains, but I am interested in all outlinks, regardless of whether they point inside or outside those domains.
I am using the property db.ignore.external.links=true, and I want to create a webgraphdb. Currently, the webgraphdb I get is empty. In org/apache/nutch/parse/ParseOutputFormat.java, external (non-domain) anchors are already filtered out during the parse phase, so they never make it into the parse data. I had hoped this filtering would happen at a later stage. Is there any way, even a hackish one, to keep them? Any suggestions are very welcome.
Martin
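The behaviour being asked for amounts to splitting outlinks into two sets: all of them are kept for the link graph, while only links to the allowed domains are kept for crawling. The following is a minimal plain-Java sketch of that split under stated assumptions. It does not use Nutch's API. The class `OutlinkSplitter`, its methods, and the exact-host comparison are hypothetical illustrations. They are not how ParseOutputFormat is actually implemented.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Nutch code: every outlink is kept for the
// link graph, but only outlinks to allowed hosts are kept for crawling.
public class OutlinkSplitter {

    // Returns the lower-cased host of a URL, or null if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Outlinks whose host is in the allowed set (assumption: exact host match);
    // these are the ones that would be scheduled for fetching.
    static List<String> crawlable(List<String> outlinks, Set<String> allowedHosts) {
        List<String> result = new ArrayList<>();
        for (String link : outlinks) {
            String host = hostOf(link);
            if (host != null && allowedHosts.contains(host)) {
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("example.org");
        List<String> outlinks = List.of(
            "http://example.org/about",
            "http://other.net/page",
            "not a url");
        // Every outlink remains available for building the link graph...
        System.out.println("graph: " + outlinks.size());
        // ...but only links to allowed hosts are kept for crawling.
        System.out.println("crawl: " + crawlable(outlinks, allowed));
    }
}
```

The point of the split is that dropping a link from the crawl frontier does not require dropping it from the parse data. In Nutch terms, the filtering would have to move from parse time to a later step such as the CrawlDb update. Whether that is possible through configuration alone is not something this sketch establishes.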

